开发者

How to convert Java long's as Strings while keeping natural order

开发者 https://www.devze.com 2022-12-20 05:17 出处:网络
I\'m currently looking at a simple programming problem that might be fun to optimize - at least for anybody who believes that programming is art :) So here is it:

I'm currently looking at a simple programming problem that might be fun to optimize - at least for anybody who believes that programming is art :) So here is it:

How to best represent long's as Strings while keeping their natural order?

Additionally, the String representation should match ^[A-Za-z0-9]+$. (I'm not too strict here, but avoid using control characters or anything that might cause headaches with encodings, is illegal in XML, has line breaks, or similar characters that will certainly cause problems)

Here's a JUnit test case:

@Test
public void longConversion() {
    final long[] longs = { Long.MIN_VALUE, Long.MAX_VALUE, -5664572164553633853L,
            -8089688774612278460L, 7275969614015446693L, 6698053890185294393L,
            734107703014507538L, -350843201400906614L, -4760869192643699168L,
            -2113787362183747885L, -5933876587372268970L, -7214749093842310327L, };

    // keep it reproducible
    //Collections.shuffle(Arrays.asList(longs));

    final String[] strings = new String[longs.length];
    for (int i = 0; i < longs.length; i++) {
        strings[i] = Converter.convertLong(longs[i]);
    }

    // Note: Comparator is not an option
    Arrays.sort(longs);
    Arrays.sort(strings);

    final Pattern allowed = Pattern.compile("^[A-Za-z0-9]+$");
    for (int i = 0; i < longs.length; i++) {
        assertTrue("string: " + strings[i], allowed.matcher(strings[i]).matches());
        assertEquals("string: " + strings[i], longs[i], Converter.parseLong(strings[i]));
    }
}

and here are the methods开发者_Go百科 I'm looking for

public static class Converter {
    public static String convertLong(final long value) {
        // TODO
    }

    public static long parseLong(final String value) {
        // TODO
    }
}

I already have some ideas on how to approach this problem. Still, I though I might get some nice (creative) suggestions from the community.

Additionally, it would be nice if this conversion would be

  • as short as possible
  • easy to implement in other languages

EDIT: I'm quite glad to see that two very reputable programmers ran into the same problem as I did: using '-' for negative numbers can't work as the '-' doesn't reverse the order of sorting:

  1. -0001
  2. -0002
  3. 0000
  4. 0001
  5. 0002


Ok, take two:

class Converter {
  public static String convertLong(final long value) {
    return String.format("%016x", value - Long.MIN_VALUE);
  }

  public static long parseLong(final String value) {
    String first = value.substring(0, 8);
    String second = value.substring(8);
    long temp = (Long.parseLong(first, 16) << 32) | Long.parseLong(second, 16);
    return temp + Long.MIN_VALUE;
  }
}

This one takes a little explanation. Firstly, let me demonstrate that it is reversible and the resultant conversions should demonstrate the ordering:

for (long aLong : longs) {
  String out = Converter.convertLong(aLong);
  System.out.printf("%20d %16s %20d\n", aLong, out, Converter.parseLong(out));
}

Output:

-9223372036854775808 0000000000000000 -9223372036854775808
 9223372036854775807 ffffffffffffffff  9223372036854775807
-5664572164553633853 316365a0e7370fc3 -5664572164553633853
-8089688774612278460 0fbba6eba5c52344 -8089688774612278460
 7275969614015446693 e4f96fd06fed3ea5  7275969614015446693
 6698053890185294393 dcf444867aeaf239  6698053890185294393
  734107703014507538 8a301311010ec412   734107703014507538
 -350843201400906614 7b218df798a35c8a  -350843201400906614
-4760869192643699168 3dedfeb1865f1e20 -4760869192643699168
-2113787362183747885 62aa5197ea53e6d3 -2113787362183747885
-5933876587372268970 2da6a2aeccab3256 -5933876587372268970
-7214749093842310327 1be00fecadf52b49 -7214749093842310327

As you can see Long.MIN_VALUE and Long.MAX_VALUE (the first two rows) are correct and the other values basically fall in line.

What is this doing?

Assuming signed byte values you have:

  • -128 => 0x80
  • -1 => 0xFF
  • 0 => 0x00
  • 1 => 0x01
  • 127 => 0x7F

Now if you add 0x80 to those values you get:

  • -128 => 0x00
  • -1 => 0x7F
  • 0 => 0x80
  • 1 => 0x81
  • 127 => 0xFF

which is the correct order (with overflow).

Basically the above is doing that with 64 bit signed longs instead of 8 bit signed bytes.

The conversion back is a little more roundabout. You might think you can use:

return Long.parseLong(value, 16);

but you can't. Pass in 16 f's to that function (-1) and it will throw an exception. It seems to be treating that as an unsigned hex value, which long cannot accommodate. So instead I split it in half and parse each piece, combining them together, left-shifting the first half by 32 bits.


EDIT: Okay, so just adding the negative sign for negative numbers doesn't work... but you could convert the value into an effectively "unsigned" long such that Long.MIN_VALUE maps to "0000000000000000", and Long.MAX_VALUE maps to "FFFFFFFFFFFFFFFF". Harder to read, but will get the right results.

Basically you just need to add 2^63 to the value before turning it into hex - but that may be a slight pain to do in Java due to it not having unsigned longs... it may be easiest to do using BigInteger:

private static final BigInteger OFFSET = BigInteger.valueOf(Long.MIN_VALUE)
                                                   .negate();

public static String convertLong(long value) {
    BigInteger afterOffset = BigInteger.valueOf(value).add(OFFSET);
    return String.format("%016x", afterOffset);
}

public static long parseLong(String text) {
    BigInteger beforeOffset = new BigInteger(text, 16);
    return beforeOffset.subtract(OFFSET).longValue();
}

That wouldn't be terribly efficient, admittedly, but it works with all your test cases.


If you don't need a printable String, you can encode the long in four chars after you've shifted the value by Long.MIN_VALUE (-0x80000000) to emulate an unsigned long:

public static String convertLong(long value) {
    value += Long.MIN_VALUE;
    return "" + 
        (char)(value>>48) + (char)(value>>32) + 
        (char)(value>>16) + (char)value; 
}

public static long parseLong(String value) {
    return (
        (((long)value.charAt(0))<<48) + 
        (((long)value.charAt(1))<<32) + 
        (((long)value.charAt(2))<<16) + 
        (long)value.charAt(3)) + Long.MIN_VALUE;
}

Usage of surrogate pairs is not a problem, since the natural order of a string is defined by the UTF-16 values in its chars and not by the UCS-2 codepoint values.


There's a technique in RFC2550 -- an April 1st joke RFC about the Y10K problem with 4-digit dates -- that could be applied to this purpose. Essentially, each time the integer's string representation grows to require another digit, another letter or other (printable) character is prepended to retain desired sort-order. The negative rules are more arcane, yielding strings that are harder to read at a glance... but still easy enough to apply in code.

Nicely, for positive numbers, they're still readable.

See:

http://www.faqs.org/rfcs/rfc2550.html

0

精彩评论

暂无评论...
验证码 换一张
取 消