
bencoding binary data in Java strings

Developer https://www.devze.com 2022-12-10 21:18 Source: Internet

I'm playing with bencoding and would like to keep bencoded strings as Java strings. They contain binary data, though, so blindly converting them to String corrupts the data. What I am trying to accomplish is a conversion function that keeps the ASCII bytes as ASCII and encodes the non-ASCII bytes in a reversible way.

I have found some examples of what I am trying to accomplish in Python, but I don't know enough Python to dig through them. This decoder does exactly what I would like to do: the ASCII parts of the torrent stay as ASCII, but SHA-1 hashes are printed as "\xd8r\xe7". My Python knowledge is very limited, but the author doesn't seem to be doing anything special to the string. Is this handled by the Python interpreter? Can I accomplish the same in Java?


I have tried encodings such as Base64 and hex output via Integer.toHexString, but the resulting ASCII strings are unreadable.

I have also found a Scheme example that prints everything but the SHA-1 hashes.


Bencoded strings are byte strings. In Java you can attempt to decode a byte string to Unicode code points with the constructor String(byte[] bytes, Charset charset). Decoding with certain encodings, such as ISO-8859-1, always succeeds because every byte value maps directly to a code point. With many of these encodings (including ISO-8859-1) the process is also reversible: encoding the resulting string with the same charset gives back the original bytes.
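To make the round trip concrete, here is a minimal sketch using ISO-8859-1. The class name Latin1RoundTrip and its two helper methods are hypothetical names chosen for illustration; only the String constructor, getBytes and StandardCharsets come from the JDK.

```java
import java.nio.charset.StandardCharsets;

final class Latin1RoundTrip {
  // Each byte becomes the char with the same numeric value (0-255),
  // so decoding can never fail or lose information.
  static String decode(byte[] raw) {
    return new String(raw, StandardCharsets.ISO_8859_1);
  }

  // Inverse of decode. Only meaningful for strings produced by decode:
  // chars above U+00FF cannot be represented and would be replaced.
  static byte[] encode(String text) {
    return text.getBytes(StandardCharsets.ISO_8859_1);
  }
}
```

Calling encode(decode(bytes)) should return an array equal to the original bytes, and the ASCII parts of the decoded string stay readable. The non-ASCII parts show up as arbitrary Latin-1 characters or control characters rather than as "\xd8"-style escapes.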


If Wikipedia is accurate on Bencode, the format seems straightforward enough. Parse the byte data directly:

// `in` is assumed to be an InputStream that supports mark/reset,
// e.g. a java.io.BufferedInputStream.
while (true) {
  in.mark(1);
  int n = in.read();   // peek at the next byte
  if (n < 0) {
    // end of input
    break;
  }
  in.reset();          // un-read the byte so the element parser sees it
  // take advantage of some UTF-16 values == ASCII values
  if (n == 'd') {
    // parse dictionary
  } else if (n == 'i') {
    // parse int
  } else if (n >= '0' && n <= '9') {
    // parse binary string
  } else if (n == 'l') {
    // parse list
  } else {
    throw new IOException("Invalid input");
  }
}
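
The "parse binary string" branch could be filled in along the following lines. This is a minimal sketch, assuming the usual bencode string layout (a decimal length, a colon, then that many raw bytes) and assuming the stream is positioned at the first length digit. The names BencodeStrings and readByteString are hypothetical, not part of the answer above.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

final class BencodeStrings {
  // Reads one bencoded string of the form <decimal length>:<raw bytes>
  // and returns the raw bytes without interpreting them as text.
  static byte[] readByteString(InputStream in) throws IOException {
    int length = 0;
    int digits = 0;
    int c;
    while ((c = in.read()) != ':') {
      if (c < '0' || c > '9') {
        // Also covers end of input, where read() returns -1.
        throw new IOException("Invalid string length");
      }
      if (length > (Integer.MAX_VALUE - 9) / 10) {
        throw new IOException("String length too large");
      }
      length = length * 10 + (c - '0');
      digits++;
    }
    if (digits == 0) {
      throw new IOException("Missing string length");
    }
    byte[] data = new byte[length];
    int off = 0;
    while (off < length) {
      // read() may return fewer bytes than requested, so loop until full.
      int n = in.read(data, off, length - off);
      if (n < 0) {
        throw new EOFException("Truncated string");
      }
      off += n;
    }
    return data;
  }
}
```

Returning a byte[] rather than a String keeps the parser independent of any character encoding; the decision about how to display the bytes is left to the caller.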

Store the binary strings in a type that only converts them to ASCII when you do it explicitly, as in this toString call:

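import java.nio.charset.StandardCharsets;

public class ByteString {
  private final byte[] data;

  // Defensive copies keep the wrapped bytes immutable.
  public ByteString(byte[] data) { this.data = data.clone(); }
  public byte[] getData() { return data.clone(); }

  // For display only: bytes outside US-ASCII decode to U+FFFD, so this is not reversible.
  @Override public String toString() {
    return new String(data, StandardCharsets.US_ASCII);
  }
}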
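
For the reversible, Python-like display the question asks about, one option is an explicit escape and unescape pair. The sketch below is an assumption about what such a helper could look like, not the answerer's code: ByteEscapes, escape and unescape are hypothetical names. The output resembles Python's repr but is not identical to it; for example, a backslash is written as \x5c and a newline as \x0a.

```java
import java.io.ByteArrayOutputStream;

final class ByteEscapes {
  // Printable ASCII (except backslash) is kept as-is;
  // every other byte is written as \xNN with two lowercase hex digits.
  static String escape(byte[] data) {
    StringBuilder sb = new StringBuilder(data.length);
    for (byte b : data) {
      int v = b & 0xff; // treat the byte as unsigned
      if (v >= 0x20 && v < 0x7f && v != '\\') {
        sb.append((char) v);
      } else {
        sb.append(String.format("\\x%02x", v));
      }
    }
    return sb.toString();
  }

  // Inverse of escape. Assumes the input was produced by escape,
  // so every backslash starts a well-formed \xNN sequence.
  static byte[] unescape(String text) {
    ByteArrayOutputStream out = new ByteArrayOutputStream(text.length());
    for (int i = 0; i < text.length(); i++) {
      char c = text.charAt(i);
      if (c == '\\') {
        out.write(Integer.parseInt(text.substring(i + 2, i + 4), 16));
        i += 3; // skip the 'x' and the two hex digits
      } else {
        out.write(c);
      }
    }
    return out.toByteArray();
  }
}
```

With this scheme the three bytes 0xd8, 'r', 0xe7 are rendered as the text \xd8r\xe7, matching the output quoted in the question, and unescape recovers the original bytes. A ByteString.toString could delegate to escape if a lossless display form is preferred over the US-ASCII decoding.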