Convert byte-stream to character-stream in Java

Developer https://www.devze.com 2023-02-05 16:39 Source: Internet
Is there a class that is created by specifying an encoding, accepts byte streams, and produces character streams? The main point is that I want to conserve memory by not holding both the entire byte-stream data and the entire character-stream data in memory at the same time.

Something like:

Something s = new Something("utf-8");
s.write(buffer, 0, buffer.length); // it converts the bytes directly to characters internally, so we don't store both
// ... several more s.write() calls
s.close(); // or not needed

String text = s.getString();
// or
char[] text = s.getCharArray();

What is that Something?


Are you looking for ByteArrayInputStream? You could then wrap that in an InputStreamReader and read characters out of the original byte array.

A ByteArrayInputStream lets you "stream" from a byte array. If you wrap that in an InputStreamReader you can read characters. The InputStreamReader lets you stipulate the character encoding.

If you want to go directly from an input source of bytes, then you can just construct the appropriate sort of InputStream class (FileInputStream for example) and then wrap that in an InputStreamReader.
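
As a rough sketch of the wrapping described above (the class name ReaderDemo and the helper readAll are made up for illustration, and UTF-8 is assumed as the encoding), the reader decodes bytes on the fly, so only a small char chunk plus the accumulated result is held at any time:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class ReaderDemo {
    // Hypothetical helper: decodes the byte source chunk by chunk and
    // collects the characters. Any InputStream works (FileInputStream, etc.).
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] chunk = new char[1024];
            int n;
            while ((n = reader.read(chunk)) != -1) {
                sb.append(chunk, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = "h\u00e9llo w\u00f6rld".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(bytes)));
    }
}
```

If the bytes come from a file or socket rather than an array, passing that stream to readAll avoids ever holding the full byte data in memory.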


You can probably mock it up using CharsetDecoder. Something along the lines of

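    // needs java.nio.ByteBuffer, java.nio.CharBuffer,
    // java.nio.charset.Charset and java.nio.charset.CharsetDecoder
    CharsetDecoder decoder = Charset.forName(encoding).newDecoder();
    CharBuffer cb = CharBuffer.allocate(100);
    decoder.decode(ByteBuffer.wrap(buffer1), cb, false);
    decoder.decode(ByteBuffer.wrap(buffer2), cb, false);
    ...
    decoder.decode(ByteBuffer.wrap(bufferN), cb, true);
    decoder.flush(cb);  // emit anything the decoder still holds internally
    cb.flip();          // switch to reading: limit = chars written, position = 0
    return cb.toString();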

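(Yes, I know this will overflow your CharBuffer; you may want to copy the contents into a StringBuilder as you go. Also note that when a multi-byte character is split across two buffers, the decoder leaves the incomplete bytes unread in the ByteBuffer, so they have to be carried over to the next call.)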
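
A minimal sketch of how those two caveats might be handled, assuming a made-up class StreamingDecoder with write and finish methods (these names are not a JDK API). It drains a small CharBuffer into a StringBuilder and carries undecoded trailing bytes over to the next write:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

class StreamingDecoder {
    private final CharsetDecoder decoder;
    private final CharBuffer out = CharBuffer.allocate(1024);
    private final StringBuilder text = new StringBuilder();
    private ByteBuffer pending = ByteBuffer.allocate(0); // undecoded tail of the last chunk
    private boolean finished;

    StreamingDecoder(Charset charset) {
        decoder = charset.newDecoder();
    }

    // Decodes one chunk; bytes of an incomplete character stay in 'pending'.
    void write(byte[] buf, int off, int len) throws CharacterCodingException {
        ByteBuffer in = ByteBuffer.allocate(pending.remaining() + len);
        in.put(pending).put(buf, off, len).flip();
        drain(in, false);
        pending = in; // whatever remains is an incomplete sequence
    }

    // Signals end of input and returns everything decoded so far.
    String finish() throws CharacterCodingException {
        if (!finished) {
            drain(pending, true); // leftover incomplete bytes are reported as malformed
            while (decoder.flush(out).isOverflow()) flushOut();
            flushOut();
            finished = true;
        }
        return text.toString();
    }

    private void drain(ByteBuffer in, boolean endOfInput) throws CharacterCodingException {
        while (true) {
            CoderResult r = decoder.decode(in, out, endOfInput);
            flushOut();
            if (r.isError()) r.throwException();
            if (r.isUnderflow()) return; // input used up (or only a partial char left)
            // otherwise overflow: 'out' was full and has been emptied, so decode again
        }
    }

    private void flushOut() {
        out.flip();
        text.append(out);
        out.clear();
    }

    public static void main(String[] args) throws CharacterCodingException {
        byte[] data = "h\u00e9llo \u20ac".getBytes(StandardCharsets.UTF_8);
        StreamingDecoder d = new StreamingDecoder(StandardCharsets.UTF_8);
        for (int i = 0; i < data.length; i++) d.write(data, i, 1); // worst case: 1 byte per call
        System.out.println(d.finish());
    }
}
```

Each write copies its chunk once into a temporary buffer, so at any moment only the current chunk and the decoded text are held, not the whole byte stream.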


Your example code doesn't seem to indicate that a character stream is needed. If that's the case, String can already handle all that you want. Assuming String s contains the data,

char[] chars = s.toCharArray();
byte[] bytes = s.getBytes("utf-8");

The question then reduces to how to get bytes from a byte stream into String, for which you can use ByteArrayOutputStream, like so:

ByteArrayOutputStream os = new ByteArrayOutputStream();
os.write(buffer, 0, buffer.length); // it just stores the bytes, doesn't convert yet.
// several more os.write() calls
String s = os.toString("utf-8"); // now it converts the full buffer to a string in the specified encoding.

If you truly want something that has a byte input stream and a character output stream, there isn't a built-in one.


Actually the title "Convert byte-stream to character-stream in Java" contradicts your example using no streams at all but arrays. I'm assuming further you want arrays.

You surely can't start with byte[] and end with char[] (or String) without having both somewhere for a while. There are however some possibilities:

  • in case you really need a char[]: Idea: Write the byte[] into a file and read it using a FileReader into the array. This doesn't really work, since you don't know the proper array length in advance. So generate and write all the characters into a file using DataOutput, read all of them back using DataInput into an array.

  • in case you really need a String: Create a char[] as above and use reflection and setAccessible(true) to invoke the package-private ctor String(int offset, int count, char value[]).

  • in case a CharSequence suffices: Create a class MyCharSequence holding the byte[]. An extremely slow solution would be to implement its method charAt(index) by converting a part of the byte[] starting from the beginning until you obtain index+1 chars. Discard all of them on the fly and keep the last one. Such a stupid method is needed because with UTF-8 you don't know how many bytes correspond to a single char. You could do the conversion once at the beginning and remember for each char the position of its first byte. This is even more stupid, as you'd need much more memory for those positions. Fortunately, a simple space-time tradeoff exists, e.g., remember the position of the first byte for each 16th char.
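
The last idea could be sketched roughly as follows. The class name Utf8CharSequence and its fields are invented for illustration, and the sketch assumes the byte[] holds valid UTF-8 (it does no validation). Because a 4-byte sequence decodes to two chars, each checkpoint also records whether the 16th char is the second half of a surrogate pair:

```java
import java.nio.charset.StandardCharsets;

class Utf8CharSequence implements CharSequence {
    private static final int STEP = 16;  // one checkpoint per 16 chars
    private final byte[] bytes;          // the only copy of the data (assumed valid UTF-8)
    private final int length;            // length in UTF-16 chars
    private final int[] checkpoint;      // byte offset of the code point containing char k*STEP
    private final boolean[] lowHalf;     // true if char k*STEP is a low surrogate

    Utf8CharSequence(byte[] utf8) {
        bytes = utf8;
        checkpoint = new int[utf8.length / STEP + 1];
        lowHalf = new boolean[checkpoint.length];
        int chars = 0, pos = 0;
        while (pos < utf8.length) {
            int n = seqLen(utf8[pos]);
            int w = (n == 4) ? 2 : 1;    // chars produced by this code point
            for (int j = 0; j < w; j++) {
                if ((chars + j) % STEP == 0) {
                    checkpoint[(chars + j) / STEP] = pos;
                    lowHalf[(chars + j) / STEP] = (j == 1);
                }
            }
            chars += w;
            pos += n;
        }
        length = chars;
    }

    // Number of bytes in the sequence starting with lead byte b.
    private static int seqLen(byte b) {
        if ((b & 0x80) == 0) return 1;
        if ((b & 0xE0) == 0xC0) return 2;
        if ((b & 0xF0) == 0xE0) return 3;
        return 4;
    }

    private int decode(int pos, int n) {
        int b0 = bytes[pos] & 0xFF;
        if (n == 1) return b0;
        int cp = b0 & (0xFF >> (n + 1)); // payload bits of the lead byte
        for (int i = 1; i < n; i++) cp = (cp << 6) | (bytes[pos + i] & 0x3F);
        return cp;
    }

    @Override public int length() { return length; }

    @Override public char charAt(int index) {
        if (index < 0 || index >= length) throw new IndexOutOfBoundsException("index " + index);
        int k = index / STEP;
        int pos = checkpoint[k];
        int chars = k * STEP - (lowHalf[k] ? 1 : 0); // char index where that code point starts
        while (true) {                               // walks at most STEP code points
            int n = seqLen(bytes[pos]);
            int w = (n == 4) ? 2 : 1;
            if (index < chars + w) {
                int cp = decode(pos, n);
                if (w == 1) return (char) cp;
                return index == chars ? Character.highSurrogate(cp) : Character.lowSurrogate(cp);
            }
            chars += w;
            pos += n;
        }
    }

    @Override public CharSequence subSequence(int start, int end) {
        return new StringBuilder().append(this, start, end).toString();
    }

    @Override public String toString() {
        return new StringBuilder(length).append(this).toString();
    }

    public static void main(String[] args) {
        byte[] data = "na\u00efve caf\u00e9, 20 \u20ac".getBytes(StandardCharsets.UTF_8);
        CharSequence cs = new Utf8CharSequence(data);
        System.out.println(cs.length() + " chars, charAt(2) = " + cs.charAt(2));
    }
}
```

The index costs roughly one int and one boolean per 16 chars, and charAt decodes at most 16 code points per call. Note that toString and subSequence materialize chars again, so they defeat the memory saving if used on the whole sequence.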

All my proposals are a bit strange, but I believe it can't be done much better. It could make a fun homework exercise, but I wouldn't go for it.
