
Java string encoding conversion within a webpage

Developer https://www.devze.com 2022-12-19 04:48 Source: web

I have a webpage that is encoded (through its header) as WIN-1255. A Java program creates text strings that are automatically embedded in the page. The problem is that the original strings are encoded in UTF-8, which produces gibberish in the page's text field.

Unfortunately, I cannot change the page encoding; it's required by a customer's proprietary system.

Any ideas?

UPDATE:

The page I'm creating is an RSS feed that needs to be set to WIN-1255, showing information taken from another feed that is encoded in UTF-8.

SECOND UPDATE:

Thanks for all the responses. I managed to convert the string and still got gibberish. The problem was that the XML encoding declaration has to be set in addition to the header encoding.

Adam


To the point: you need to set the encoding of the response writer. A response header on its own only tells the client application which encoding to use to interpret and display the page. That won't work if the response itself is written with a different encoding.
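To make that mismatch concrete, here is a small sketch (the class and method names are illustrative). It simulates a server writing bytes in one charset while the header declares another, and shows that the Hebrew text only survives when the two agree. It assumes the JDK ships the `windows-1255` charset, which standard JDK builds do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative demo: the bytes are written in one charset, but the client
// decodes them with the charset named in the Content-Type header.
class EncodingMismatchDemo {
    static final Charset WIN1255 = Charset.forName("windows-1255");

    // Encode with the charset actually used for writing, then decode with
    // the charset the client was told to use.
    static String roundTrip(String text, Charset writtenAs, Charset declaredAs) {
        byte[] body = text.getBytes(writtenAs);
        return new String(body, declaredAs);
    }

    public static void main(String[] args) {
        String hebrew = "\u05E9\u05DC\u05D5\u05DD"; // "shalom" in Hebrew letters
        // Written as UTF-8 but declared as windows-1255: gibberish
        System.out.println(roundTrip(hebrew, StandardCharsets.UTF_8, WIN1255));
        // Written and declared as windows-1255: the original text
        System.out.println(roundTrip(hebrew, WIN1255, WIN1255));
    }
}
```

The console may not be able to display Hebrew itself. Comparing the returned strings with `equals` is the reliable check.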

The context in which you have this problem is entirely unclear (please elaborate on it in future questions like this), so here are several solutions:

If it is a JSP, put the following at the top of the JSP to set the response encoding:

<%@ page pageEncoding="windows-1255" %>

If it is a Servlet, call the following before calling getWriter() and before the response is committed (that is, before the first flush) to set the response encoding:

response.setCharacterEncoding("windows-1255");

Both approaches also implicitly add a charset parameter to the Content-Type response header, which tells the client to use the same encoding to interpret and display the page. Also see this article for more information.

If it is a homegrown application that relies on the basic java.net and/or java.io APIs, write the characters through an OutputStreamWriter created with the two-argument constructor, which lets you specify the encoding:

Writer writer = new OutputStreamWriter(someOutputStream, "windows-1255");
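A minimal, self-contained sketch of that approach. A `ByteArrayOutputStream` stands in for the real stream (socket, file, or response body), and the class name is hypothetical. Windows-1255 stores each Hebrew letter in a single byte, with alef at 0xE0 and tav at 0xFA.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;

class Win1255Writer {
    // Encode text to windows-1255 through an OutputStreamWriter.
    // ByteArrayOutputStream is a stand-in for the application's real stream.
    static byte[] encode(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, Charset.forName("windows-1255"))) {
            writer.write(text);
        } // closing the writer flushes any bytes still buffered in the encoder
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = encode("\u05D0\u05EA"); // Hebrew alef and tav
        for (byte b : bytes) {
            System.out.printf("%02X ", b & 0xFF); // prints E0 FA
        }
        System.out.println();
    }
}
```

Forgetting to flush or close the writer is a common cause of truncated output, because OutputStreamWriter buffers encoded bytes internally.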


Assuming you have control of the original (properly represented) strings and simply need to output them in windows-1255:

import java.nio.ByteBuffer;
import java.nio.charset.Charset;

Charset win1255 = Charset.forName("windows-1255");
ByteBuffer bb = win1255.encode(someString);
byte[] ba = new byte[bb.limit()];
bb.get(ba); // copy the encoded bytes out of the buffer into ba

Then, simply write the contents of ba at the appropriate place.

EDIT: What you do with ba depends on your environment. For instance, if you're using servlets, you might do:

ServletOutputStream os = ...
os.write(ba);
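One caveat with `Charset.encode`: it silently replaces any character that windows-1255 cannot represent (such as CJK text or emoji) with a substitute byte, typically `?`. If the source feed might contain such characters, a `CharsetEncoder` set to `REPORT` makes the failure visible instead. This is a sketch; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

class StrictWin1255 {
    // Encode to windows-1255, throwing instead of substituting when a
    // character has no mapping in the target charset.
    static byte[] encodeStrict(String text) throws CharacterCodingException {
        CharsetEncoder encoder = Charset.forName("windows-1255").newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer bb = encoder.encode(CharBuffer.wrap(text)); // position 0, limit at end
        byte[] ba = new byte[bb.remaining()];
        bb.get(ba); // copy the encoded bytes out of the buffer
        return ba;
    }

    public static void main(String[] args) {
        try {
            // Four Hebrew letters encode to four bytes
            System.out.println(encodeStrict("\u05E9\u05DC\u05D5\u05DD").length);
            // A Chinese character has no windows-1255 mapping, so this throws
            encodeStrict("\u4E2D");
        } catch (CharacterCodingException e) {
            System.out.println("not representable in windows-1255: " + e);
        }
    }
}
```

Whether to fail, skip, or fall back to numeric character references for such characters is an application decision. This sketch only shows how to detect them.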

We also should not overlook the option of calling setContentType("text/html; charset=windows-1255") and then using getWriter() normally. You did not make it completely clear whether windows-1255 was being set in a meta tag or in the HTTP response header.

You clarified that you have a UTF-8 file that you need to decode. If you're not already decoding the UTF-8 strings properly, that should be no big deal: just read them through new InputStreamReader(someInputStream, Charset.forName("utf-8")).
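Putting the decoding and encoding halves together, a sketch of a stream-to-stream transcoder (the class and method names are hypothetical). The InputStreamReader decodes UTF-8 into Java chars, and the OutputStreamWriter re-encodes those chars as windows-1255. A plain read loop is used instead of `Reader.transferTo` so the sketch also works on Java 8.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Transcoder {
    // Decode UTF-8 from `in` and re-encode the characters as windows-1255 to `out`.
    static void utf8ToWin1255(InputStream in, OutputStream out) throws IOException {
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        Writer writer = new OutputStreamWriter(out, Charset.forName("windows-1255"));
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            writer.write(buf, 0, n);
        }
        writer.flush(); // push buffered bytes; the caller owns and closes the streams
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = "\u05E9\u05DC\u05D5\u05DD".getBytes(StandardCharsets.UTF_8); // 8 bytes
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        utf8ToWin1255(new ByteArrayInputStream(utf8), out);
        System.out.println(out.size()); // one byte per Hebrew letter in windows-1255
    }
}
```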


What's embedding the data in the page? Either it should read it as text (in UTF-8) and then write it out again in the web page's encoding (Win-1255) or you should change the Java program to create the files (or whatever) in Win-1255 to start with.

If you can give more details about how the system works (what's generating the web page? How does it interact with the Java program?) then it will make things a lot clearer.


The page I'm creating is an RSS feed that needs to be set to WIN-1255, showing information taken from another feed that is encoded in UTF-8.

In this case, use a parser to load the UTF-8 XML. This should correctly decode the data to UTF-16 character data (Java Strings are always UTF-16). Your output mechanism should encode from UTF-16 to Windows-1255.
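A sketch of that pipeline for the RSS case, under several assumptions: the source feed's `<title>` elements are what needs copying, and the output is a skeleton feed (the channel-level elements a real RSS 2.0 feed requires are omitted). The output is hand-serialized with minimal escaping for brevity, and the class and method names are hypothetical. Note that the XML declaration names the same encoding as the bytes. As the asker found, a declaration that disagrees with the actual encoding still produces gibberish.

```java
import java.io.*;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class FeedReencoder {
    static final Charset WIN1255 = Charset.forName("windows-1255");

    // Parse the source feed; the parser detects UTF-8 from the XML declaration,
    // so every <title> element's text arrives as an ordinary Java String.
    static List<String> readTitles(InputStream utf8Feed) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(utf8Feed);
        NodeList nodes = doc.getElementsByTagName("title");
        List<String> titles = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            titles.add(nodes.item(i).getTextContent());
        }
        return titles;
    }

    // Minimal escaping for XML element content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Write a skeleton feed whose bytes AND declaration are both windows-1255.
    static void writeFeed(List<String> titles, OutputStream out) throws IOException {
        Writer w = new OutputStreamWriter(out, WIN1255);
        w.write("<?xml version=\"1.0\" encoding=\"windows-1255\"?>\n");
        w.write("<rss version=\"2.0\"><channel>\n");
        for (String t : titles) {
            w.write("<item><title>" + escape(t) + "</title></item>\n");
        }
        w.write("</channel></rss>\n");
        w.flush(); // the caller owns and closes the stream
    }

    public static void main(String[] args) throws Exception {
        String source = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<rss><channel><item><title>\u05E9\u05DC\u05D5\u05DD</title></item></channel></rss>";
        List<String> titles = readTitles(new ByteArrayInputStream(source.getBytes(StandardCharsets.UTF_8)));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeFeed(titles, out);
        System.out.println(new String(out.toByteArray(), WIN1255));
    }
}
```

In a servlet, the same windows-1255 name would also go into the Content-Type header, so that the header, the XML declaration, and the bytes all agree.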


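import java.nio.charset.Charset;

byte[] originalUtf8; // input: assign the UTF-8 encoded bytes here

// UTF-8 bytes to Java String (held internally as UTF-16)
String internal = new String(originalUtf8, Charset.forName("utf-8"));
// Java String to windows-1255 bytes ("cp1255" is an alias of windows-1255)
byte[] win1255 = internal.getBytes(Charset.forName("cp1255"));

// output: write win1255 wherever the page content goes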