I recently had a problem with the encoding of pages generated by a servlet. It occurred when the servlet was deployed under Tomcat, but not under Jetty. After a bit of research I reduced the problem to the following servlet:
public class TestServlet extends HttpServlet implements Servlet {
@Override
public void service(HttpServletRequest request, HttpServletResponse response) throws IOException {
response.setContentType("text/plain");
Writer output = response.getWriter();
output.write("öäüÖÄÜß");
output.flush();
output.close();
}
}
If I deploy this under Jetty and point the browser at it, it returns the expected result. The data is returned as ISO-8859-1, and the response headers from Jetty include:
Content-Type: text/plain; charset=iso-8859-1
The browser detects the encoding from this header. If I deploy the same servlet in Tomcat, the browser shows strange characters. Tomcat also returns the data as ISO-8859-1; the difference is that no header says so. The browser therefore has to guess the encoding, and it guesses wrong.
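To make the failure mode concrete, here is a minimal sketch in plain Java (no servlet container involved) of what happens when bytes written as ISO-8859-1 are decoded by a client that guesses UTF-8. The class and method names are made up for illustration, and UTF-8 as the browser's guess is an assumption; the actual guess depends on the browser and its settings.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the servlet from the question.
class EncodingMismatchDemo {

    // What the server does: turn characters into bytes with a given charset.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    // What the client does: turn bytes back into characters with the
    // charset it was told about, or with the one it guessed.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The umlaut string from the question, written with Unicode escapes
        // so the example does not depend on the source file encoding.
        String text = "\u00f6\u00e4\u00fc\u00d6\u00c4\u00dc\u00df";

        byte[] latin1 = encode(text, StandardCharsets.ISO_8859_1);

        // Header present: the client decodes with the declared charset.
        String declared = decode(latin1, StandardCharsets.ISO_8859_1);
        // Header missing: the client guesses (assumed here: UTF-8).
        String guessed = decode(latin1, StandardCharsets.UTF_8);

        System.out.println("declared charset round-trips: " + declared.equals(text));
        System.out.println("guessed charset round-trips:  " + guessed.equals(text));
    }
}
```

The ISO-8859-1 bytes are not valid UTF-8 sequences, so the decoder substitutes replacement characters. That matches the "strange characters" symptom when the charset is missing from the Content-Type header.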
My question: is this behaviour of Tomcat correct, or is it a bug? And if it is correct, how can I avoid the problem? Sure, I can always add response.setCharacterEncoding("UTF-8"); to the servlet, but that fixes the encoding to one that the browser might or might not understand. The problem is more relevant when the client is not a browser but another service. So how should I deal with it in the most flexible way?
If you don't specify an encoding, the Servlet specification requires ISO-8859-1. However, AFAIK it does not require the container to set the encoding in the content type, at least not if you set it to "text/plain". This is what the spec says:
Calls to setContentType set the character encoding only if the given content type string provides a value for the charset attribute.
In other words, only if you set the content type like this
response.setContentType("text/plain; charset=XXXX")
Tomcat is required to set the charset. I haven't tried whether this works though.
In general, I recommend always setting the encoding to UTF-8 (it causes the least trouble, at least in browsers) and then, for text/plain, stating the encoding explicitly to prevent browsers from falling back to a system default.
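The rule quoted from the spec can be illustrated with a small stand-alone helper. This is only a sketch of the rule, not how Tomcat or Jetty implement it, and the class and method names are hypothetical. It looks for a charset parameter in a content type string and falls back to ISO-8859-1 when none is given.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper illustrating the spec rule; not container code.
class ContentTypeCharset {

    // Returns the value of the charset parameter, if the content type has one.
    static Optional<String> charsetOf(String contentType) {
        if (contentType == null) {
            return Optional.empty();
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (!name.equals("charset")) {
                continue;
            }
            String value = param.substring(eq + 1).trim();
            // Strip optional surrounding double quotes.
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            return value.isEmpty() ? Optional.empty() : Optional.of(value);
        }
        return Optional.empty();
    }

    // Encoding used for the response body: the explicit charset if present,
    // otherwise the ISO-8859-1 default mentioned above.
    static String effectiveEncoding(String contentType) {
        return charsetOf(contentType).orElse("ISO-8859-1");
    }

    public static void main(String[] args) {
        System.out.println(effectiveEncoding("text/plain"));
        System.out.println(effectiveEncoding("text/plain; charset=UTF-8"));
    }
}
```

With "text/plain" alone there is no charset to announce, which is consistent with Tomcat sending ISO-8859-1 bytes without saying so in the header.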
In support of Jesse Barnum's answer, the Apache wiki suggests that a filter can be used to control the character encoding of the request and the response. However, Tomcat 5.5 and up come bundled with a SetCharacterEncodingFilter, so it may be better to use Apache's implementation than Jesse's (no offense, Jesse). The Tomcat implementations only set the character encoding on the request, so they may need to be modified if the filter should also set the character set on the response of all servlets.
Specifically, Tomcat ships example implementations here:
5.x
webapps/servlets-examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java
webapps/jsp-examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java
6.x
webapps/examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java
7.x
Since 7.0.20 the filter has been a first-class citizen: it was moved from the examples into core Tomcat and is available to any web application without the need to compile and bundle it separately. See the documentation for the list of filters provided by Tomcat. The class name is: org.apache.catalina.filters.SetCharacterEncodingFilter
This page tells more: http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q3
Here's a filter that I wrote to force UTF-8 encoding:
public class CharacterEncodingFilter implements Filter {
private static final Logger log = Logger.getLogger( CharacterEncodingFilter.class.getName() );
boolean isConnectorConfigured = false;
public void init( FilterConfig filterConfig ) throws ServletException {}
public void doFilter( ServletRequest request, ServletResponse response, FilterChain chain ) throws IOException, ServletException {
request.setCharacterEncoding( "utf-8" );
response.setCharacterEncoding( "utf-8" );
if( ! isConnectorConfigured ) {
isConnectorConfigured = true;
try { //I need to do all of this with reflection, because I get NoClassDefErrors otherwise. --jsb
Field f = request.getClass().getDeclaredField( "request" ); //Tomcat wraps the real request in a facade, need to get it
f.setAccessible( true );
Object req = f.get( request );
Object connector = req.getClass().getMethod( "getConnector", new Class[0] ).invoke( req ); //Now get the connector
connector.getClass().getMethod( "setUseBodyEncodingForURI", new Class[] {boolean.class} ).invoke( connector, Boolean.TRUE );
} catch( NoSuchFieldException e ) {
log.log( Level.WARNING, "Servlet container does not seem to be Tomcat, cannot programmatically alter character encoding. Do this in the server.xml <Connector> attribute instead." );
} catch( Exception e ) {
log.log( Level.WARNING, "Could not setUseBodyEncodingForURI to true on connector", e );
}
}
chain.doFilter( request, response );
}
public void destroy() {}
}
If you don't specify the encoding, Tomcat is free to encode your characters however it feels, and the browser is free to guess what encoding Tomcat picked. You are correct in that the way to solve the problem is response.setCharacterEncoding("UTF-8").
You shouldn't worry about the chance that the browser won't understand the encoding, as virtually all browsers released in the past 10 years support UTF-8. Though if you're really worried, you can inspect the "Accept-Charset" header provided by the user agent ("Accept-Encoding" is about content codings such as gzip, not character sets).
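If you do want to honour the client's stated preference, the header that carries charsets is Accept-Charset. Below is a rough sketch of picking a charset from that header's value, falling back to UTF-8. The class name is made up, the parsing is deliberately simplified (it ignores the "*" wildcard), and it assumes the client sends the header at all, which many do not.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper; simplified parsing of an Accept-Charset header value.
class AcceptCharsetChooser {

    // Returns the supported charset with the highest q-value,
    // or UTF-8 when the header is missing or names nothing usable.
    static Charset choose(String acceptCharset) {
        Charset best = StandardCharsets.UTF_8;
        double bestQ = -1.0;
        if (acceptCharset == null || acceptCharset.trim().isEmpty()) {
            return best;
        }
        for (String entry : acceptCharset.split(",")) {
            String[] parts = entry.trim().split(";");
            String name = parts[0].trim();
            double q = 1.0; // default weight when no q parameter is given
            for (int i = 1; i < parts.length; i++) {
                String p = parts[i].trim().toLowerCase(Locale.ROOT);
                if (p.startsWith("q=")) {
                    try {
                        q = Double.parseDouble(p.substring(2).trim());
                    } catch (NumberFormatException e) {
                        q = 0.0; // treat a malformed weight as "not acceptable"
                    }
                }
            }
            if (name.isEmpty() || name.equals("*") || q <= 0.0) {
                continue;
            }
            boolean supported;
            try {
                supported = Charset.isSupported(name);
            } catch (IllegalArgumentException e) {
                supported = false; // syntactically illegal charset name
            }
            if (supported && q > bestQ) {
                best = Charset.forName(name);
                bestQ = q;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(choose("iso-8859-1;q=0.5, utf-8;q=0.9"));
        System.out.println(choose(null));
    }
}
```

In a servlet, the header value would come from request.getHeader("Accept-Charset") and the result would go into response.setCharacterEncoding(...) before getWriter() is called.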