CherryPy doesn't properly handle non-ASCII characters in Jinja2 templates

开发者 https://www.devze.com 2023-02-09 11:11 Source: the web

I am trying to run a website using Python 2.7.1, Jinja 2.5.2, and CherryPy 3.1.2. My Jinja templates are UTF-8 encoded, and some of the characters in them come out as question marks and other gibberish. If I return the file contents directly, without going through Jinja, the problem does not occur. I discovered that I can fix it by calling .encode("utf-8") on the output of every handler, but that clutters up my source. Does anyone know why this happens or what to do about it? I made a small script to demonstrate the problem. The "char.txt" file is a 2-byte file consisting solely of a UTF-8 encoded "»" character.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import os, jinja2, cherrypy
jinja2env = jinja2.Environment(loader=jinja2.FileSystemLoader("."))

class Test(object):
    def test1(self):
        #doesn't work
        #curl "http://example.com/test1"
        #?
        return jinja2env.get_template("char.txt").render()
    test1.exposed = True

    def test2(self):
        #works
        #curl "http://example.com/test2"
        #»
        return open("char.txt").read()
    test2.exposed = True

    def test3(self):
        #works, but it is annoying to have to call this extra function all the time
        #curl "http://example.com/test3"
        #»
        return jinja2env.get_template("char.txt").render().encode("utf-8")
    test3.exposed = True

cherrypy.config["server.socket_port"] = 8500
cherrypy.quickstart(Test())

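Jinja2 works with Unicode only, so render() returns a unicode object rather than an encoded byte string. It seems that CherryPy usually uses UTF-8 as the output encoding when the client sends no Accept-Charset header, but falls back to ISO-8859-1 when the header is present but empty. From the documentation of the encode tool: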

tools.encode.encoding: If specified, the tool will error if the response cannot be encoded with it. Otherwise, the tool will use the 'Accept-Charset' request header to attempt to provide suitable encodings, usually attempting utf-8 if the client doesn't specify a charset, but following RFC 2616 and trying ISO-8859-1 if the client sent an empty 'Accept-Charset' header.

http://www.cherrypy.org/wiki/BuiltinTools#tools.encode
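The selection rules quoted above can be sketched roughly as follows. This is a simplified illustration, not CherryPy's actual implementation: the function names pick_charsets and encode_body are hypothetical, q-values and the "*" wildcard are ignored, and the forced parameter only stands in for tools.encode.encoding.

```python
# Hypothetical sketch of the quoted selection rules; NOT CherryPy's own code.

def pick_charsets(accept_charset, forced=None):
    """Return the charsets to try, in order.

    accept_charset: raw Accept-Charset value, or None if the header is absent.
    forced: stand-in for the tools.encode.encoding setting (assumption).
    """
    if forced is not None:
        return [forced]              # configured encoding: use it or fail
    if accept_charset is None:
        return ["utf-8"]             # header absent: try UTF-8
    # Drop parameters such as ";q=0.8" and keep only the charset names.
    names = [part.split(";")[0].strip().lower()
             for part in accept_charset.split(",")]
    names = [name for name in names if name]
    if not names:
        return ["iso-8859-1"]        # header present but empty
    return names


def encode_body(text, accept_charset, forced=None):
    """Encode text with the first charset that can represent it."""
    for charset in pick_charsets(accept_charset, forced):
        try:
            return charset, text.encode(charset)
        except (UnicodeEncodeError, LookupError):
            if forced is not None:
                raise                # a forced encoding must not be skipped
    raise ValueError("no acceptable charset could encode the response")


if __name__ == "__main__":
    print(encode_body(u"\u00bb", None))                  # header absent
    print(encode_body(u"\u00bb", ""))                    # header empty
    print(encode_body(u"\u00bb", "", forced="utf-8"))    # encoding forced
```

Under these assumptions, forcing the encoding takes the request headers out of the decision entirely, which is why setting tools.encode.encoding gives the same result for every client.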

I could fix the problem by using the encode tool like this:

cherrypy.config["tools.encode.on"] = True
cherrypy.config["tools.encode.encoding"] = "utf-8"

Example output with the encode tool enabled:

$ curl "http://127.0.0.1:8500/test1"
»
$ curl "http://127.0.0.1:8500/test2"
»
$ curl "http://127.0.0.1:8500/test3"
»
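A likely explanation for the original question mark, assuming the response was being encoded as ISO-8859-1 as described above (this was not verified against CherryPy 3.1.2 itself), is that the same character becomes different bytes under the two encodings. A minimal standalone sketch using only the standard library:

```python
# -*- coding: utf-8 -*-
# Standalone illustration; it does not involve CherryPy or Jinja2.

char = u"\u00bb"  # the "»" character stored in char.txt

as_utf8 = char.encode("utf-8")          # two bytes: 0xC2 0xBB
as_latin1 = char.encode("iso-8859-1")   # one byte: 0xBB

# A lone 0xBB byte is not valid UTF-8, so a client that expects UTF-8
# cannot display it and substitutes a placeholder instead.
shown = as_latin1.decode("utf-8", "replace")

print(repr(as_utf8))
print(repr(as_latin1))
print(repr(shown))
```

The two-byte form matches the contents of char.txt, which is consistent with test2 and test3 printing the character correctly.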

From the CherryPy tutorial:

tools.encode: automatically converts the response from the native Python Unicode string format to some suitable encoding (Latin-1 or UTF-8, for example).

That sounds like your answer.