I'm using Jinja2's nl2br filter, which looks like:
import re
from jinja2 import evalcontextfilter, Markup, escape
_paragraph_re = re.compile(r'(?:\r\n|\r|\n){2,}')
@evalcontextfilter
def nl2br(eval_ctx, value):
    # Split on blank lines into paragraphs; single newlines become <br>.
    result = u'\n\n'.join(u'<p>%s</p>' % p.replace('\n', '<br>\n')
                          for p in _paragraph_re.split(escape(value)))
    if eval_ctx.autoescape:
        result = Markup(result)
    return result
The problem is that it fails if "value" contains anything but ASCII characters (for example, "/mɒnˈtænə/"). I get this error:
Traceback (most recent call last):
File "/usr/local/lib/python2.6/dist-packages/Flask-0.6.1-py2.6.egg/flask/app.py", line 889, in __call__
return self.wsgi_app(environ, start_response)
File "/usr/local/lib/python2.6/dist-packages/Flask-0.6.1-py2.6.egg/flask/app.py", line 879, in wsgi_app
response = self.make_response(self.handle_exception(e))
File "/usr/local/lib/python2.6/dist-packages/Flask-0.6.1-py2.6.egg/flask/app.py", line 876, in wsgi_app
rv = self.dispatch_request()
File "/usr/local/lib/python2.6/dist-packages/Flask-0.6.1-py2.6.egg/flask/app.py", line 695, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/home/mcrittenden/Dropbox/Code/dropdo/dropdo.py", line 105, in view
return render_template(template, src = url, data = content)
File "/usr/local/lib/python2.6/dist-packages/Flask-0.6.1-py2.6.egg/flask/templating.py", line 85, in render_template
context, ctx.app)
File "/usr/local/lib/python2.6/dist-packages/Flask-0.6.1-py2.6.egg/flask/templating.py", line 69, in _render
rv = template.render(context)
File "/usr/local/lib/python2.6/dist-packages/Jinja2-2.5.5-py2.6.egg/jinja2/environment.py", line 891, in render
return self.environment.handle_exception(exc_info, True)
File "/home/mcrittenden/Dropbox/Code/dropdo/templates/text.html", line 1, in top-level template code
{% extends "layout.html" %}
File "/home/mcrittenden/Dropbox/Code/dropdo/templates/layout.html", line 25, in top-level template code
{% block content %}{% endblock %}
File "/home/mcrittenden/Dropbox/Code/dropdo/templates/text.html", line 8, in block "content"
{{ data|nl2br }}
File "/home/mcrittenden/Dropbox/Code/dropdo/dropdo.py", line 26, in nl2br
for p in _paragraph_re.split(escape(value)))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc9 in position 12: ordinal not in range(128)
What's the best way to prevent the error without removing the problem characters altogether?
Use unicode literals everywhere.
"Unicode in Python, Completely Demystified"
If "value" has anything but ascii characters, you want it to be Unicode, and nothing but Unicode, throughout your entire app, except for a few places where you explicitly encode or decode it. Pass Unicode to your templates, too.
If you acquire the string "/mɒnˈtænə/" somehow, you probably know its encoding; use it:
value = "/mɒnˈtænə/".decode(the_encoding)
How do you learn the encoding? An HTTP request usually declares its encoding in the Content-Type header. An XML file can declare its encoding in its XML declaration. A plain text file usually does not; you must know its encoding by some other means.
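For the plain text file case, one option is to state the encoding when opening the file so that decoding happens at read time. The following is only a sketch: it assumes the file is UTF-8, and the temporary file and the names utf8_bytes and content are made up for illustration. io.open is available from Python 2.6 onward and is the built-in open in Python 3.

```python
import io
import os
import tempfile

# Hypothetical file contents: the question's example string as UTF-8 bytes.
utf8_bytes = b"/m\xc9\x92n\xcb\x88t\xc3\xa6n\xc9\x99/\n"

# Write the raw bytes to a temporary file that stands in for the real one.
fd, path = tempfile.mkstemp(suffix=".txt")
try:
    with os.fdopen(fd, "wb") as f:
        f.write(utf8_bytes)

    # io.open decodes while reading, so content is already a Unicode string
    # (assumption: the file really is UTF-8).
    with io.open(path, encoding="utf-8") as f:
        content = f.read()
finally:
    os.remove(path)
```

A string read this way can be passed to the template as is. If the assumed encoding is wrong, the read raises UnicodeDecodeError at this point rather than later inside the filter.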
Note that UTF-8 is not Unicode, though it is an encoding that can fully represent Unicode. It's still an encoding, and to get a Python Unicode string from it, you need to call .decode("utf-8") on it.
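To make the difference between UTF-8 bytes and Unicode text concrete, here is a small sketch of decoding once at the boundary. to_text is a hypothetical helper, not part of Jinja2 or Flask, and UTF-8 is assumed as the input encoding; substitute whatever encoding your data really uses.

```python
def to_text(value, encoding="utf-8"):
    """Return value as Unicode text, decoding byte strings with encoding."""
    # In Python 2.6+ bytes is an alias for str; in Python 3 it is the bytes type.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

# The example string from the question, written out as UTF-8 bytes.
raw = b"/m\xc9\x92n\xcb\x88t\xc3\xa6n\xc9\x99/"
text = to_text(raw)

print(len(raw))   # 14 bytes
print(len(text))  # 10 characters

# Text that is already Unicode passes through unchanged,
# and encoding it again gives back the original bytes.
assert to_text(text) == text
assert text.encode("utf-8") == raw
```

In the view from the question, this would probably mean decoding content before handing it to render_template, so that the filter only ever sees Unicode.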
Try unidecode from http://pypi.python.org/pypi/Unidecode
>>> from unidecode import unidecode
>>> m=u'My fianc\xe9 David'; print m; print unidecode(m)
My fiancé David
My fiance David
>>>