
Can one prevent Genshi from parsing HTML entities?

Developer https://www.devze.com 2022-12-08 21:22 Source: Internet

I have the following Python code using Genshi (simplified):

with open(pathToHTMLFile, 'r') as f:
    template = MarkupTemplate(f.read())
finalPage = template.generate().render('html', doctype = 'html')

The source HTML file contains entities such as &copy;, &trade; and &reg;. Genshi replaces these with their UTF-8 characters, which causes problems with the viewer (the output is used as a stand-alone file, not a response to a web request) that eventually sees the resulting HTML. Is there any way to prevent Genshi from parsing these entities? The more common ones like &amp; are passed through just fine.


Actually &amp; isn't passed through: it's parsed into an ampersand character and then serialised back to &amp; on the way out, because that's necessary to represent a literal ampersand in HTML. ©, on the other hand, is not a necessary escape, so it can be left as its literal character.

So no, there's no way to stop the entity reference being parsed. But you can ensure that non-ASCII characters are re-escaped on the way back out by serialising to plain ASCII:

template.generate().render('html', doctype= 'html', encoding= 'us-ascii')

You still won't get the entity reference &copy; in your output, but you will get the character reference &#169;, which is equivalent and should hopefully be understood by whatever is displaying the final file.
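The parse-then-reserialise round trip described above can be sketched with the Python standard library alone. This is not Genshi's code path, just an illustration under the assumption that the serialiser behaves like Python's `xmlcharrefreplace` error handler when asked for an ASCII encoding; the sample string is made up for the example.

```python
import html

# Hypothetical input, standing in for the contents of the source HTML file.
source = "Copyright &copy; 2010 Example&trade; &amp; Co&reg;"

# Parsing resolves every entity reference to the character it stands for,
# including &amp;, which becomes a bare ampersand.
parsed = html.unescape(source)

# Serialising has to re-escape characters that are significant in markup,
# which is why &amp; appears to be "passed through".
serialised = html.escape(parsed, quote=False)

# Encoding to ASCII cannot represent ©, ™ or ®, so the error handler
# substitutes numeric character references for them.
ascii_bytes = serialised.encode("us-ascii", errors="xmlcharrefreplace")

print(ascii_bytes.decode("ascii"))
# Copyright &#169; 2010 Example&#8482; &amp; Co&#174;
```

The named references do not come back, but every non-ASCII character ends up as a numeric reference that an ASCII-only viewer can read.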


Sticking

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

in the <head> of your HTML should cause browsers to correctly render UTF-8.

To clarify, the root issue is that the corresponding © UTF-8 character does not render correctly in static HTML. Placing the meta tag in the HTML tells the browser how to correctly interpret the character set and thus renders the UTF-8 characters correctly.
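Why the charset declaration matters can be shown with a small standard-library sketch. It assumes the viewer falls back to a legacy single-byte encoding when nothing is declared; cp1252 is used here purely as an example of such a fallback, and the actual default depends on the viewer.

```python
# Bytes as they would sit in a stand-alone file saved as UTF-8.
raw = "©".encode("utf-8")

# Without a charset declaration, a viewer may guess a legacy encoding.
# cp1252 is only an assumed example of such a guess.
misread = raw.decode("cp1252")

# With charset=UTF-8 declared, the same bytes decode to the intended character.
correct = raw.decode("utf-8")

print(repr(misread), repr(correct))
```

The two bytes of the UTF-8 sequence are read as two separate characters under the wrong encoding, which is the typical "extra Â" symptom.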


To prevent escaping of (x)html markup characters in Genshi:

from genshi.core import Markup
...
newstring = Markup(oldstring)  # mark the string as markup so Genshi does not escape it
...
# now apply templates as before, but substituting newstring for oldstring