Suppose a browser encounters a <meta>
tag that specifies the character encoding, like this:
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
Does it start over from the beginning and parse the page again, since some of the preceding characters in the <head>
section may have been interpreted incorrectly? Or are there some other constraints that prevent prior characters from being interpreted incorrectly?
As far as I know, browsers won't go back after finding a charset declaration in the <head>
and they assume an ASCII-compatible charset up to that point. Unfortunately, I can't find a reference to confirm this.
Conforming browsers will ignore a Content-Type meta element if the server already provides a charset in the Content-Type HTTP header, so you can't override a "wrong" server-side charset with a <meta>
element.
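To make the precedence concrete, here is a rough Python sketch, not what any browser actually runs. The function names, the regular expressions, and the windows-1252 fallback are my own assumptions for illustration; real browsers also look at byte order marks and use a proper prescan rather than a regex.

```python
import re
from typing import Optional

# Hypothetical helpers for illustration only; a regex is a crude stand-in
# for the real prescan that browsers perform.
_HEADER_CHARSET = re.compile(r'charset\s*=\s*"?([^\s;"]+)', re.IGNORECASE)
_META_CHARSET = re.compile(
    rb'<meta[^>]*charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def charset_from_header(content_type: Optional[str]) -> Optional[str]:
    """Return the charset parameter of a Content-Type header value, if any."""
    if not content_type:
        return None
    match = _HEADER_CHARSET.search(content_type)
    return match.group(1).lower() if match else None


def charset_from_meta(body: bytes, limit: int = 1024) -> Optional[str]:
    """Look for a <meta> charset declaration in the first `limit` bytes."""
    match = _META_CHARSET.search(body[:limit])
    return match.group(1).decode("ascii").lower() if match else None


def pick_charset(content_type: Optional[str], body: bytes,
                 fallback: str = "windows-1252") -> str:
    """The HTTP header wins; <meta> is consulted only when the header has no charset."""
    return charset_from_header(content_type) or charset_from_meta(body) or fallback


page = b'<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />'
print(pick_charset("text/html; charset=ISO-8859-1", page))
print(pick_charset("text/html", page))
```

With the header present, the <meta> value is never consulted, which is why a wrong server-side charset cannot be fixed from inside the document.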
The point of the <meta>
charset declaration is for HTML documents that are not served by an HTTP server.
That means you shouldn't rely on a <meta>
charset declaration in the HTML file, but configure your HTTP server to provide the correct charset. If for some reason you have to rely on a <meta>
charset declaration, you should only have ASCII characters up to that point and position it as early in the <head>
as possible, preferably as the first element.
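If you want to check a file against that advice, something like the following Python sketch could work. The function name and the messages are made up for illustration, and the 1024-byte limit is my assumption based on the HTML spec's requirement that the declaration fit within the first 1024 bytes of the document.

```python
import re

# Crude pattern for a <meta> tag that mentions "charset"; illustration only.
_META = re.compile(rb'<meta[^>]*charset', re.IGNORECASE)


def check_meta_placement(body: bytes, limit: int = 1024) -> list:
    """Return a list of problems with the position of the <meta> charset tag."""
    match = _META.search(body)
    if match is None:
        return ["no <meta> charset declaration found"]

    problems = []
    # The whole tag, including its closing '>', should fit in the first `limit` bytes.
    end = body.find(b">", match.start())
    if end == -1 or end + 1 > limit:
        problems.append("declaration is not within the first %d bytes" % limit)
    # Everything before the declaration should be plain ASCII.
    if not body[:match.start()].isascii():
        problems.append("non-ASCII bytes appear before the declaration")
    return problems


good = '<!DOCTYPE html><html><head><meta charset="utf-8"><title>é</title>'.encode("utf-8")
bad = '<html><head><title>é</title><meta charset="utf-8">'.encode("utf-8")
print(check_meta_placement(good))
print(check_meta_placement(bad))
```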
The parser can start over in some circumstances. The relevant spec is here: http://dev.w3.org/html5/spec/parsing.html#change-the-encoding
Note that browsers have traditionally probably not followed this algorithm exactly; chances are they've all done slightly different things. However, the link above describes what HTML5-compliant browsers should do. The algorithm described is likely an amalgam of various browsers' previous behaviour.
Since HTML5 is still a working draft, this should be considered subject to change.
It has no real effect on the node structure. Only the content of text nodes (and attribute nodes) has to be transcoded.
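A small Python illustration of that point, assuming both candidate encodings are ASCII-compatible (the sample markup and the tag-extracting regex are mine, not anything a browser uses): the markup bytes are plain ASCII, so they decode the same way under either encoding, and only the non-ASCII text comes out differently.

```python
import re

# Sample markup with non-ASCII text in an attribute and in a text node.
raw = '<p title="caf\u00e9">na\u00efve</p>'.encode("utf-8")

as_utf8 = raw.decode("utf-8")            # the intended interpretation
as_cp1252 = raw.decode("windows-1252")   # a wrong first guess


def tag_names(text: str) -> list:
    """Very rough tag extraction, good enough for this tiny sample."""
    return re.findall(r"</?([A-Za-z][A-Za-z0-9]*)", text)


# Same tags either way; only the text and attribute content differ.
print(tag_names(as_utf8) == tag_names(as_cp1252))
print(as_utf8 == as_cp1252)
```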
If your server sends the
Content-type: text/html;charset=utf-8
...header, the browser will know the right charset from the start. On Apache, you can achieve this with a .htaccess file containing:
AddDefaultCharset utf-8