I'm halfway through an HTML parser and found that HTML5 explicitly defines the rules of thumb for parsing ill-formed HTML. (And I used to infer them from DTDs, sigh.)
I love that fact, but I know well that HTML5 isn't finalized yet (I also wonder if it ever will be) and that it isn't developed by the W3C, but by the WHATWG.
Searching for the spec I need I'm presented with:
- 8.2 section of the W3C TR http://www.w3.org/TR/html5/syntax.html#parsing
or
- 11.2 section of the WHATWG web-apps/current-work http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html
If it weren't for the section numbers I would assume those are simply the same. But the different numbering makes me wonder. Which version is, supposedly, the most authoritative?
WHATWG seems to have more sections, and to have been added to since W3C uploaded its candidate recommendation.
Will W3C update to the WHATWG version?
Or will they stick to their current candidate until it gets to the official recommendation status? Which HTML5 spec are we poor devils supposed to follow, when in doubt?
Always choose WHATWG over W3C, no exceptions.
Anne van Kesteren (a WHATWG member who was a major contributor to the HTML specification prior to the WHATWG and W3C versions diverging, and who remains a major contributor to the WHATWG specification) describes the current situation between WHATWG and W3C as follows on his blog:
The W3C has forked the [WHATWG] HTML Standard for the nth time. As always, it is pretty disastrous:
- Erased all Git history of the document.
- Did not document how they transformed the document. Issues of mismatches have already been reported and it will likely be a long time, if ever, before all bugs due to this process are uncovered, since it was not open.
- Did not discuss plans with the wider community.
- Did not discuss plans with the folks they were forking from.
- Did not even discuss plans with the members of the W3C Web Platform Working Group.
- Erased the acknowledgments section.
- Erased the copyright and licensing information and replaced it with their own.
2019: The war is finally over
On May 28th, 2019, the W3C and the WHATWG signed an agreement to collaborate on a single, authoritative version of the HTML and DOM specifications.
According to W3C's statement, the two parties have come to the following terms:
- W3C and WHATWG work together on HTML and DOM, in the WHATWG repositories, to produce a Living Standard and Recommendation/Review Draft-snapshots
- WHATWG maintains the HTML and DOM Living Standards
- W3C facilitates community work directly in the WHATWG repositories (bridging communities, developing use cases, filing issues, writing tests, mediating issue resolution)
- W3C stops independent publishing of a designated list of specifications related to HTML and DOM and instead will work to take WHATWG Review Drafts to W3C Recommendations
Biased answer from an editor of WHATWG HTML here. Hopefully the facts can speak for themselves though.
The WHATWG Living Standard should be considered authoritative. It is constantly worked on by a large community of contributors, including all browser vendors. No browser vendors implement according to W3C HTML; for some such as Firefox and Chrome this is a matter of publicly stated policy.
The WHATWG Living Standard is constantly receiving bug fixes and new features. For more information on this model of spec development, which more closely matches modern software development practices, see What does "Living Standard" mean?.
Unfortunately, the W3C sometimes copies and pastes our work onto their own website, and puts their own logo on it, and changes the names of the editors, and such. They do this for a variety of reasons, one of the largest of which is face-saving for the sake of their paying member companies (example of them stating this). What's worse, they like to release "versions" (like HTML "5.0", "5.1", etc.) which are just outdated versions missing modern bug fixes and features that clog up search result pages, causing confusion like this very question. We are currently tracking the confusion caused by these forks, of which HTML is only one.
You can track their progress on the copy-and-paste job in their issue tracker or in commits such as this one. It's a fun game to spot the bugs they introduce while doing this copy-and-paste job, as they generally do not read or understand the content they are copying, leading to widespread errors and inconsistencies.
It depends on who you ask. Really. The politics of this are ugly. And to make matters worse, the specifications aren't fully stable yet. I would have thought that the two specifications would be largely the same in their parsing sections, since section 1.1.1, which lists the differences, does not mention parsing. But then I did a web diff and saw that there are subtle differences in the text. I would say that if you are actually implementing the specification, you should talk to the players involved about any differences you see between the specs, using the public mailing lists. Anyway, I am sorry I can't give you a clear-cut answer.
OK, I eventually came to my own conclusion and I'm going to share it.
I will follow the W3C version: blindly.
Politically speaking it's not a simple decision. Let me explain.
I was extremely sceptical about the W3C, and I possibly even hated their guts during the whole XHTML debate/debacle. I saw the rise of WHATWG as the arrival of our pragmatic saviours: people who openly admitted that HTML can't be made into a stiff, rigorous XML-derived language while the whole internet couldn't care less about it.
So given this point of view I should go with the WHATWG spec, shouldn't I?
No. Why?
WHATWG doesn't establish official versions. I kind of wish they did, but they don't.
They feel versions are too rigid for their...let's say hip attitude.
They instead have only a live standard.
(and track implementation status of any single feature by major browsers)
But I'm not a major browser, I'm a small implementer, I cannot refer to a live standard.
Well, not unless I go crazy over it and release constantly, like there's no tomorrow.
(that's sort of what is happening with firefox and chrome)
So over never-ending frenetic madness, I have to choose sanity. And W3C offers polished and numbered versions of the spec. And I can claim to conform to one of those versions.
When in doubt, try to match the behavior of actual browsers. That's all that actually matters.
In general, WHATWG is probably more current than W3C, though it may include more things that browsers don't support (yet).
You can think of W3C as taking snapshots of WHATWG at given points in time, stabilizing them, and then hardening them, never to be changed.
- W3C HTML5 was finalized 28 October 2014.
- W3C HTML5.1 was finalized 1 November 2016.
- W3C HTML5.2 is currently in its "Working Draft" and probably won't be finalized until 2019.
https://www.w3.org/html/ gives a clear answer to this old but still relevant question:
https://html.spec.whatwg.org/multipage/ is the current HTML standard. It obsoletes all other previously-published HTML specifications.
As announced at https://www.w3.org/blog/2019/05/w3c-and-whatwg-to-work-together-to-advance-the-open-web-platform/, the W3C and the WHATWG signed an agreement to collaborate on the development of a single version of the HTML and DOM specifications:
https://html.spec.whatwg.org/multipage/ is the single version of HTML being actively developed. https://dom.spec.whatwg.org/ is the single version of the DOM specification being actively developed. For further details about the W3C-WHATWG agreement, see the Memorandum of Understanding Between W3C and WHATWG.
The part "obsoletes all other previously-published HTML specifications" means that https://www.w3.org/TR/html52/ is considered obsolete.
P.S. The URL from the question, http://www.w3.org/TR/html5/syntax.html#parsing, redirects to https://html.spec.whatwg.org/multipage/parsing.html#parsing.
[Feb, 2023]
This issue seems to be definitively closed: the WHATWG's abandonment of the W3C forced the W3C to concede, as per this Wikipedia entry:
In 2009, the W3C conceded and abandoned XHTML[24] and in 2019, ceded control of the HTML specification to the WHATWG.[25]