I obviously can't determine whether a fragment of HTML is valid without knowing what the rest of the document looks like (at a minimum, I would need a doctype in order to know which rules I'm validating against). But given the following HTML5 fragment:
<article><header></article>My header</header><p>My text</p></article>
I can certainly determine that it is invalid without seeing the rest of the document. So, is there such a thing as "provisionally valid" HTML, or "valid providing it fits into a certain place in a valid document"?
Is there more to it than the following pseudocode?
def is_valid_fragment(fragment):
    tmp = "<!doctype html><html><head><title></title></head><body>" + fragment + "</body></html>"
    return my_HTML5_validator.is_valid_html5_document(tmp)
You can certainly talk about an XML document being well-formed, and you can construct a document from any single element and its children. You could thus talk about singly-rooted XHTML5 fragments being well-formed. You could deal with a multiply-rooted fragment (like <img/><img/>) by treating it as a sequence of documents, or by wrapping it in some synthetic container element; since we're only talking about well-formedness, that would be okay.
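The synthetic-container idea can be sketched in a few lines of Python. This is only an illustration using the standard library's XML parser; the wrapper name `fragment-root` and the function name are made up for the example, and it checks XML well-formedness only, so HTML-only syntax such as an unclosed <hr> or a named entity like &nbsp; will be rejected.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml_fragment(fragment: str) -> bool:
    """Return True if `fragment` parses as XML once wrapped in a synthetic root."""
    # "fragment-root" is a hypothetical container name; it exists only so that
    # a multiply-rooted fragment becomes a single-rooted document.
    wrapped = "<fragment-root>" + fragment + "</fragment-root>"
    try:
        ET.fromstring(wrapped)
    except ET.ParseError:
        return False
    return True
```

With this, `is_well_formed_xml_fragment("<img/><img/>")` passes, while the mis-nested article example from the question does not.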
However, HTML5 still allows SGML-style tags that take no end tag (void elements, in HTML5 terms), like <hr> and so on, whose self-closingness cannot be read off the markup itself; it can only be determined by knowing the rules of the document type. For instance, <div><hr></div> is okay, but <div><tr></div> is not. If you were dealing with DOM nodes rather than text as input, this would be a nonissue, but if you have text, you'd need a parser which knows enough about HTML to be able to deal with those elements. Beyond that, though, some very simple rules, lifted directly from XML, would be enough to handle well-formedness.
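As a rough sketch of "XML-style rules plus knowledge of void elements", something like the following could work on text input. It uses Python's standard `html.parser`; the class and function names are invented for the example. Be aware that it is deliberately stricter than real HTML: it treats every non-void element as needing an explicit end tag, so legitimately omitted end tags (such as </p> or </li>) are reported as unbalanced.

```python
from html.parser import HTMLParser

# Void elements from the HTML Standard; they never take an end tag.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
})


class _TagBalanceChecker(HTMLParser):
    """Tracks open elements on a stack and flags any mismatched end tag."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.balanced = True

    def handle_starttag(self, tag, attrs):
        # Void elements are complete as soon as their start tag is seen.
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # In HTML syntax the trailing "/" is ignored, so <img/> is just <img>
        # and <div/> still opens a div.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            # Covers end tags for void elements (</hr>) and mis-nesting.
            self.balanced = False


def has_balanced_tags(fragment):
    """Return True if every non-void start tag is closed in nesting order."""
    checker = _TagBalanceChecker()
    checker.feed(fragment)
    checker.close()
    return checker.balanced and not checker.open_tags
```

Under these assumptions `has_balanced_tags("<div><hr></div>")` is accepted and `has_balanced_tags("<div><tr></div>")` is rejected, matching the two examples above.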
If you wanted to go beyond well-formedness and look at some aspects of validity, I think you can still do that at the singly-rooted fragment level with XML. As the spec says:
An XML document is valid if it has an associated document type declaration and if the document complies with the constraints expressed in it.
A DTD can name any element as the root, and the mechanics then take care of checking the relationship between that element and its children, and their children and so on, and the various other constraints that make up validity.
Again, you can transfer that idea directly to HTML. I don't know how you deal with multiply-rooted fragments, though. And bear in mind that certain whole-document constraints (like IDs being unique) might hold inside the fragment, but not in an otherwise valid document once the fragment has been inserted into it.
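To make the ID caveat concrete, here is a small sketch that looks for `id` values a fragment would duplicate once inserted into a host document. It relies on Python's standard `html.parser`; the helper names are hypothetical, and it only compares attribute values without doing any real validation.

```python
from html.parser import HTMLParser


class _IdCollector(HTMLParser):
    """Records the value of every id attribute seen on a start tag."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.append(value)


def collect_ids(markup):
    """Return the id values found in `markup`, in document order."""
    collector = _IdCollector()
    collector.feed(markup)
    collector.close()
    return collector.ids


def id_collisions(document, fragment):
    """Return the ids in `fragment` that already exist in `document`."""
    return set(collect_ids(document)) & set(collect_ids(fragment))
```

A fragment whose IDs are unique on their own can still yield a non-empty result here, which is exactly the case where it stops being valid after insertion.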
Depending on what you intend to do with this verification, I think you should keep in mind that browsers are extremely forgiving regarding malformed HTML!
The invalid HTML string that you give in your example is parsed without complaint in most, if not all, browsers:
const serializedHTML = "<article><header></article>My header</header><p>My text</p></article>"
const range = document.createRange()
const fragment = range.createContextualFragment(serializedHTML)
console.log(fragment)
The content of the fragment defined in the snippet above would result in the following DOM tree:
<article>
  <header></header>
</article>
"My header"
<p>My text</p>
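A crude method would be to check whether passing the fragment through the innerHTML of another element changes the text, by doing something like the code below. Note that this also rejects valid input whose serialization differs from what was typed, such as unquoted attribute values or uppercase tag names.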
<html>
<head>
  <script>
    // Returns true if the browser's parser round-trips the fragment unchanged.
    function validateHTML(htmlFragment) {
      var testDiv = document.getElementById('testDiv')
      testDiv.innerHTML = htmlFragment
      var res = htmlFragment == testDiv.innerHTML
      testDiv.innerHTML = ""  // clear the scratch element
      return res
    }
  </script>
</head>
<body>
  <div id="testDiv" style="display:none"></div>
  <textarea id="txtElem" onKeyUp="this.style.backgroundColor = validateHTML(this.value) ? '' : '#f00'"></textarea>
</body>
</html>
You could check if it is well-formed.