Differentiating between XHTML and HTML with PHP DOMDocument

Developer https://www.devze.com 2023-02-02 10:39 Source: Internet
I want to manipulate HTML and XHTML documents with the PHP DOM implementation. I use the DOMDocument->loadHTML() method to load the content.

I want to know whether the loaded content is XHTML or HTML. DOMDocument has a doctype object (a DOMDocumentType) that holds the DOCTYPE declaration from the document itself. So far I have thought about comparing $dom->doctype->publicId, which contains strings like "-//W3C//DTD HTML 4.01//EN".
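A minimal sketch of the publicId comparison described above, assuming that any public identifier containing "XHTML" (for example "-//W3C//DTD XHTML 1.0 Strict//EN") means XHTML and that everything else, including a missing DOCTYPE, means HTML. The function name doctypeLooksLikeXhtml is hypothetical. One detail worth knowing: without the LIBXML_HTML_NODEFDTD option, loadHTML() inserts a default HTML 4.0 Transitional doctype when the input has none, so the sketch passes that flag to keep a missing DOCTYPE visible as null.

```php
<?php
// Hypothetical helper: true if the DOCTYPE's public identifier mentions XHTML.
// Heuristic (assumption): "XHTML" anywhere in publicId => XHTML, otherwise HTML.
function doctypeLooksLikeXhtml(string $markup): bool
{
    $dom = new DOMDocument();

    // loadHTML() warns about markup the libxml HTML parser dislikes;
    // collect those warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // LIBXML_HTML_NODEFDTD: do not invent a default doctype when none exists.
    $dom->loadHTML($markup, LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($dom->doctype === null) {
        return false; // no DOCTYPE at all: treat as HTML
    }
    return stripos($dom->doctype->publicId, 'XHTML') !== false;
}

$xhtml = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
       . '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
       . '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head><body></body></html>';
$html  = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
       . '"http://www.w3.org/TR/html4/strict.dtd">'
       . '<html><head><title>t</title></head><body></body></html>';

var_dump(doctypeLooksLikeXhtml($xhtml)); // bool(true)
var_dump(doctypeLooksLikeXhtml($html));  // bool(false)
```

Note that an HTML5 document (<!DOCTYPE html>) has an empty publicId, so this heuristic reports it as HTML.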

Is there any better way anyone can think of?

Edit:

Sorry if my question was a bit unclear; I have updated it since it may have been confusing. To be clear: this question is not about handling HTML with PHP DOM in general, or about whether XHTML is good or bad.


If you're loading from an external source, you can check the file's MIME type and see whether it's application/xhtml+xml; if it is, it's almost certainly meant to be XHTML (although, of course, a server can send that type with horribly malformed markup). If it's text/html, a browser will parse it as HTML tag soup. Validity of the actual markup aside, the doctype declaration is your next best way of telling whether the content is (or claims to be) HTML or XHTML.

As you say, you can check the public identifier and/or the system identifier (the DTD URI) and determine the type from there.
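The decision order in this answer can be sketched as follows: an application/xhtml+xml Content-Type settles the question; otherwise the DOCTYPE's public identifier and system identifier (DOMDocumentType's publicId and systemId) are checked. The function name detectMarkupFlavour, its optional $contentType parameter, and the "contains XHTML" test are illustrative assumptions, not an established API.

```php
<?php
// Hypothetical helper following the answer's order of evidence:
//   1. Content-Type application/xhtml+xml => XHTML.
//   2. Otherwise, look at the DOCTYPE's public and system identifiers.
function detectMarkupFlavour(string $markup, ?string $contentType = null): string
{
    if ($contentType !== null) {
        // Strip parameters such as "; charset=utf-8" and normalise case.
        $mediaType = strtolower(trim(explode(';', $contentType, 2)[0]));
        if ($mediaType === 'application/xhtml+xml') {
            return 'xhtml';
        }
        // text/html is not conclusive: fall through to the DOCTYPE check.
    }

    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    // Prevent libxml from adding a default HTML 4.0 doctype when none exists.
    $dom->loadHTML($markup, LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $doctype = $dom->doctype;
    if ($doctype !== null
        && (stripos($doctype->publicId, 'XHTML') !== false
            || stripos($doctype->systemId, 'xhtml') !== false)) {
        return 'xhtml'; // the document claims to be XHTML
    }
    return 'html';
}

var_dump(detectMarkupFlavour('<p>hi</p>', 'application/xhtml+xml; charset=utf-8')); // string(5) "xhtml"
var_dump(detectMarkupFlavour('<!DOCTYPE html><p>hi</p>', 'text/html'));            // string(4) "html"
```

Checking the system identifier as well catches DOCTYPEs such as <!DOCTYPE html SYSTEM "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">, which carry no public identifier.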
