Valid content-type for XML, HTML and XHTML documents

Developer (https://www.devze.com), 2023-01-01 21:31. Source: the web

What are the correct content-types for XML, HTML and XHTML documents?

I need to write a simple crawler that only fetches these kinds of files.

Nowadays http://example.net/index.html can serve, for example, a JPEG file because of mod_rewrite, so I need to check the Content-Type in the response header and compare it against a list of allowed content-types.

Where can I get such a list from?


HTML: text/html, full-stop.

XHTML: application/xhtml+xml, or text/html only if following the HTML compatibility guidelines. See the W3C Media Types Note.

XML: text/xml, application/xml (RFC 2376, since obsoleted by RFC 3023 and then by RFC 7303).

There are also many other media types based around XML, for example application/rss+xml or image/svg+xml. It's a safe bet that any unrecognised but registered type ending in +xml is XML-based. See the IANA list for registered media types ending in +xml.

(For unregistered x- types, all bets are off, but you'd hope +xml would be respected.)
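
For the crawler, the check described above could look roughly like the sketch below. The names ALLOWED_TYPES, media_type and is_wanted are made up for illustration, and treating every +xml subtype as acceptable is an assumption taken from the "safe bet" above rather than a rule from any spec. Narrow it if you do not want feeds or SVG images.

```python
# Media types named in the answer; extend or trim to taste.
ALLOWED_TYPES = {
    "text/html",
    "application/xhtml+xml",
    "text/xml",
    "application/xml",
}


def media_type(header_value):
    """Return the bare type/subtype from a Content-Type header value.

    Parameters such as "; charset=UTF-8" are dropped, and the result is
    lowercased because media type names are case-insensitive.
    """
    return header_value.split(";", 1)[0].strip().lower()


def is_wanted(header_value):
    """True if the header names HTML, XHTML, XML, or any +xml subtype."""
    mt = media_type(header_value)
    return mt in ALLOWED_TYPES or mt.endswith("+xml")
```

With urllib.request, the header value would typically come from response.headers.get("Content-Type", ""). A missing header yields an empty string, which the check above rejects.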
