Lax HTML parsing in C++?

Developer https://www.devze.com 2023-02-06 05:33 Source: web
I'm looking for a solution for parsing potentially malformed HTML in C++, similar to what Beautiful Soup does in Python.

Normally, just using an XML parser would work, but the specific HTML in this case isn't valid XML/XHTML and can't be properly parsed.

Do libraries/tools for this exist?


You can use HTML Tidy to convert the HTML into valid XML (XHTML) and then use any available C++ XML parser.


According to its documentation, libxml2 is capable of parsing HTML 4.


I've used Xerces and recommend it for C++. It supports both the DOM and SAX models.
