
libxml HTML parsing breaks on unpaired tags

Developer https://www.devze.com 2023-01-23 15:46 Source: Internet

I need to parse an HTML page in C++ with libxml. When I call doc = htmlParseDoc( (xmlChar*)ptr, NULL ); the console reports a problem: the parser seems to break on li>Now, where li is an unpaired tag. The parser says that the number of opening li tags does not match the number of closing /li tags. What can I do about this? I would appreciate any help, and sorry for my English; it is not my native language. I tried Tidy, but Tidy truncates this part of the HTML and prints a message. I also tried to parse the page through the XPath interface, but that did not work.


If you are open to using managed code (C#) on Windows, you could use the HTML Agility Pack to work with this malformed HTML input.

Otherwise, something like Tidy might work for you.
