I need to parse an HTML page in C++ with libxml. I run into a problem when calling doc = htmlParseDoc( (xmlChar*)ptr, NULL ); the console reports that the parser breaks on <li>. In this page li is an unpaired tag, and the parser says the number of opening <li> tags does not match the number of closing </li> tags. Does anyone know what to do about this? I would appreciate any help, and sorry for my English; it's not my native language. I tried Tidy, but Tidy truncated this part of the HTML with a message. I also tried to parse it through the XPath interface, but that did not work either.
If you are open to using managed code (C#) on Windows, you could use the HTML Agility Pack to work with this malformed HTML input.
Otherwise, something like Tidy might work for you.