I want to understand ho开发者_StackOverfloww HtmlCleaner handles Iframes when cleaning raw html to produce valid xml output. One example of a page with iframes is this ebay product page.
When I print the output of HtmlCleaner for this page, I find that some iframe tags are intact while others are missing. One of the missing iframes is the iframe with id="d". It contains the product description and its body has been merged into the main page.
The XML Output of html cleaner: http://pastebin.com/03f9gtdC
Could anyone kindly look at it, or suggest some better HTML parsing library which is able to handle iframes gracefully. That library should be able to support XPath evaluation.
精彩评论