How does HtmlCleaner handle iframes in a web page?

Developer https://www.devze.com 2023-03-25 16:07 Source: Internet
I want to understand how HtmlCleaner handles iframes when cleaning raw HTML to produce valid XML output. One example of a page with iframes is this eBay product page.

When I print the output of HtmlCleaner for this page, I find that some iframe tags are intact while others are missing. One of the missing iframes is the one with id="d". It contains the product description, and its body has been merged into the main page.

The XML output of HtmlCleaner: http://pastebin.com/03f9gtdC
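For reference, one way to check which iframes survive in the cleaned XML is to run an XPath query over it. The following is only a minimal sketch using the JDK's built-in XML and XPath classes, not HtmlCleaner's own API. The class name `IframeCheck`, the method `iframeIds`, and the sample markup are made up for illustration, and it assumes the cleaned output is well-formed XML without namespace-qualified element names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: lists the ids of the iframes present in cleaned XML.
class IframeCheck {

    // Returns the id attribute of every <iframe> element in the given XML text.
    // An iframe without an id attribute contributes an empty string.
    static List<String> iframeIds(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // "//iframe" selects iframe elements at any depth in the document.
        NodeList nodes = (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate("//iframe", doc, XPathConstants.NODESET);

        List<String> ids = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            ids.add(((Element) nodes.item(i)).getAttribute("id"));
        }
        return ids;
    }

    public static void main(String[] args) throws Exception {
        // Made-up stand-in for the cleaner's output, not the real eBay page.
        String cleaned = "<html><body>"
                + "<iframe id=\"a\" src=\"x.html\"></iframe>"
                + "<div>product description text</div>"
                + "</body></html>";
        System.out.println(iframeIds(cleaned));
    }
}
```

Comparing the ids returned here with the iframe ids in the raw HTML would show which iframes were dropped or merged during cleaning.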

Could anyone kindly take a look at the output, or suggest a better HTML parsing library that handles iframes gracefully? The library should also support XPath evaluation.
