QueryPath and Malformed HTML

Developer https://www.devze.com 2023-01-20 14:14 Source: Internet
I'm using QueryPath to manipulate a page's DOM. The page I'm manipulating has some tags that QueryPath doesn't know how to interpret.

I've tried passing the following as options but I still get errors:

ignore_parser_warnings

use_parser (html)

I get the following errors with these enabled:

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Tag nobr invalid in Entity

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity

Any help would be greatly appreciated.


Use htmlqp() instead of qp(). The htmlqp() function does a substantial amount of fixing for yucky HTML.


Try the libxml functions:

libxml_use_internal_errors(TRUE);
$dom->load('whatever'); // or whatever you use for loading the DOM
libxml_clear_errors();

Instead of just clearing the errors, you can opt to handle them, though the above should be sufficient for most cases.
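
If you do want to handle the errors rather than clear them, something like the following might work. It is only a sketch: it loads the markup with DOMDocument directly rather than through QueryPath, and the sample HTML string and variable names are made up for illustration. The exact messages libxml reports depend on the libxml2 version installed.

```php
<?php
// Hypothetical sample markup: a non-standard <nobr> tag and a bare "&".
$html = '<html><body><nobr>Fish & chips</nobr><p>Hello</p></body></html>';

// Collect libxml errors internally instead of emitting PHP warnings.
// The call returns the previous setting so it can be restored later.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$loaded = $dom->loadHTML($html);

// Copy the collected errors before emptying the buffer.
$messages = [];
foreach (libxml_get_errors() as $error) {
    // Each $error is a LibXMLError exposing level, code, line, column and message.
    $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
}
libxml_clear_errors();

// Restore whatever setting was active before.
libxml_use_internal_errors($previous);

// Log or inspect the messages however you like; here they are just printed.
foreach ($messages as $message) {
    echo $message, PHP_EOL;
}
```

Restoring the previous setting keeps the change local, so other code that relies on the default warning behaviour is not affected.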


Just use an @ in front of your QueryPath functions to suppress the warnings. While invalid HTML may generate warnings, QueryPath can generally handle it just fine.
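
A rough illustration of the @ approach, assuming the same kind of markup as in the question. It calls DOMDocument::loadHTML() directly, since that is the call the warnings come from; the sample string is hypothetical, and the same prefix would go in front of the QueryPath call in real code.

```php
<?php
// Hypothetical sample markup with a non-standard tag and a bare "&".
$html = '<p><nobr>Fish & chips</nobr></p>';

$dom = new DOMDocument();

// The @ prefix silences warnings raised by this one expression only.
$ok = @$dom->loadHTML($html);

// The document is still parsed, so the content remains reachable.
$text = $dom->getElementsByTagName('nobr')->item(0)->textContent;
echo $text, PHP_EOL;
```

Keep in mind that @ hides every warning and notice from that expression, not only the parser ones, so an unrelated problem in the same call would go unnoticed too.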
