Pyquery invalidates html code

开发者 https://www.devze.com 2023-02-11 03:07 Source: web

I was using pyquery to construct a webpage:

> page = PyQuery('<html><head><script type="text/javascript" src="jquery-1.4.min.js"></script><script type="text/javascript" src="tools.min.js"></script></head><body></body></html>')
> print page
Output: <html><head><script type="text/javascript" src="jquery-1.4.min.js"/><script type="text/javascript" src="tools.min.js"/></head><body/></html>

The script (and body) tags aren't supposed to end like that, though. Firefox ignores the rest of the head section.
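Here is a minimal reproduction of what seems to be the cause, using lxml directly (Python 3). It assumes that PyQuery parses well-formed markup like this with lxml's XML parser and that `print page` serializes with `lxml.etree.tostring`. Both assumptions come from how pyquery is implemented, not from anything stated in the question. That function's default XML method writes any element with no text and no children as a self-closing tag. Asking for `method="html"` on the same tree keeps the end tags.

```python
from lxml import etree

# Same markup as in the question; it is well-formed XML, so the XML parser accepts it.
SOURCE = (
    '<html><head>'
    '<script type="text/javascript" src="jquery-1.4.min.js"></script>'
    '<script type="text/javascript" src="tools.min.js"></script>'
    '</head><body></body></html>'
)

root = etree.fromstring(SOURCE)

# XML serialization: elements with no text and no children collapse to <tag/>.
as_xml = etree.tostring(root, encoding="unicode")

# HTML serialization of the same tree: non-void elements keep an explicit end tag.
as_html = etree.tostring(root, method="html", encoding="unicode")

print(as_xml)
print(as_html)
```

A page served as text/html is parsed with HTML rules, where `<script .../>` is just an unclosed start tag. Everything after it is treated as script text, which matches what Firefox does here.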

I tried breaking the above up into single elements (i.e., adding one script tag at a time), but to no avail:

> page = PyQuery('<html><head></head></html>')
> page.find('head').append('<script type="text/javascript" src="jquery-1.4.min.js"></script>')
> page.find('head').append('<script type="text/javascript" src="tools.min.js"></script>')
> print page
Output: <html><head><script type="text/javascript" src="jquery-1.4.min.js"/><script type="text/javascript" src="tools.min.js"/></head><body/></html>

The same thing happens with <iframe/> tags (which I'm forced to use because of YouTube): Firefox doesn't treat them as closed, and all the code that follows is ignored.

How can I force pyquery to close these with a separate closing tag, which I believe is what the HTML standard requires?

Oh, and if anyone's wondering, I'm not doing it all in BeautifulSoup because (1) I get BeautifulSoup errors and (2) it's a deprecated package: the author stopped supporting it about a year or two ago.


Try:

page = PyQuery('<html><head><script type="text/javascript" src="jquery-1.4.min.js">\n</script><script type="text/javascript" src="tools.min.js">\n</script></head><body></body></html>')

It also works with iframes.


You should use print page.__html__() to dump the page as HTML or, better, print page.html(method='html').
