
Parsing html using lxml and html5lib, getting "TypeError: insertDoctype() takes exactly 4 arguments (2 given)"

Developer https://www.devze.com 2023-02-22 05:49 Source: web

I'm getting the error TypeError: insertDoctype() takes exactly 4 arguments (2 given) when using lxml and html5lib together. It seems that the insertDoctype method in lxml.html._html5builder.TreeBuilder (link) takes 4 args, while the html5lib code (link) calls it with 2 args. Am I somehow using this wrong?

These are the versions I'm using:

$ pip freeze
BeautifulSoup==3.2.0
distribute==0.6.14
html5lib==0.90
lxml==2.3
mechanize==0.2.4
wsgiref==0.1.2

My source code:

from lxml.html import html5parser

html5parser.document_fromstring('''<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html><head><title>t</title><body></body></html>''')

And the error:

Traceback (most recent call last):
  File "/tmp/t.py", line 4, in <module>
    <html><head><title>t</title><body></body></html>''')
  File "/Users/me/.virtualenvs/myenv/lib/python2.6/site-packages/lxml/html/html5parser.py", line 54, in document_fromstring
    return parser.parse(html, useChardet=guess_charset).getroot()
  File "/Users/me/.virtualenvs/myenv/lib/python2.6/site-packages/html5lib/html5parser.py", line 211, in parse
    parseMeta=parseMeta, useChardet=useChardet)
  File "/Users/me/.virtualenvs/myenv/lib/python2.6/site-packages/html5lib/html5parser.py", line 111, in _parse
    self.mainLoop()
  File "/Users/me/.virtualenvs/myenv/lib/python2.6/site-packages/html5lib/html5parser.py", line 189, in mainLoop
    self.phase.processDoctype(token)
  File "/Users/me/.virtualenvs/myenv/lib/python2.6/site-packages/html5lib/html5parser.py", line 482, in processDoctype
    self.tree.insertDoctype(token)
TypeError: insertDoctype() takes exactly 4 arguments (2 given)
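
For illustration, here is a minimal sketch of the signature mismatch the traceback points at, plus one possible adapter. It does not use lxml or html5lib: OldStyleBuilder and TokenAdapter are hypothetical stand-ins, and the token keys ("name", "publicId", "systemId") are an assumption about what the caller passes. Whether the real libraries can be patched this way is untested here.

```python
class OldStyleBuilder(object):
    """Hypothetical stand-in for a builder expecting three doctype fields."""

    def __init__(self):
        self.doctype = None

    def insertDoctype(self, name, publicId, systemId):
        # Counting self, this method takes four arguments.
        self.doctype = (name, publicId, systemId)


class TokenAdapter(OldStyleBuilder):
    """Hypothetical adapter: accepts one token mapping and unpacks it."""

    def insertDoctype(self, token):
        # Assumed token layout; missing keys fall back to None.
        OldStyleBuilder.insertDoctype(
            self,
            token.get("name"),
            token.get("publicId"),
            token.get("systemId"),
        )


token = {
    "name": "html",
    "publicId": "-//W3C//DTD XHTML 1.0 Transitional//EN",
    "systemId": "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd",
}

# Passing a single token to the three-field signature raises TypeError,
# which is the same kind of failure shown in the traceback above.
try:
    OldStyleBuilder().insertDoctype(token)
    mismatch_raised = False
except TypeError:
    mismatch_raised = True

# The adapter bridges the two calling conventions.
adapter = TokenAdapter()
adapter.insertDoctype(token)
```

If the mismatch does come from the two packages disagreeing on this method's signature, installing a pair of versions that agree would be the cleaner route than an adapter like this one.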