Hey, I have just started using Python and I want to use it with a bit of XPath. The problem is that when I print the result of the query I only get [] and I don't know why =S
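import libxml2, urllib

# Download the page and parse it as XML (parseDoc expects well-formed markup).
doc = libxml2.parseDoc(urllib.urlopen("http://www.domain.com/").read())
# xpathEval returns a list of matching nodes; an empty list means nothing matched.
result = doc.xpathEval("//th//td[(((count(preceding-sibling::*) + 1) = 2) and parent::*)]//a")
if result != []:
    print result
elif result == "":  # never true: result is a list, not a string
    print "null"
else:
    print result
# Release the memory held by the parsed document.
doc.freeDoc()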
I get no error whatsoever, just a []. What could it be? Also, is there any better documentation for libxml2 than the one here? I find it really confusing =S
Edit
I changed the code, so now I get more than just []. I get the following output, which should be related to the invalid HTML I'm trying to parse (it's not mine, so I can't modify it). Any ideas on how to tell Python to be more forgiving about that?
^ Entity: line 3552: parser error : Premature end of data in tag tr line 209
^ Entity: line 3552: parser error : Premature end of data in tag tbody line 208
^ Entity: line 3552: parser error : Premature end of data in tag table line 207
^ Entity: line 3552: parser error : Premature end of data in tag input line 206
^ Entity: line 3552: parser error : Premature end of data in tag input line 205
^ Entity: line 3552: parser error : Premature end of data in tag form line 204
^ Entity: line 3552: parser error : Premature end of data in tag table line 99
^ Entity: line 3552: parser error : Premature end of data in tag div line 98
^ Entity: line 3552: parser error : Premature end of data in tag body line 96
^ Entity: line 3552: parser error : Premature end of data in tag html line 3
^
Traceback (most recent call last):
  File "C:\Python26\lib\site-packages\libxml2.py", line 1263, in parseDoc
    if ret is None:raise parserError('xmlParseDoc() failed')
libxml2.parserError: xmlParseDoc() failed
It's actually a longer list, but there's no point in pasting it all here, since all the errors are due to invalid HTML.
It could be that your XPath doesn't select any elements. For example, you are looking for td's inside th's, but those elements are peers, and shouldn't nest.
Why do you write (count(preceding-sibling::*) + 1) = 2 instead of the simpler count(preceding-sibling::*) = 1?
If you use a simpler XPath, do you get the results you expect?
Are you confusing th and tr? Change your th to tr.
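To make the th/tr point concrete, here is a minimal sketch. It uses the standard library's xml.etree.ElementTree instead of libxml2 (a swapped-in parser, chosen only so the snippet runs without extra packages), and the SAMPLE table is a hypothetical stand-in, since the real page's markup isn't shown in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample table; the real page's markup is not shown in the question.
SAMPLE = """
<table>
  <tr><th>Name</th><th>Link</th></tr>
  <tr><td>first</td><td><a href="/a">A</a></td></tr>
  <tr><td>second</td><td><a href="/b">B</a></td></tr>
</table>
"""

root = ET.fromstring(SAMPLE)

# td nested under th: nothing matches, because th and td are siblings inside tr.
under_th = root.findall(".//th//td")

# Second td of each tr, then any a element below it.
under_tr = root.findall(".//tr/td[2]//a")

print(under_th)
print([a.get("href") for a in under_tr])
```

With a table shaped like this, the th-based path comes back empty (the same [] as in the question), while the tr-based path finds the links.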
Side note: Where does all that unnecessary complexity in your XPath come from? This:
//th//td[(((count(preceding-sibling::*) + 1) = 2) and parent::*)]//a
is equivalent to:
//th//td[count(preceding-sibling::*) = 1]//a
and very probably even to:
//th/td[2]//a
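The "very probably" matters: td[2] counts only td siblings, while count(preceding-sibling::*) = 1 counts siblings of any kind, so the two can differ when a row mixes th and td cells. A rough sketch of the difference follows, again using xml.etree.ElementTree rather than libxml2. ElementTree's XPath subset has no count() or preceding-sibling axis, so that predicate is emulated in plain Python; ROW and both helper functions are hypothetical names made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical row that mixes a header cell with data cells.
ROW = ET.fromstring(
    "<tr><th>h</th><td><a>first td</a></td><td><a>second td</a></td></tr>"
)

def second_child_tds(row):
    """Emulate td[count(preceding-sibling::*) = 1]: td elements that are the
    second child element of their parent, whatever the first child is."""
    return [child for index, child in enumerate(row)
            if index == 1 and child.tag == "td"]

def second_tds(row):
    """Emulate td[2]: the second td child, counting td elements only."""
    return row.findall("td[2]")

by_sibling_count = [td.find("a").text for td in second_child_tds(ROW)]
by_position = [td.find("a").text for td in second_tds(ROW)]
print(by_sibling_count)
print(by_position)
```

In a row that contains only td cells, both forms pick the same cell, which is why the short td[2] version is usually fine.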