Matching text with xpath?

Developer https://www.devze.com 2023-03-27 17:30 Source: Internet
I'm screen-scraping an HTML page which contains:

<table border=1 class="searchresult" cellpadding=2> 
<tr><th colspan=2>Last search</th></tr> 
<tr><th align=left>Search term</th><td>xxxxxx</td></tr> 
<tr><th align=left>Result</th><td>yyyyyyyy</td></tr> 
</table>

I want to write an XPath expression which gets me the data cell containing "yyyyyyyy". I've gotten as far as

.//table[@class='searchresult']//tr/th

which gets me a list of all the table-header nodes in the table. I can iterate over them in user code, find the one whose .text is "Result" and then call .getnext() on that to get the table-data. But is there a cleaner way to do this by writing a more specific XPath pattern? It seems like there should be, but I haven't gotten my head far enough around XPath yet to figure out how.

If it matters, I'm doing this in Python with lxml.
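
For reference, the loop-based approach described above might look roughly like the sketch below. The `PAGE` string and the `result_cell_by_loop` helper are made up for illustration; the table is wrapped in a full document so that the relative `.//table` path has an ancestor to search from.

```python
from lxml import html

# Hypothetical stand-in for the scraped page, built around the table from the question.
PAGE = """<html><body>
<table border=1 class="searchresult" cellpadding=2>
<tr><th colspan=2>Last search</th></tr>
<tr><th align=left>Search term</th><td>xxxxxx</td></tr>
<tr><th align=left>Result</th><td>yyyyyyyy</td></tr>
</table>
</body></html>"""


def result_cell_by_loop(root):
    """Walk all header cells and return the element right after the 'Result' header."""
    for th in root.xpath(".//table[@class='searchresult']//tr/th"):
        if th.text == "Result":
            return th.getnext()  # next sibling element, here the <td>
    return None


root = html.fromstring(PAGE)
cell = result_cell_by_loop(root)
print(cell.text)  # yyyyyyyy
```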


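.//table[@class='searchresult']//tr/td[preceding-sibling::th] might give you what you need. Note that it matches every td that follows a th in its row, so on the sample table it returns both the "xxxxxx" and the "yyyyyyyy" cell. To get only the latter, test the header text as well: .//table[@class='searchresult']//tr/td[preceding-sibling::th='Result']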
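
A quick way to see what a preceding-sibling predicate selects is to run it with lxml against the sample table. This is only a sketch: `PAGE` is a hypothetical stand-in for the real page, and the second expression adds a header-text test (using `normalize-space()` to tolerate stray whitespace), which is an assumption about what the asker wants rather than part of the expression above.

```python
from lxml import html

# Hypothetical stand-in for the scraped page, built around the table from the question.
PAGE = """<html><body>
<table border=1 class="searchresult" cellpadding=2>
<tr><th colspan=2>Last search</th></tr>
<tr><th align=left>Search term</th><td>xxxxxx</td></tr>
<tr><th align=left>Result</th><td>yyyyyyyy</td></tr>
</table>
</body></html>"""

root = html.fromstring(PAGE)

# Every <td> that has some <th> before it in the same row.
broad = root.xpath(
    ".//table[@class='searchresult']//tr/td[preceding-sibling::th]"
)

# Only the <td> whose preceding <th> reads "Result".
narrow = root.xpath(
    ".//table[@class='searchresult']//tr"
    "/td[preceding-sibling::th[normalize-space()='Result']]"
)

print([td.text for td in broad])   # ['xxxxxx', 'yyyyyyyy']
print([td.text for td in narrow])  # ['yyyyyyyy']
```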

Two comprehensive papers on semi-automatically creating XPath statements like this one, specifically for screen-scraping purposes, can be found here:

http://tobiasanton.com/Tobias_Anton/Academia.html


Use:

//table/tr[last()]/td

This selects any td element that is a child of any tr that is the last tr child of any table in this XHTML document.

This may select more than one td element, depending on whether or not there is only one table in the XHTML document. You need to make this expression more precise, if more than one table element is present.

For example, if the table in question is the first in the document, use:

(//table)[1]/tr[last()]/td
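
To illustrate the difference between the two expressions with lxml, here is a small sketch. The `PAGE` string is hypothetical: it holds the table from the question followed by a second, unrelated table that was added only to show why the positional filter matters.

```python
from lxml import html

# Hypothetical page: the question's table plus an invented second table.
PAGE = """<html><body>
<table border=1 class="searchresult" cellpadding=2>
<tr><th colspan=2>Last search</th></tr>
<tr><th align=left>Search term</th><td>xxxxxx</td></tr>
<tr><th align=left>Result</th><td>yyyyyyyy</td></tr>
</table>
<table class="other">
<tr><td>unrelated</td></tr>
</table>
</body></html>"""

root = html.fromstring(PAGE)

# Last row of every table in the document.
all_last = root.xpath("//table/tr[last()]/td")

# Last row of the first table only.
first_only = root.xpath("(//table)[1]/tr[last()]/td")

print([td.text for td in all_last])    # ['yyyyyyyy', 'unrelated']
print([td.text for td in first_only])  # ['yyyyyyyy']
```

Keep in mind that a purely positional expression like this depends on the row order of the page, so it would silently pick a different cell if the site added a row after "Result".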