XPath - How to get the data contained between elements, not the elements themselves

Developer https://www.devze.com 2023-01-08 22:52 Source: web

I'm writing a Java program that scrapes a web page for links and then stores them in a database. I'm having problems though. Using HTMLUnit, I wrote the following:

page.getByXPath("//a[starts-with(@href, \"showdetails.aspx\")]");

It returns the correct anchor elements, but I only want the actual path contained in the href attribute, not the entire element. How can I do this, and further, how can I get the text contained between the opening and closing tags:

<a href="">I need this data, too.</a>

Thanks in advance!


The first (getting the href)

page.getByXPath("//a[starts-with(@href, \"showdetails.aspx\")]/@href");

The second (getting the text)

page.getByXPath("//a[starts-with(@href, \"showdetails.aspx\")]/text()");
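
To see what these two expressions select, here is a minimal sketch that evaluates them with the JDK's built-in javax.xml.xpath API instead of HtmlUnit, so it runs without extra dependencies. The class name LinkXPathDemo, the helper select, and the sample markup are made up for illustration and are not part of the question. With HtmlUnit, the same expression strings would be passed to page.getByXPath as shown above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LinkXPathDemo {
    // Hypothetical sample markup; it must be well-formed XML for the JDK parser.
    static final String SAMPLE =
        "<html><body>"
        + "<a href=\"showdetails.aspx?id=1\">First item</a>"
        + "<a href=\"other.aspx\">Ignored</a>"
        + "<a href=\"showdetails.aspx?id=2\">Second item</a>"
        + "</body></html>";

    // The predicate from the question: anchors whose href starts with showdetails.aspx.
    static final String LINKS = "//a[starts-with(@href, \"showdetails.aspx\")]";

    // Evaluates the expression as a node set and returns each node's value.
    // For attribute nodes this is the attribute value, for text nodes the text itself.
    static List<String> select(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getNodeValue());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(SAMPLE, LINKS + "/@href"));  // href values of matching links
        System.out.println(select(SAMPLE, LINKS + "/text()")); // link texts of matching links
    }
}
```

Appending /@href selects the attribute nodes and appending /text() selects the text nodes, so in both cases the result is still a list of nodes whose values have to be read out one by one.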


I assume that getByXPath is a utility function written by you which uses XPath.evaluate? To get the string value you could use either xpath.evaluate(expression, object) or xpath.evaluate(expression, object, XPathConstants.STRING).

Alternatively you could call getNodeValue() on the attribute node returned by evaluating "//a[starts-with(@href, \"showdetails.aspx\")]/@href".
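
The three options can be compared side by side in a short sketch using the JDK's javax.xml.xpath API. The class HrefStringDemo, its method names, and the sample markup are hypothetical and chosen only for this illustration. Note that the string-returning forms yield the string value of the first matching node only (or an empty string when nothing matches), so they suit the single-link case rather than a list of links.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class HrefStringDemo {
    // Hypothetical sample markup; it must be well-formed XML for the JDK parser.
    static final String SAMPLE =
        "<p><a href=\"showdetails.aspx?id=7\">Details</a></p>";

    static final String HREF =
        "//a[starts-with(@href, \"showdetails.aspx\")]/@href";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample markup", e);
        }
    }

    // Two-argument evaluate: returns the result converted to a String.
    static String viaString(Document doc) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(HREF, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    // Three-argument evaluate with an explicit STRING return type.
    static String viaConstant(Document doc) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (String) xpath.evaluate(HREF, doc, XPathConstants.STRING);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    // Fetch the attribute node itself, then read its value; null if there is no match.
    static String viaNode(Document doc) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node attr = (Node) xpath.evaluate(HREF, doc, XPathConstants.NODE);
            return attr == null ? null : attr.getNodeValue();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println(viaString(doc));
        System.out.println(viaConstant(doc));
        System.out.println(viaNode(doc));
    }
}
```

All three calls read the same href here. The node-based variant is the one that makes a missing match distinguishable from an empty attribute, since it returns null instead of an empty string.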
