
multiple string() results for an xpath?

https://www.devze.com 2023-04-08 04:35 (source: web)
string() works great on a certain webpage I am trying to extract text from.

http://www.bing.com/search?q=lemons&first=111&FORM=PERE

has a similar structure. For Bing, the XPath I have tried is

string(//h3/a)

which works well for the search results, even when the link text contains strong tags, but it only returns the first result. Is there something like strings(), so that I can get the full text of each //h3/a result?


"Is there something like strings(), so I can get the full text of each //h3/a result?"

No, not in XPath 1.0.

From the W3C XPath 1.0 Specification (the only normative document about XPath 1.0):

"Function: string string(object?)

The string function converts an object to a string as follows:

A node-set is converted to a string by returning the string-value of the node in the node-set that is first in document order."

So, if you only have an XPath 1.0 engine available, you need to select the node-set of all //h3/a elements and then, in the programming language that hosts XPath, iterate over the nodes and get the string value of each one separately.
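As a rough illustration of that iteration, here is a minimal Python sketch using only the standard library. Note the assumptions: the SAMPLE markup is a made-up, well-formed stand-in for the real results page, string_values is a hypothetical helper name, and xml.etree.ElementTree supports only a limited XPath subset (hence the relative path .//h3/a), so a full XPath 1.0 engine would look slightly different.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the search results markup.
SAMPLE = """
<html><body>
  <h3><a href="/one">Fresh <strong>lemons</strong> online</a></h3>
  <h3><a href="/two"><strong>Lemons</strong> and limes</a></h3>
</body></html>
"""

def string_values(root, path=".//h3/a"):
    """Return the string-value of every element matched by path.

    Joining itertext() concatenates all descendant text in document
    order, which mirrors what string() yields for a single element.
    """
    return ["".join(node.itertext()) for node in root.findall(path)]

root = ET.fromstring(SAMPLE)
titles = string_values(root)
for title in titles:
    print(title)
```

The point is that the XPath expression only selects the nodes; taking the string value happens once per node in the host language.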

In XPath 2.0 use:

//h3/a/string()

The result of evaluating this XPath 2.0 expression is a sequence of strings, each of which is the string value of one of the //h3/a elements.


The MSDN documentation of string remarks that:

The string() function converts a node-set to a string by returning the string value of the first node in the node-set, which in some instances may yield unexpected results.

This sounds like what you are experiencing. Why are you using string() at all?

Use //h3/a/text()
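One caveat worth checking before relying on text(): it selects only the text nodes that are direct children of a, so text inside a nested strong element is not part of the result. The sketch below shows the difference using Python's standard library; the link markup is a hypothetical example, and since ElementTree has no text() step, the direct text nodes are approximated with the .text and .tail attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical link markup with a nested <strong> element.
link = ET.fromstring('<a href="/one">Fresh <strong>lemons</strong> online</a>')

# Direct text-node children only, roughly what a/text() would select:
# the element's own leading text plus the tail text after each child.
candidates = [link.text] + [child.tail for child in link]
direct_text = [t for t in candidates if t]

# Full string-value of the element, what string() returns for this node.
full_text = "".join(link.itertext())

print(direct_text)
print(full_text)
```

Here the direct text nodes miss the word inside strong, while the string-value keeps it.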
