
Is HtmlUnit 2.8 getFirstByXPath different from HtmlUnit 1.14 getFirstByXPath?

Developer https://www.devze.com 2023-03-20 18:15 Source: Internet

I have a site structure that looks something like this:

<div class='main_container'>
     <div class='item_container'>
         <div class='body'>
             <span class='item_name'>Item 1</span>
             <span class='item_desc'>Desc 1</span>
         </div>
     </div>
     <div class='item_container'>
         <div class='body'>
             <span class='item_name'>Item 2</span>
             <span class='item_desc'>Desc 2</span>
         </div>
     </div>
     ...
</div><!--End of main_container-->
//Note: Some item_container divs might not contain <span class='item_name'>Item N</span> or other elements

In HtmlUnit 1.14, if I want to get all the item names, I do this:

List<HtmlDivision> divs = (List<HtmlDivision>)page.getByXPath("//div[@class='item_container']");
for(HtmlDivision div:divs){
    String name = ((HtmlElement)div.getFirstByXPath("//span[@class='item_name']")).asText();
    System.out.println(name);
}

Output:

Item 1
Item 2
...

But in HtmlUnit 2.8, when I do the same thing, I get:

Item 1
Item 1
...

Is there a workaround for this in HtmlUnit 2.8?


It may be that HtmlUnit 1.14 had a bug that you were exploiting/relying on.

In the code that you showed, the XPath inside the for loop should return the same element each time it executes (as it does in v2.8), because it starts with //. An expression that starts with // searches the entire document from the root node, regardless of which element it is called on, so getFirstByXPath returns the first match in the whole page.

If you want it to be relative to the <div> in the loop, you should change your XPath to: .//span[@class='item_name']
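
To make the difference concrete, here is a minimal sketch that evaluates both expressions against the markup from the question. It is an illustration under assumptions, not HtmlUnit code: it uses the JDK's built-in javax.xml.xpath API so that it runs without extra dependencies, and the class name RelativeXPathDemo, the helper itemNames, and the third, empty item_container (added to cover the "some divs might not have item_name" note) are all made up for this example. In HtmlUnit you would pass the same relative expression to div.getFirstByXPath and check the result for null before calling asText().

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RelativeXPathDemo {
    // Markup from the question, plus one hypothetical container without an item_name span.
    static final String MARKUP =
        "<div class='main_container'>"
      + "<div class='item_container'><div class='body'>"
      + "<span class='item_name'>Item 1</span><span class='item_desc'>Desc 1</span>"
      + "</div></div>"
      + "<div class='item_container'><div class='body'>"
      + "<span class='item_name'>Item 2</span><span class='item_desc'>Desc 2</span>"
      + "</div></div>"
      + "<div class='item_container'><div class='body'/></div>"
      + "</div>";

    // Evaluates spanXPath once per item_container, using that container as the context node.
    // Returns the text of the first match for each container, or null when nothing matches.
    static List<String> itemNames(String spanXPath) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(MARKUP)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList containers = (NodeList) xpath.evaluate(
            "//div[@class='item_container']", doc, XPathConstants.NODESET);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < containers.getLength(); i++) {
            Node span = (Node) xpath.evaluate(
                spanXPath, containers.item(i), XPathConstants.NODE);
            names.add(span == null ? null : span.getTextContent());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Leading "//" ignores the context node and searches the whole document.
        System.out.println(itemNames("//span[@class='item_name']"));  // [Item 1, Item 1, Item 1]
        // Leading ".//" searches only the descendants of the current container.
        System.out.println(itemNames(".//span[@class='item_name']")); // [Item 1, Item 2, null]
    }
}
```

The null entry for the last container shows why the original loop needs a null check once the expression is relative: a container without an item_name span no longer picks up a span from elsewhere in the page.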
