xpath query to parse html tags

Developer https://www.devze.com 2023-01-17 16:56 Source: Internet
I need to parse the following sample HTML using an XPath query.

<td id="msgcontents">
 <div class="user-data">Just seeing if I can post a link... please ignore post
  <a href="http://finance.yahoo.com">http://finance.yahoo.com</a>
 </div>
</td>

<td id="msgcontents">
 <div class="user-data">some text2...
  <a href="http://abc.com">http://abc.com</a>
 </div>
</td>

<td id="msgcontents">
 <div class="user-data">some text3...      
 </div>
</td>

The above HTML block may repeat any number of times on a page.

Also, sometimes the <a>.....</a> portion may be absent, as in the last of the HTML blocks above.

What I need is the XPath syntax to get the parsed strings as:

 array[0] = "Just seeing if I can post a link... please ignore post http://finance.yahoo.com"
 array[1] = "some text2... http://abc.com"
 array[2] = "some text3..."


Maybe something like the following:

    $remote = file_get_contents('http://www.sitename.com');
    $dom = new DOMDocument();
    // Error suppression, unfortunately: an invalid XHTML document makes loadHTML() emit warnings.
    @$dom->loadHTML($remote);

    $xpath = new DOMXPath($dom);

    // Get all elements whose class attribute contains "user-data".
    $userdata = $xpath->query('//*[contains(@class, "user-data")]');

    // Get the href attribute of every link on the page.
    $links = $xpath->query('//a/@href');

To read the text of each matched node, use nodeValue:

$ret = array();
foreach($userdata as $data) {
  $ret[] = $data->nodeValue;
}

Edit: Note that the second query gets all the links on a given page. I assume this is what you wanted?
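
For reference, here is a self-contained variant of the approach above, as a minimal sketch rather than a definitive solution. It assumes the HTML is already in a string (the `$html` sample and the `extractUserData` helper name are made up for illustration), and it uses XPath's normalize-space() to collapse the line breaks and indentation between the div's text and the link text.

```php
<?php
// Hypothetical helper: returns the whitespace-normalized text of every
// div.user-data found inside a td#msgcontents cell.
function extractUserData(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $result = [];
    $query = '//td[@id="msgcontents"]/div[contains(@class, "user-data")]';
    foreach ($xpath->query($query) as $div) {
        // normalize-space(.) takes the div's full string value (its own text
        // plus the link text, if any) and trims/collapses the whitespace.
        $result[] = $xpath->evaluate('normalize-space(.)', $div);
    }
    return $result;
}

// Sample input modeled on the question (assumed, shortened).
$html = <<<HTML
<table><tr>
<td id="msgcontents">
 <div class="user-data">some text2...
  <a href="http://abc.com">http://abc.com</a>
 </div>
</td>
<td id="msgcontents">
 <div class="user-data">some text3...
 </div>
</td>
</tr></table>
HTML;

print_r(extractUserData($html));
```

For the two sample cells this should yield "some text2... http://abc.com" and "some text3...". Because the link text is part of the div's string value, a missing link needs no special handling here.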


Use:

concat(/td/div/text()[1], ' ', /td/div/a)

Instead of the ' ' above, you can use whatever delimiter you'd like to appear between the two strings.
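
One way to apply a concat() expression like this to every cell from PHP is sketched below. It rests on a few assumptions that are not in the answer itself: the paths are written relative to each td (an absolute /td/... path only matches when td is the document root), the first text node is passed through normalize-space(), and the `$delimiter` variable and the sample `$html` are illustrative.

```php
<?php
// Sample input modeled on the question (assumed, shortened).
$html = '<table><tr>'
      . '<td id="msgcontents"><div class="user-data">some text2... '
      . '<a href="http://abc.com">http://abc.com</a></div></td>'
      . '<td id="msgcontents"><div class="user-data">some text3... </div></td>'
      . '</tr></table>';

$dom = new DOMDocument();
// Collect libxml parse warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$delimiter = ' | '; // any separator without a single quote works here
$rows = [];
foreach ($xpath->query('//td[@id="msgcontents"]') as $td) {
    // Evaluate the expression with the current td as the context node,
    // so div/... refers to this cell only.
    $rows[] = $xpath->evaluate(
        "concat(normalize-space(div/text()[1]), '$delimiter', div/a)",
        $td
    );
}
print_r($rows);
```

Note that when a cell has no link, div/a evaluates to an empty string, so the delimiter is still appended ("some text3... | " in this sample). Trimming the result or checking for the link first would avoid the dangling separator.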
