I need to parse the following sample HTML using an XPath query:
<td id="msgcontents">
<div class="user-data">Just seeing if I can post a link... please ignore post
<a href="http://finance.yahoo.com">http://finance.yahoo.com</a>
</div>
</td>
<td id="msgcontents">
<div class="user-data">some text2...
<a href="http://abc.com">http://abc.com</a>
</div>
</td>
<td id="msgcontents">
<div class="user-data">some text3...
</div>
</td>
The above HTML block may repeat any number of times on a page.
Also, the ..... portion (the link) may sometimes be absent, as in the third block above.
What I need is the XPath syntax to get the parsed strings as:
array[0] = "Just seeing if I can post a link... please ignore post http://finance.yahoo.com"
array[1] = "some text2 http://abc.com"
array[2] = "some text3"
Maybe something like the following:
$remote = file_get_contents('http://www.sitename.com');
$dom = new DOMDocument();
// Error suppression, unfortunately, because invalid XHTML makes loadHTML() emit warnings.
$file = @$dom->loadHTML($remote);
$xpath = new DOMXPath($dom);
// Get all elements with the user-data class.
$userdata = $xpath->query('//*[contains(@class, \'user-data\')]');
// Get the href attribute of every link on the page.
$links = $xpath->query('//a/@href');
To read the text of each matched node, use its nodeValue property:
$ret = array();
foreach($userdata as $data) {
    // For an element, nodeValue is its full text content, including the link text.
    $ret[] = $data->nodeValue;
}
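Below is a self-contained sketch of the nodeValue approach, run against the question's sample HTML instead of a fetched page. Several parts are assumptions rather than part of the answer: the sample is wrapped in `<table><tr>` and `</tr></table>` so the `td` elements sit in valid table markup; `extractUserData` is a hypothetical helper name; `libxml_use_internal_errors()` stands in for the `@` operator; the class test is swapped for a token-matching predicate, so a class such as `user-data-old` would not match; and runs of whitespace are collapsed so each result reads as one line.

```php
<?php
// Hypothetical helper: returns one string per div.user-data block,
// containing the message text followed by the link text (if any).
function extractUserData(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml parse warnings internally instead of using the @ operator.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Match "user-data" as a whole class token, not as a substring.
    $query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' user-data ')]";

    $result = [];
    foreach ($xpath->query($query) as $div) {
        // textContent includes the text of the nested <a>, if there is one;
        // collapse newlines and indentation into single spaces.
        $result[] = trim(preg_replace('/\s+/', ' ', $div->textContent));
    }
    return $result;
}

// Sample from the question, wrapped in a table (assumption).
$html = <<<'HTML'
<table><tr>
<td id="msgcontents">
<div class="user-data">Just seeing if I can post a link... please ignore post
<a href="http://finance.yahoo.com">http://finance.yahoo.com</a>
</div>
</td>
<td id="msgcontents">
<div class="user-data">some text2...
<a href="http://abc.com">http://abc.com</a>
</div>
</td>
<td id="msgcontents">
<div class="user-data">some text3...
</div>
</td>
</tr></table>
HTML;

$rows = extractUserData($html);
print_r($rows);
```

A block without a link simply yields its message text, so no special case is needed for the missing `<a>`.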
Edit: Note that the `//a/@href` query gets all the links on the page, not only those inside the user-data blocks. I assume that is what you wanted?
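If you only want the links inside each user-data block, one option is to run a relative query (`.//a/@href`) with each block as the context node. This is a sketch under the same assumptions as before: `linksPerBlock` is a hypothetical helper name, and the question's sample is wrapped in a table.

```php
<?php
// Hypothetical helper: returns, for each div.user-data block,
// the list of href values found inside that block only.
function linksPerBlock(string $html): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $blocks = [];
    foreach ($xpath->query("//div[contains(@class, 'user-data')]") as $div) {
        $hrefs = [];
        // The leading "." makes the path relative to $div (the context node),
        // so links elsewhere on the page are not included.
        foreach ($xpath->query('.//a/@href', $div) as $attr) {
            $hrefs[] = $attr->value;
        }
        $blocks[] = $hrefs; // empty array when the block has no link
    }
    return $blocks;
}

$html = <<<'HTML'
<table><tr>
<td id="msgcontents">
<div class="user-data">Just seeing if I can post a link... please ignore post
<a href="http://finance.yahoo.com">http://finance.yahoo.com</a>
</div>
</td>
<td id="msgcontents">
<div class="user-data">some text2...
<a href="http://abc.com">http://abc.com</a>
</div>
</td>
<td id="msgcontents">
<div class="user-data">some text3...
</div>
</td>
</tr></table>
HTML;

$links = linksPerBlock($html);
print_r($links);
```

Keeping the per-block grouping means the hrefs stay aligned with the text strings by index.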
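Use the following (the absolute /td path assumes the td block is the root of the document being queried):
concat(/td/div/text()[1], ' ', /td/div/a)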
Instead of the ' ' above, you can use whatever delimiter you'd like between the two strings.
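On a page where the block repeats, one way to apply this expression is to make its paths relative and evaluate it once per `td`, passing the `td` as the context node to `DOMXPath::evaluate()`. The sketch below assumes a hypothetical helper name (`concatPerBlock`), a table-wrapped copy of the question's sample, and adds `normalize-space()` around the `concat()` so the newline before the link and the trailing delimiter left by a missing link are cleaned up.

```php
<?php
// Hypothetical helper: evaluates a relative concat() expression once per
// <td id="msgcontents"> and returns the resulting strings.
function concatPerBlock(string $html): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $result = [];
    foreach ($xpath->query("//td[@id='msgcontents']") as $td) {
        // text()[1] selects the first text node of the div (the message).
        // A missing <a> contributes an empty string, and normalize-space()
        // strips the resulting trailing delimiter.
        $result[] = $xpath->evaluate(
            "normalize-space(concat(div/text()[1], ' ', div/a))",
            $td
        );
    }
    return $result;
}

$html = <<<'HTML'
<table><tr>
<td id="msgcontents">
<div class="user-data">Just seeing if I can post a link... please ignore post
<a href="http://finance.yahoo.com">http://finance.yahoo.com</a>
</div>
</td>
<td id="msgcontents">
<div class="user-data">some text2...
<a href="http://abc.com">http://abc.com</a>
</div>
</td>
<td id="msgcontents">
<div class="user-data">some text3...
</div>
</td>
</tr></table>
HTML;

$rows = concatPerBlock($html);
print_r($rows);
```

Evaluating relative to each `td` avoids the requirement that `td` be the document root, which an absolute `/td` path imposes.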