SimpleXML->xpath problem

Developer https://www.devze.com 2023-03-04 20:50 Source: Internet
I am trying to access each table row of:

http://www.alliedelec.com/search/searchresults.aspx?N=0&Ntt=PIC16F648&Ntk=Primary&i=0&sw=n

with SimpleXML->xpath. I have identified the XPath of the table to be:

'//*[@id="tblParts"]'

Now I take the response string $string returned by cURL and do the following:

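// $tidy is assumed to be an existing tidy object
$tidy->parseString($string);
$output = (string) $tidy;
$xml = new SimpleXMLElement($output);   // the warnings and the exception come from this line
$result = $xml->xpath('//*[@id="tblParts"]');
// foreach replaces the original while(list(, $node) = each($result)); each() was removed in PHP 8
foreach ($result as $node) {
    echo 'NODE:' . $node . "\n";
}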

What I get back are errors such as these, by the hundreds:

Warning: SimpleXMLElement::__construct() [simplexmlelement.--construct]: Entity: line 60: parser error : Opening and ending tag mismatch: meta line 22 and head in C:\xampp\htdocs\elexess\api\driver\driver_alliedelectronics.php on line 119

Warning: SimpleXMLElement::__construct() [simplexmlelement.--construct]: </head> in C:\xampp\htdocs\elexess\api\driver\driver_alliedelectronics.php on line 119

Warning: SimpleXMLElement::__construct() [simplexmlelement.--construct]: ^ in C:\xampp\htdocs\elexess\api\driver\driver_alliedelectronics.php on line 119

Warning: SimpleXMLElement::__construct() [simplexmlelement.--construct]: Entity: line 108: parser error : Opening and ending tag mismatch: img line 106 and td in C:\xampp\htdocs\elexess\api\driver\driver_alliedelectronics.php on line 119

As well as this at the end:

Fatal error: Uncaught exception 'Exception' with message 'String could not be parsed as XML' in C:\xampp\htdocs\app\com\get\get_alliedelectronics.php:119 Stack trace: #0 C:\xampp\htdocs\app\com\get\get_alliedelectronics.php(119): SimpleXMLElement->__construct('<!DOCTYPE html ...') #1 C:\xampp\htdocs\app\com\get\get_alliedelectronics.php(95): get_Alliedelectronics->extractData('<!DOCTYPE html ...') #2 C:\xampp\htdocs\app\com\get\get_alliedelectronics.php(138): get_Alliedelectronics->query('PIC16F648') #3 {main} thrown in C:\xampp\htdocs\app\com\get\get_alliedelectronics.php on line 119


Looks like the HTML of the page you're fetching and trying to parse isn't well formed (tag mismatches etc.)

You can try to work around the errors by loading the page with DOM's HTML parser and then converting the result with simplexml_import_dom, as I explain in this SO post.
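
A minimal sketch of that approach, assuming the page is loaded through DOMDocument::loadHTML before being handed to SimpleXML. The $html string is a made-up stand-in for the fetched page (it is not the real Allied Electronics markup), with an unclosed meta and img tag like the ones the warnings complain about:

```php
<?php
// Hypothetical sample markup: <meta> and <img> are not closed, as in the warnings above.
$html = '<html><head><meta charset="utf-8"><title>Search</title></head>'
      . '<body><table id="tblParts"><tr><td><img src="a.png">PIC16F648</td></tr></table>'
      . '</body></html>';

libxml_use_internal_errors(true);    // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);               // the HTML parser tolerates unclosed tags
libxml_clear_errors();               // discard the collected warnings

$xml = simplexml_import_dom($doc);   // SimpleXMLElement for the document's root element
$tables = $xml->xpath('//*[@id="tblParts"]');

foreach ($tables as $table) {
    // asXML() prints the node's markup; casting to string would give only its direct text
    echo 'NODE: ' . $table->asXML() . "\n";
}
```

The original XPath expression is unchanged; only the way the document is parsed differs.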


I'd suggest not using SimpleXML (@Nev Stokes and @Nicholas Wilson are right: this is HTML, not XML, and you have no guarantee that it will parse as XML). Use the DOM extension instead (see http://www.php.net/manual/en/book.dom.php). You can do something like:

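libxml_use_internal_errors(true); // keep warnings about malformed HTML out of the output
$doc = new DOMDocument();
$doc->loadHTML($string);
$xpath = new DOMXPath($doc);
$entries = $xpath->query('//*[@id="tblParts"]');
foreach ($entries as $entry) {
  // do something with $entry (a DOMElement), e.g. read $entry->textContent
}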

See if that helps.
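
Since the question asks for each table row, the same idea can be extended to walk the rows and cells. This is only a sketch: the $html fragment and the part numbers in it are invented sample data, and the real page may nest its rows differently (for example inside a tbody), which is why the row query uses // rather than a direct child step.

```php
<?php
// Hypothetical sample table; the real markup of the search results page may differ.
$html = '<table id="tblParts">'
      . '<tr><td>PIC16F648A-I/P</td><td>Microchip</td></tr>'
      . '<tr><td>PIC16F648A-I/SO</td><td>Microchip</td></tr>'
      . '</table>';

libxml_use_internal_errors(true);   // suppress warnings about imperfect HTML
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$rows = [];

// Select every row inside the table, however deeply it is nested.
foreach ($xpath->query('//*[@id="tblParts"]//tr') as $tr) {
    $cells = [];
    // The second argument makes the query relative to the current row.
    foreach ($xpath->query('td', $tr) as $td) {
        $cells[] = trim($td->textContent);
    }
    $rows[] = $cells;
}

print_r($rows);
```

Each entry of $rows is then a plain array of cell texts for one table row.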
