Why is this XPath query not working on the DOM of Facebook application pages?

https://www.devze.com 2023-02-14 10:05 Source: web
I don't understand why my XPath query returns the correct href for the second URL but not for the first. The HTML of both pages looks the same and has the same kind of structure, yet no href is returned for the first one. (I comment out one $url at a time to test it.)

$url = "http://apps.facebook.com/TexasHoldEmPoker/"; // this one does not work
//$url = "http://nu.nl"; // this one works

$response = wp_remote_get($url);
$data = $response['body'];
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->strictErrorChecking = false;
$href='';
if (!$dom->loadHTML($data))
{
    foreach (libxml_get_errors() as $error)
    {
    }
    libxml_clear_errors();
}
else
{
    $xpath = new DOMXPath($dom);
    $elements = $xpath->query("/html/head/link[@rel='shortcut icon']");

    if (!is_null($elements))
    {
        foreach ($elements as $element)
        {
            if ($element->getAttribute('href'))
            {
                $href = $element->getAttribute('href');
            }
        }
    }
}
echo $href;

So I know the code works correctly for "nu.nl", but somehow not for the Facebook app pages. I can't see why, since the structure looks the same.

P.S. The full code is here: http://plugins.svn.wordpress.org/wp-favicons/trunk/plugins/sources/page.php


Take a look at the output of $dom->saveXML().

You'll see that the <link> element is a child of <body>, not of <head> as expected.

So the XPath should be:

/html/body/link[@rel='shortcut icon']

or

//link[@rel='shortcut icon']
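
As a rough illustration of the second form, here is a minimal sketch of a lookup that does not depend on where the parser placed the element. The function name find_favicon_href and the sample markup in the usage line are hypothetical and not part of the original plugin; only DOMDocument, DOMXPath and the libxml error functions from the question are used.

```php
<?php
// Hypothetical helper (not from the original plugin): returns the href of the
// first <link rel="shortcut icon"> found anywhere in the document, or ''.
function find_favicon_href(string $html): string
{
    if ($html === '') {
        return ''; // loadHTML() rejects an empty string
    }

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return '';
    }

    $xpath = new DOMXPath($dom);
    // "//" matches the element whether it ended up under <head> or <body>.
    $nodes = $xpath->query("//link[@rel='shortcut icon']");
    if ($nodes === false) {
        return '';
    }

    foreach ($nodes as $node) {
        $href = $node->getAttribute('href');
        if ($href !== '') {
            return $href;
        }
    }
    return '';
}

// Usage with made-up markup:
echo find_favicon_href('<html><head><link rel="shortcut icon" href="/favicon.ico"></head><body></body></html>');
```

The trade-off of "//" is that it would also match a stray <link> in the page body, which is probably acceptable for a favicon lookup.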

I guess the different markup is a result of the parser trying to fix the <noscript> inside the <head>, which it treats as illegal there: everything inside the head after and including this <noscript> has been moved to the <body>.
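
One way to check this guess on your own setup is to ask the parser which element each <link> ended up under. The sketch below is only an assumption-laden test harness: the helper name link_parent_names and the sample markup are hypothetical, and the result may differ between libxml2 versions, so no particular output is claimed here.

```php
<?php
// Hypothetical helper: returns the tag name of the parent of every <link>
// element, as placed by the HTML parser after its repairs.
function link_parent_names(string $html): array
{
    $previous = libxml_use_internal_errors(true); // keep warnings quiet
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $names = [];
    foreach ($dom->getElementsByTagName('link') as $link) {
        $names[] = $link->parentNode->nodeName;
    }
    return $names;
}

// Made-up markup with a <noscript> in the head, before the <link>.
$sample = '<html><head><noscript><img src="x.gif"></noscript>'
        . '<link rel="shortcut icon" href="/favicon.ico"></head>'
        . '<body><p>hi</p></body></html>';

// Prints the parent tag names; compare with what $dom->saveXML() shows.
print_r(link_parent_names($sample));
```

If the printed parent is body rather than head, the parser has relocated the element as described above, and the /html/head/... path cannot match.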
