DOM and XPath scraping - what's wrong here?

开发者 https://www.devze.com 2023-02-25 02:41 Source: the web
I need to scrape a piece of text from a web page. I am using DOM and XPath to find the data, but I can't seem to select the exact information I need. Here is my code so far. The problem is with the item(0)->nodeValue part: this works in the other scrapes I have for another page, but not in this one.

$argos_html = file_get_html('http://www.argos.co.uk/static/Product/partNumber/9282197/Trail/searchtext%3EIPOD+TOUCH.htm');

$dom_argos= new DOMDocument();
$dom_argos->loadHTML($argos_html);

$xpath_argos = new DOMXpath($dom_argos);

$expr_currys = "/html/body/div[4]/div[3]/form/div[2]/div/div[5]/ul/li[3]/span";
$nodes_argos = $xpath_argos->query($expr_argos);

$argos_stock_data = $nodes_argos->item(0)->nodeValue;

Could anyone show me where I am going wrong? I always get an error that relates to the ->item(0)->nodeValue; part. If I comment that out there is no error, but no data is collected at all.

Should it perhaps be just ->nodeValue;?

I understand this may be down to page structure, but I am new to all of this. Thanks!


Running your code, I first get:

Notice: Undefined variable: expr_argos
Warning: DOMXPath::query() [domxpath.query]: Invalid expression

So, first of all, make sure you pass a variable that actually holds your XPath expression to query(). For example, you should have this:

$nodes_argos = $xpath_argos->query($expr_currys);

instead of what you currently have:

$nodes_argos = $xpath_argos->query($expr_argos);


Then, you get the following error:

Notice: Trying to get property of non-object

on the following line:

$argos_stock_data = $nodes_argos->item(0)->nodeValue;

Basically, this means you are trying to read a property, nodeValue, on something that is not an object: $nodes_argos->item(0).

I'm guessing your XPath query doesn't match anything, so the call to the query() method returns an empty node list and item(0) is null.

You should check your XPath query (which is rather too long to be easy to understand) and make sure it matches something in your HTML page.
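
As a rough sketch of that check, the snippet below guards the item(0) access instead of assuming a match. The inline $html string is a hypothetical stand-in for the downloaded page so the example runs offline, and firstNodeValue() is an illustrative helper, not something the DOM extension provides.

```php
<?php
// Hypothetical stand-in for the downloaded page, so the sketch runs offline.
$html = '<html><body><div id="deliveryInformation"><ul>'
      . '<li class="home"><span>Home delivery within 2 days</span></li>'
      . '</ul></div></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// firstNodeValue() is a hypothetical helper, not part of the DOM extension.
function firstNodeValue(DOMXPath $xpath, string $expr): ?string
{
    $nodes = $xpath->query($expr);
    // query() returns false for a malformed expression and an empty
    // DOMNodeList when nothing matches; in the latter case item(0) is null.
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return $nodes->item(0)->nodeValue;
}

// An expression that matches the sample markup.
$found = firstNodeValue($xpath, '//div[@id="deliveryInformation"]/ul/li[@class="home"]/span');

// A positional expression that matches nothing in the sample markup.
$missing = firstNodeValue($xpath, '/html/body/div[4]/div[3]/form/span');

echo $found ?? '(no match)', "\n";
echo $missing ?? '(no match)', "\n";
```

With a guard like this, a query that matches nothing yields null rather than the "Trying to get property of non-object" notice, which makes it easier to see that the expression, not nodeValue, is the problem.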


Your XPath is fine when I use it in Firefox, but it won't work with DOM, which is not surprising. I assume you got your XPath from some sort of browser plugin that returns the path for a selected element. You should not trust XPaths returned by browser plugins, because browsers modify the DOM through JavaScript and add implied elements where necessary. Work from the raw source code instead.
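
One common case of such an implied element is <tbody>: browsers add it to tables whose markup omits it, whereas the libxml-based parser behind DOMDocument, as far as I know, leaves the tree as written. The sketch below uses a hypothetical one-cell table (not taken from the Argos page) to show how a browser-style path can miss while a path written against the raw source matches.

```php
<?php
// Hypothetical raw source: the markup itself contains no <tbody>.
$html = '<html><body><table><tr><td>In stock</td></tr></table></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Path as a browser inspector would typically report it, including the implied <tbody>.
$fromBrowser = $xpath->query('/html/body/table/tbody/tr/td');

// Path written against the raw source, without <tbody>.
$fromSource = $xpath->query('/html/body/table/tr/td');

echo 'browser-style path matches: ', $fromBrowser->length, "\n";
echo 'source-style path matches: ', $fromSource->length, "\n";
```

Anchoring on an id or class, rather than on a long chain of positional steps, tends to be less sensitive to this kind of difference.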

Your XPath evaluates to "Home delivery within 2 days" in Firefox, which is not what I would expect in a variable called "stock_data". But anyway, this should do it:

$dom = new DOMDocument;
libxml_use_internal_errors(TRUE);
$dom->loadHTMLFile('http://www.argos.co.uk/static/Product/partNumber/9282197/Trail/searchtext%3EIPOD+TOUCH.htm');
libxml_clear_errors();

$xpath = new DOMXpath($dom);
$nodes = $xpath->query(
    '/html/body//div[@id="deliveryInformation"]/ul/li[@class="home"]/span'
);
echo $nodes->item(0)->nodeValue; // "Home delivery within 2 days"