
PHP HTML DOM Parser Amazon offer listing pull all prices and seller names

Developer https://www.devze.com 2023-04-13 06:16 Source: web

I am trying to pull the price and seller from the Amazon offer listing pages found at:

http://www.amazon.com/gp/offer-listing/B002UYSHMM

I can get the price by using:

$ret['Retail'] = $html->find('span[class="price"]', 0)->innertext;

This pulls the first price in the offer listing.

I tried to pull the matching seller for the first price with the line below, which reads the alt value of the img that contains the seller name:

$ret['SoldBy'] = $html->find('ul.sellerInformation img', 0)->getAttribute('alt');

It worked for the first one, but as I went down the list it started missing sellers, and in some cases even prices.

Can anyone tell me why it would miss sellers and even jump around on prices? All I did to get the additional sellers is:

$ret['Retail2'] = $html->find('span[class="price"]', 1)->innertext;
$ret['SoldBy2'] = $html->find('ul.sellerInformation img', 1)->getAttribute('alt');
$ret['Retail3'] = $html->find('span[class="price"]', 2)->innertext;
$ret['SoldBy3'] = $html->find('ul.sellerInformation img', 2)->getAttribute('alt');
$ret['Retail4'] = $html->find('span[class="price"]', 3)->innertext;
$ret['SoldBy4'] = $html->find('ul.sellerInformation img', 3)->getAttribute('alt');
$ret['Retail5'] = $html->find('span[class="price"]', 4)->innertext;
$ret['SoldBy5'] = $html->find('ul.sellerInformation img', 4)->getAttribute('alt');
$ret['Retail6'] = $html->find('span[class="price"]', 5)->innertext;
$ret['SoldBy6'] = $html->find('ul.sellerInformation img', 5)->getAttribute('alt');
$ret['Retail7'] = $html->find('span[class="price"]', 6)->innertext;
$ret['SoldBy7'] = $html->find('ul.sellerInformation img', 6)->getAttribute('alt');

Thank you for any suggestions!


<?php

$url = 'http://www.amazon.com/gp/offer-listing/B0036RNK7O/ref=dp_olp_new?ie=UTF8&qid=1319582305&sr=8-2';

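libxml_use_internal_errors(true); // real-world HTML is rarely valid; collect parser warnings instead of printing them
$dom = new DomDocument();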

$content = file_get_contents($url);
$dom->loadHTML($content);

$results = array();
$classes_to_collect = array('price', 'shipping_block', 'condition', 'sellerInformation');
$seller_elements = array('name', 'rating', 'stock_info', 'item_info');

foreach($dom->getElementsByTagName('tbody') as $tb)
{
  if($tb->hasAttribute('class') && stripos($tb->getAttribute('class'), 'result')!==false)
  {
    foreach($tb->getElementsByTagName('tr') as $tr)
    {
      $new_result = array();
      foreach($tr->getElementsByTagName('td') as $td)
      {
        foreach($td->childNodes as $cne)
        {
          foreach($classes_to_collect as $ctc)
          {
            if($cne->hasAttributes() && $cne->getAttribute('class') && stripos($cne->getAttribute('class'), $ctc)!==false)
            {
              if($cne->localName=='ul')
              {
                $new_seller = array(); // reset per <ul> so one row's seller data never leaks into the next
                $lis = $cne->getElementsByTagName('li');
                foreach($lis as $lii=>$lie)
                {
                  if(!isset($seller_elements[$lii])) break; // ignore any <li> beyond the four known fields
                  $value = $lie->textContent;
                  if($seller_elements[$lii]=='item_info')
                  {
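                    // drop the inline script text that follows the item description
                    $cutoff = strpos($value, 'amznJQ.onReady');
                    if($cutoff!==false) $value = substr($value, 0, $cutoff);
                  }
                  else if($seller_elements[$lii]=='name')
                  {
                    // keep only what follows the 'Seller:' label (7 characters), wherever it starts
                    $cutoff = strpos($value, 'Seller:');
                    if($cutoff!==false) $value = substr($value, $cutoff + 7);
                  }
                  else if($seller_elements[$lii]=='rating')
                  {
                    // keep only what follows the 'Seller Rating:' label (14 characters)
                    $cutoff = strpos($value, 'Seller Rating:');
                    if($cutoff!==false) $value = substr($value, $cutoff + 14);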
                  }
                  $new_seller[$seller_elements[$lii]] = trim($value);
                }
                $new_result[$ctc] = $new_seller;
              }
              else $new_result[$ctc] = $cne->textContent;
            }
          }
        }
      }
      $results[] = $new_result;
    }
  }
}

print_r($results);

This prints a large multi-dimensional array with one entry per table row.
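
To get from that array back to the price/seller pairs the question asks for, something like the following might work. It is only a sketch: the sample entries are made up to match the shape the loop builds (they are not real Amazon data), and offers_from_results is a hypothetical helper name, not part of any library.

```php
<?php
// Made-up entries in the shape the loop above builds; not real Amazon data.
$results = array(
    array('price' => '$10.00', 'condition' => 'New',
          'sellerInformation' => array('name' => 'Seller A', 'rating' => '95% positive')),
    array(), // rows without any matching cells are stored as empty arrays
    array('price' => '$11.50', 'condition' => 'New',
          'sellerInformation' => array('name' => 'Seller B')),
);

// Hypothetical helper: keep only rows that have a price and pair it with the seller name.
function offers_from_results(array $results) {
    $offers = array();
    foreach ($results as $row) {
        if (empty($row['price'])) {
            continue; // header or separator row
        }
        $seller = isset($row['sellerInformation']['name'])
            ? $row['sellerInformation']['name']
            : '';
        $offers[] = array('Retail' => trim($row['price']), 'SoldBy' => $seller);
    }
    return $offers;
}

$offers = offers_from_results($results);
print_r($offers);
```

Because the loop stores an entry for every tr, including rows with no price, filtering on the price key is what keeps the output to real offers.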


I used a foreach and put the results into an array. This worked much better, since the number of sellers varies by item.

foreach ($html->find('div.resultsset table tbody.result tr') as $article) {
    $price = $article->find('span.price', 0);
    if (!$price) {
        continue; // skip rows that carry no price
    }
    $item = array();
    // get retail
    $item['Retail'] = $price->plaintext;
    // get soldby: prefer the logo's alt text, fall back to the text-only seller name
    $img = $article->find('img', 0);
    if ($img && $img->getAttribute('alt') != '') {
        $item['SoldBy'] = $img->getAttribute('alt');
    } else {
        $name = $article->find('ul.sellerInformation li a b', 0);
        $item['SoldBy'] = $name ? $name->plaintext : '';
    }
    $ret[] = $item;
}
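
A likely reason the indexed version from the question drifts is that the price list and the logo list are built page-wide and are not the same length: a seller shown as plain text has no img, so every later index points at the wrong seller. The sketch below tries to show that on a tiny hand-written sample. The markup is an assumption modelled on the selectors used in this thread, not the real Amazon page, and it uses PHP's built-in DOMDocument and DOMXPath rather than the simple HTML DOM parser.

```php
<?php
// Hypothetical, simplified markup: the second offer has a text-only seller (no <img>).
$sample = <<<'HTML'
<table><tbody class="result">
<tr><td><span class="price">$10.00</span></td>
    <td><ul class="sellerInformation"><li><img alt="Seller A" src="a.gif"></li></ul></td></tr>
<tr><td><span class="price">$11.50</span></td>
    <td><ul class="sellerInformation"><li><a><b>Seller B</b></a></li></ul></td></tr>
<tr><td><span class="price">$12.25</span></td>
    <td><ul class="sellerInformation"><li><img alt="Seller C" src="c.gif"></li></ul></td></tr>
</tbody></table>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($sample);
$xpath = new DOMXPath($dom);

// Page-wide lists, as in the question: three prices but only two logos.
$prices = $xpath->query('//span[@class="price"]');
$logos  = $xpath->query('//ul[@class="sellerInformation"]//img');
$global = array();
foreach ($prices as $i => $p) {
    $logo = $logos->item($i); // null once the shorter list runs out
    $global[] = array($p->textContent, $logo ? $logo->getAttribute('alt') : null);
}

// Row-scoped queries: each price is paired with the seller from the same <tr>.
$scoped = array();
foreach ($xpath->query('//tbody[@class="result"]/tr') as $tr) {
    $price = $xpath->query('.//span[@class="price"]', $tr)->item(0);
    if (!$price) {
        continue;
    }
    $logo = $xpath->query('.//ul[@class="sellerInformation"]//img', $tr)->item(0);
    $name = $xpath->query('.//ul[@class="sellerInformation"]//b', $tr)->item(0);
    $seller = $logo ? $logo->getAttribute('alt') : ($name ? trim($name->textContent) : '');
    $scoped[] = array($price->textContent, $seller);
}

print_r($global);
print_r($scoped);
```

In the page-wide pairing the second price ends up next to Seller C and the third has no seller at all, while the row-scoped pairing keeps Seller B with the second price. That is the same idea as the foreach over tr rows above.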