Simple HTML DOM, finding links recursively

Developer https://www.devze.com 2023-02-07 23:42 Source: web

I am using Simple HTML DOM to find the links on a certain page with:

// Find all links
foreach ($html->find('a') as $element)
    echo $element->href . '<br>';

This finds all the links on the page. However, I also want to follow the found links and find the links inside those pages recursively, for example down to level 5.

Any idea how to go about this?


Use a recursive function and keep track of the depth:

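function findLinks($url, $depth, $maxDepth) {
  // Fetch $url and parse it; file_get_html() is provided by simple_html_dom.php
  $html = file_get_html($url);
  if (!$html)
    return; // the page could not be loaded
  // ... process the page here ...
  if ($depth <= $maxDepth)
    foreach ($html->find('a') as $element)
      // hrefs may be relative; resolve them against $url before fetching
      findLinks($element->href, $depth + 1, $maxDepth);
}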

And you would start by calling something like findLinks($rootUrl, 1, 5).
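
One thing the function above does not guard against is pages that link back to each other, which would make it fetch the same URL many times. A minimal sketch of one way to handle that is to pass along a set of visited URLs. The names crawl, $getLinks and $visited are hypothetical, and a fixed array stands in for real pages so the example runs without network access; with Simple HTML DOM, $getLinks could wrap file_get_html() and find('a').

```php
<?php
// Depth-limited recursive crawl that skips URLs it has already seen.
// $getLinks is any callable returning the list of hrefs found at $url (an assumption).
function crawl(string $url, int $depth, int $maxDepth, callable $getLinks, array &$visited): void
{
    if ($depth > $maxDepth || isset($visited[$url])) {
        return; // too deep, or already visited
    }
    $visited[$url] = $depth; // remember the depth at which the URL was first reached
    foreach ($getLinks($url) as $href) {
        crawl($href, $depth + 1, $maxDepth, $getLinks, $visited);
    }
}

// Stand-in for a small site: each key is a page, each value is the links on it.
$site = [
    'a' => ['b', 'c'],
    'b' => ['a', 'd'],
    'c' => [],
    'd' => ['e'],
    'e' => [],
];
$getLinks = function (string $url) use ($site): array {
    return $site[$url] ?? [];
};

$visited = [];
crawl('a', 1, 3, $getLinks, $visited);
print_r(array_keys($visited)); // pages reached within 3 levels
```

Because the recursion is depth-first, a page is recorded at the depth where it is first encountered, which is not necessarily its shortest distance from the root.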


I needed a similar feature in the past. What you can do is use MySQL to store your links.

In my case I had a todo table and a pages table. Seed the todo table with some URLs you want to spider.

For each page, I extracted the information I needed (plaintext and title) and stored it in the pages table of the MySQL database. Then I looped through the page's links and added them to the todo table. The last step was to remove the current page from the todo table and loop again.

loop while todo is not empty
{
   grab a url from todo
   get the current page's title and plaintext and store them in the pages table
   loop through the links and add the found links to the todo table
   remove the current page from todo
}
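
The loop above might look roughly like the following in PHP. This is only a sketch under stated assumptions: plain arrays stand in for the todo and pages tables instead of MySQL, and spider, $fetchPage and $maxPages are hypothetical names. $fetchPage is assumed to return an array with a 'title' and a 'links' entry for a URL.

```php
<?php
// Queue-based spider: arrays play the role of the todo and pages tables.
function spider(array $seeds, callable $fetchPage, int $maxPages): array
{
    $todo = $seeds; // URLs still to visit
    $pages = [];    // url => title of pages already stored
    while ($todo && count($pages) < $maxPages) {
        $url = array_shift($todo); // grab a url and remove it from todo
        if (isset($pages[$url])) {
            continue; // already stored
        }
        $page = $fetchPage($url);
        $pages[$url] = $page['title'];
        foreach ($page['links'] as $link) {
            // only queue links that are neither stored nor already waiting
            if (!isset($pages[$link]) && !in_array($link, $todo, true)) {
                $todo[] = $link;
            }
        }
    }
    return $pages;
}

// Stand-in for real pages so the sketch runs without network access.
$site = [
    '/'  => ['title' => 'Home', 'links' => ['/a', '/b']],
    '/a' => ['title' => 'A',    'links' => ['/', '/b']],
    '/b' => ['title' => 'B',    'links' => ['/c']],
    '/c' => ['title' => 'C',    'links' => []],
];
$fetchPage = function (string $url) use ($site): array {
    return $site[$url] ?? ['title' => '', 'links' => []];
};

$pages = spider(['/'], $fetchPage, 10);
print_r($pages);
```

Unlike the recursive version, this visits pages in breadth-first order, and with a real database the todo list would survive a crash or restart of the script.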