I've modified a basic web crawler to gather a list of links for a site, which is likely to run into the thousands. The problem I'm having is that the script times out when I try to run it through a browser. On top of this, it was mentioned in a previous question I asked that the script may also be spawning too many processes at the same time and killing the server I run it on.
How would I go about fixing these issues? Or should I go with an open-source crawler instead, and if so, which one? I can't find anything specific enough, as the phpDig site is down :/
Processes like this are best run as PHP CLI cron jobs.
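Run from the CLI, the script isn't subject to the web server's or browser's timeout (the CLI default for max_execution_time is already 0), and you can throttle it so it doesn't overwhelm the server. A minimal sketch, assuming a hypothetical crawler.php entry point and placeholder paths:

```php
<?php
// crawler.php — run via PHP CLI, e.g. from a crontab entry such as:
//   */15 * * * * /usr/bin/php /path/to/crawler.php >> /var/log/crawler.log 2>&1
// (the path and schedule above are placeholders)

set_time_limit(0);                // no execution limit (already the CLI default)
ini_set('memory_limit', '256M');

$queue = ['http://example.com/']; // seed URL — adjust for your site
$seen  = [];

while ($url = array_shift($queue)) {
    if (isset($seen[$url])) {
        continue;                 // skip links we've already visited
    }
    $seen[$url] = true;

    $html = @file_get_contents($url);
    if ($html === false) {
        continue;
    }

    // Extract links; a real crawler would also resolve relative URLs
    // and restrict itself to the target domain.
    if (preg_match_all('/href="([^"#]+)"/i', $html, $m)) {
        foreach ($m[1] as $link) {
            $queue[] = $link;
        }
    }

    usleep(250000); // throttle: one request every 250 ms, so the crawl
                    // is a single polite process rather than many at once
}

file_put_contents('links.txt', implode(PHP_EOL, array_keys($seen)));
```

Because the whole crawl runs inside one sequential loop, it also avoids the "too many processes" problem from the question.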
If you need to be able to run it on demand from a web interface, then consider adding it to a queue to be run in the background, using Gearman or even the unix `at` command, as sketched below.
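With Gearman, the web page only submits a job and returns immediately; a separate long-running worker does the actual crawl. A rough sketch using the pecl gearman extension (the function name `crawl_site` and the script names are illustrative):

```php
<?php
// submit.php — called from the web interface; returns immediately
$client = new GearmanClient();
$client->addServer(); // defaults to 127.0.0.1:4730
$client->doBackground('crawl_site', 'http://example.com/');
echo 'Crawl queued.';
```

```php
<?php
// worker.php — long-running CLI process that performs the crawl
$worker = new GearmanWorker();
$worker->addServer();
$worker->addFunction('crawl_site', function (GearmanJob $job) {
    $startUrl = $job->workload();
    // ... run the crawler here, e.g. the loop from crawler.php above ...
});
while ($worker->work());
```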
It so happens that I have written a PHP wrapper class for the linux `at` job queue, which is available from my github account, should you choose to go down that route.
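Even without the wrapper, queuing a background run with `at` from PHP can be as simple as piping a command to it; a minimal sketch, with placeholder paths:

```php
<?php
// Queue an immediate background run of the crawler via the unix `at` command.
// Output goes to a log file rather than the browser, so the page returns at once.
$job = '/usr/bin/php /path/to/crawler.php >> /var/log/crawler.log 2>&1';
shell_exec('echo ' . escapeshellarg($job) . ' | at now');
echo 'Crawl queued.';
```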