web scraping search results

Developer https://www.devze.com 2023-03-22 22:46 Source: web
I need help solving the following issue:

I need to validate the URLs that the Google search engine has cached for a particular site. If a URL returns a 404, or the page fails to render some necessary HTML elements (and is therefore considered broken), I need to log that URL and later 301-redirect it to the correct URL. I know PHP and a little bit of Python, but I'm not sure what approach to use to scrape all URLs from the search engine results for a given site.
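
To make the "broken" check concrete, here is a minimal Python sketch using only the standard library. It assumes the list of URLs has already been collected, and it treats "necessary HTML elements" as a set of required tag names; that reading, the default tags `title` and `h1`, and the names `is_broken`, `fetch`, and `find_broken` are all illustrative assumptions, not something stated in the question.

```python
import urllib.error
import urllib.request
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects the names of all start tags seen in a document."""

    def __init__(self):
        super().__init__()
        self.seen = set()

    def handle_starttag(self, tag, attrs):
        self.seen.add(tag)


def is_broken(status, html, required_tags):
    """True if the response is not a 200 or a required tag is missing."""
    if status != 200:
        return True
    collector = TagCollector()
    collector.feed(html)
    return not set(required_tags) <= collector.seen


def fetch(url, timeout=10):
    """Return (status, body); HTTP errors such as 404 are returned, not raised."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.code, ""
    except urllib.error.URLError:
        # Connection-level failure: no HTTP status is available.
        return None, ""


def find_broken(urls, required_tags=("title", "h1")):
    """Return the URLs considered broken (hypothetical default tags)."""
    return [u for u in urls if is_broken(*fetch(u), required_tags)]
```

The list returned by `find_broken` could then be written to a log file and used later to build the 301 redirect rules. Note that `urlopen` follows redirects, so the status checked here is that of the final response.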


http://simplehtmldom.sourceforge.net/ is a simple HTML parser. There is an example on that page, though I'm not sure whether it still works with Google's instant search, etc.
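
Since the question also mentions Python, the same parse-the-results-page idea can be sketched with Python's built-in `html.parser` instead of the PHP library above. This is only a rough illustration: it assumes the HTML of a results page is already available as a string and that result links appear as plain `href` attributes pointing at the target site, which may not match the markup a search engine actually serves. The names `SiteLinkExtractor` and `extract_site_urls` are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class SiteLinkExtractor(HTMLParser):
    """Collects unique <a href> URLs whose host belongs to the given site."""

    def __init__(self, site):
        super().__init__()
        self.site = site.lower()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Relative links have no hostname and are skipped.
        host = urlparse(href).hostname or ""
        if host == self.site or host.endswith("." + self.site):
            if href not in self.urls:
                self.urls.append(href)


def extract_site_urls(html, site):
    """Return the site's URLs found in html, in order of first appearance."""
    parser = SiteLinkExtractor(site)
    parser.feed(html)
    return parser.urls
```

For example, `extract_site_urls(page_html, "example.com")` would keep links to `example.com` and its subdomains and drop everything else, where `page_html` is a placeholder for the saved page source.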
