
How to retrieve google pages

Developer https://www.devze.com 2022-12-13 15:35 Source: web

Dear all, I am currently using a web tool to parse webpages:

http://fiddesktop.cs.northwestern.edu/mmp/scrape?url=

For example, to parse the New York Times World page, we enter:

http://fiddesktop.cs.northwestern.edu/mmp/scrape?url=http://www.nytimes.com/pages/world/index.html

into the browser's address bar, and it parses the page nicely for us.

However, it fails for Google pages. For example, if I try to parse the Google News front page:

http://fiddesktop.cs.northwestern.edu/mmp/scrape?url=http://news.google.com/nwshp?hl=en&tab=wn

I always get a 500 Internal Server Error.
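One thing worth ruling out first: the Google News address contains its own `?` and `&`, so when it is pasted unencoded after `url=`, the `&tab=wn` part is read as a separate parameter of the scrape tool rather than as part of the target page. Whether this is what triggers the 500 depends on how the tool handles its input, which is not known here. A minimal Python sketch that percent-encodes the target with the standard library's `urllib.parse.urlencode` (the helper `scrape_url` is hypothetical):

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Base endpoint taken from the question; `scrape_url` is a hypothetical helper.
SCRAPER = "http://fiddesktop.cs.northwestern.edu/mmp/scrape"


def scrape_url(target: str) -> str:
    """Return a scraper request URL with `target` percent-encoded as one value."""
    # urlencode escapes ':', '/', '?', '=' and '&' inside the value, so the
    # whole target address stays inside the single `url` parameter.
    return SCRAPER + "?" + urlencode({"url": target})


target = "http://news.google.com/nwshp?hl=en&tab=wn"

# Unencoded: the query splits on '&', so `tab=wn` becomes its own parameter.
naive = SCRAPER + "?url=" + target
print(parse_qs(urlsplit(naive).query))
# {'url': ['http://news.google.com/nwshp?hl=en'], 'tab': ['wn']}

# Encoded: the scraper receives the full Google News address as `url`.
encoded = scrape_url(target)
print(encoded)
print(parse_qs(urlsplit(encoded).query))
# {'url': ['http://news.google.com/nwshp?hl=en&tab=wn']}
```

The encoded string printed above can also be typed into the address bar directly. Encoding only fixes how the request is formed; it does not change whether Google permits the page to be scraped.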

I am sure this has something to do with the Google website; I think we probably need some kind of Google API. Does anyone have an idea how to sort this out for Google pages? Many thanks.


Per the google.com robots.txt file, you are explicitly asked not to scrape their content. Google does not provide an API for machine-readable search results; they want to control how their content is presented, through widgets and embedding strategies.
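Python's standard library can evaluate robots.txt rules before a page is fetched. A minimal sketch using `urllib.robotparser.RobotFileParser`; the rules in `SAMPLE_ROBOTS` are a made-up excerpt for illustration, not the actual contents of Google's file, and `is_allowed` is a hypothetical helper:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules in robots.txt syntax; NOT copied from google.com/robots.txt.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /search
Disallow: /nwshp
"""


def is_allowed(robots_txt: str, url: str, agent: str = "MyScraper") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # Parse rules from in-memory lines instead of downloading the file.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(is_allowed(SAMPLE_ROBOTS, "http://news.google.com/nwshp?hl=en&tab=wn"))  # False
print(is_allowed(SAMPLE_ROBOTS, "http://news.google.com/about"))  # True
```

Against a live site, `parser.set_url("https://www.google.com/robots.txt")` followed by `parser.read()` downloads and parses the real file instead of the sample text.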

