How to write this crawler in JavaScript?

Developer https://www.devze.com 2023-01-18 11:41 Source: web
The idea is very simple:

Imagine a simple white page with a form containing a single input field (like the Google homepage). When I insert the link to a blog post into this form, the JavaScript crawler should find the first image on the blog post's page (via Ajax), show it on the white page, and save it on my server.

This crawler would work like Digg and the Facebook wall.

Which functions do I have to use for this crawler?


Due to cross-domain restrictions, pure JavaScript crawlers are neither common nor practically feasible. You might need to set up a server-side script that receives the address entered in the form, fetches the contents of the remote resource, and parses the HTML to obtain the images.
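A minimal sketch of such a server-side script, assuming Node.js 18 or later (which provides a global `fetch`). The helper names `firstImageUrl` and `fetchFirstImage` are hypothetical, and the regular expression is a simplification: it misses images inserted by scripts and does not decode HTML entities, so a real HTML parser is more robust.

```javascript
// Sketch only: assumes Node.js 18+ (global fetch and URL).
// firstImageUrl and fetchFirstImage are hypothetical helper names.

/**
 * Return the absolute URL of the first <img> in an HTML string, or null.
 * Regex-based, so it is a simplification of real HTML parsing
 * (e.g. entities such as &amp; inside src are not decoded).
 */
function firstImageUrl(html, pageUrl) {
  // Find the first <img ...> tag and capture its src attribute, whether the
  // value is double-quoted, single-quoted, or unquoted. Requiring whitespace
  // before "src" skips attributes such as data-src.
  const match = /<img\b[^>]*?\ssrc\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/i.exec(html);
  if (!match) return null;
  const src = match[1] ?? match[2] ?? match[3];
  // Resolve relative paths ("/images/a.png", "pic.jpg") against the page URL.
  return new URL(src, pageUrl).href;
}

/** Fetch a remote page and return the URL of its first image, or null. */
async function fetchFirstImage(pageUrl) {
  const response = await fetch(pageUrl);
  if (!response.ok) {
    throw new Error(`HTTP ${response.status} while fetching ${pageUrl}`);
  }
  return firstImageUrl(await response.text(), pageUrl);
}
```

A form handler on your own server could call `fetchFirstImage(url)` and return the result to the page (for instance as JSON), which then displays it in an `<img>` element. Because that request goes to the page's own origin, the cross-domain restriction does not apply.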


Darin is right: JavaScript cannot request content from another domain. It can, however, dynamically add script tags to the document and include scripts from other domains (for details, see JSONP).
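A minimal browser-side sketch of the JSONP technique, assuming the remote service supports JSONP, i.e. it wraps its JSON response in a call to a function whose name is passed as a query parameter. The parameter name `callback` and the helper names `buildJsonpUrl` and `jsonp` are assumptions for illustration; each service chooses its own parameter name.

```javascript
// JSONP sketch (browser). Assumes the remote endpoint supports JSONP.
// buildJsonpUrl and jsonp are hypothetical helper names.

let jsonpCounter = 0;

/** Append the callback-name query parameter to the request URL. */
function buildJsonpUrl(url, callbackName, paramName = 'callback') {
  const u = new URL(url);
  u.searchParams.set(paramName, callbackName);
  return u.href;
}

/** Load `url` via a <script> tag; resolve with the data the script passes back. */
function jsonp(url, paramName = 'callback') {
  return new Promise((resolve, reject) => {
    // Unique global function name so concurrent requests do not collide.
    const callbackName = `__jsonp_cb_${jsonpCounter++}`;
    const script = document.createElement('script');

    const cleanup = () => {
      delete window[callbackName];
      script.remove();
    };

    // The remote script is expected to call window[callbackName](data).
    window[callbackName] = (data) => {
      cleanup();
      resolve(data);
    };

    script.onerror = () => {
      cleanup();
      reject(new Error(`JSONP request failed: ${url}`));
    };

    script.src = buildJsonpUrl(url, callbackName, paramName);
    document.head.appendChild(script);
  });
}
```

This only works for endpoints that deliberately return JSONP; an ordinary blog page is HTML, not a script, so it cannot be read this way. The loaded script also runs with the page's full privileges, so JSONP should only be used with services you trust.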

I suggest using YQL. With Yahoo's YQL you can crawl any page you want by writing only JavaScript: Yahoo's servers fetch the URLs you request, parse the HTML, and send back the parts of the document you asked for.
