How to crowdsource my web crawling

开发者 https://www.devze.com 2022-12-09 03:03 Source: the web
My web application needs to download content from a URL that the user specifies. Currently these requests go through my server, which is inefficient and could get my server's IP blocked.

Is there a way to let the user download the URL content directly? The same-origin policy seems to prevent using AJAX or an iframe to download and reuse this content.
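To make the restriction concrete: two URLs share an origin only when their scheme, host and port all match, and a page can normally read responses only from its own origin. Below is a minimal sketch of that comparison in Python. The helper names `origin` and `same_origin` and the example hostnames are hypothetical, and this only mimics the check a browser performs rather than reproducing any browser's actual code.

```python
from urllib.parse import urlsplit

# Default ports implied by the scheme when the URL does not name one.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # urlsplit already lowercases the hostname
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin(a, b):
    """True when both URLs have the same scheme, host and port."""
    return origin(a) == origin(b)


if __name__ == "__main__":
    # A page on the application's own host versus a user-specified URL.
    print(same_origin("https://myapp.example/app", "https://other.example/page"))
```

Any user-specified URL on a different host fails this check, which is why a plain AJAX request from the page cannot read its content unless the target server explicitly opts in.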

Any ideas? For example, is there a way to download and reuse URL content via Flash?


You could use Tor to mask your requests, but if you have to go to such lengths to crawl a website, perhaps you shouldn't be doing it?

Also, with your approach the iframe request will include your page URL as the referrer, which makes identifying these requests at the server end pretty straightforward...
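As a rough illustration of how little effort this takes on the receiving side, here is a sketch that tallies requests by the host found in the `Referer` header. The input format (a list of header dictionaries), the function name `count_by_referrer_host` and the example hostname are assumptions made for the example, not anything a particular server uses.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_by_referrer_host(requests):
    """Count requests per referring host; requests without a Referer are skipped."""
    counts = Counter()
    for headers in requests:
        referer = headers.get("Referer", "")
        host = urlsplit(referer).hostname  # None when the header is missing or empty
        if host:
            counts[host] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical log: two requests triggered from the same application page.
    log = [
        {"Referer": "https://myapp.example/crawl?u=1"},
        {"Referer": "https://myapp.example/crawl?u=2"},
        {},
    ]
    print(count_by_referrer_host(log).most_common(1))
```

Even though the requests come from many different user IPs, they all point back to the same referring host, so an operator could block them with a single rule.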


If it's a specific website, I recommend talking to the website operators rather than trying to crawl anonymously.
