I noticed that some sites are copying the content of one of my client's sites using automated agents. I want to detect their requests and show them a captcha to prevent them from copying the site content.
Is there any way to detect them?
This is a complex problem and a game of cat and mouse. To make scraping at least slightly more difficult:
- Ban the IPs that hit the site repeatedly; a normal user would not need ALL the pages
- Ban public proxies; lists of them can be found by googling
- Redirect any request from a banned IP or proxy to a captcha page
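The redirect step in the list above could be sketched roughly as follows. This is only an illustration: `CAPTCHA_PATH`, `is_listed`, and `route_request` are made-up names, the addresses come from the reserved documentation ranges, and a real site would hook this check into whatever web framework it already uses.

```python
import ipaddress

# Hypothetical path of the captcha page; adjust to the real site.
CAPTCHA_PATH = "/captcha"


def is_listed(ip, networks):
    """Return True if the IP string falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


def route_request(ip, banned_networks, proxy_networks):
    """Return the captcha path for banned or proxy IPs, otherwise None."""
    if is_listed(ip, banned_networks) or is_listed(ip, proxy_networks):
        return CAPTCHA_PATH
    return None


# Example lists (assumed data, using documentation-only address ranges).
banned = [ipaddress.ip_network("203.0.113.7/32")]
proxies = [ipaddress.ip_network("198.51.100.0/24")]
```

Storing networks rather than single addresses lets one entry cover a whole proxy range, which is usually how public proxy lists are published.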
Typically, an "automated agent" accesses far more data in a short period than a typical user would. You would need to set up something that tracks the IP addresses of all users, checks whether any IP stands out, and blocks it.
Of course, this is made more difficult by proxies, dynamic IPs, and so on.
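One way to spot an IP that "stands out" is a sliding-window request counter, sketched below. The `RequestTracker` class and its default thresholds are assumptions chosen for illustration, not recommended values; the counts live in memory, so a multi-server setup would need a shared store instead.

```python
import time
from collections import defaultdict, deque


class RequestTracker:
    """Count requests per IP inside a sliding time window (in-memory sketch)."""

    def __init__(self, max_requests=100, window_seconds=60.0):
        # Both thresholds are placeholders; tune them against real traffic.
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def record(self, ip, now=None):
        """Record one request; return True if the IP is over the limit."""
        if now is None:
            now = time.monotonic()
        hits = self._hits[ip]
        hits.append(now)
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] > self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests
```

Calling `record(ip)` on every request and sending the client to the captcha when it returns `True` gives a soft block, which is kinder to real users who share an IP behind a proxy or NAT.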