
PHP Detecting bot-like behavior

Developer https://www.devze.com 2022-12-12 04:04 Source: web

I am attempting to build a system that shows users a CAPTCHA only when bot-like behavior is detected. Here are the behaviors I am currently looking for when somebody fills out a contact form:

  1. how quickly the form is submitted after the page loads (if it's 5 seconds or less, it's almost humanly impossible to have filled it out)

  2. how many contact attempts have been made in the past hour (limit 15/hour) or day (limit 25/day)

  3. check the message content for links, and cross-check them against links included in other messages in the past day

  4. check the message for spam keywords
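A minimal sketch of the four checks above, written in Python for illustration and assuming in-memory storage (a PHP version would keep the same state in a database or `$_SESSION`). The thresholds come from the list; `SPAM_KEYWORDS`, `SECRET_KEY`, and every function name are hypothetical placeholders. The page-load time is signed with an HMAC on the server, on the assumption that a plain hidden timestamp would be trivial for a bot to forge.

```python
import hashlib
import hmac
import re
from collections import deque
from urllib.parse import urlparse

# Thresholds from the question; the keyword list and secret are hypothetical placeholders.
MIN_FILL_SECONDS = 5
HOURLY_LIMIT, DAILY_LIMIT = 15, 25
SPAM_KEYWORDS = {"casino", "cruises", "viagra"}
SECRET_KEY = b"replace-with-a-real-secret"
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def issue_form_token(now):
    """Signed page-load timestamp to put in a hidden field when rendering the form."""
    ts = str(int(now))
    sig = hmac.new(SECRET_KEY, ts.encode(), hashlib.sha256).hexdigest()
    return f"{ts}.{sig}"


def loaded_at_from_token(token):
    """Return the page-load time, or None if the token is missing or was altered."""
    ts, _, sig = (token or "").partition(".")
    expected = hmac.new(SECRET_KEY, ts.encode(), hashlib.sha256).hexdigest()
    if not ts.isdigit() or not hmac.compare_digest(sig.encode(), expected.encode()):
        return None
    return int(ts)


class ContactFormChecker:
    """In-memory state for the sketch; a real site would keep it in a database."""

    def __init__(self):
        self.attempts = {}      # ip -> deque of submission times (last 24 h)
        self.recent_hosts = {}  # link host -> times it appeared (last 24 h)

    def too_fast(self, loaded_at, now):
        return now - loaded_at <= MIN_FILL_SECONDS

    def over_rate_limit(self, ip, now):
        """Record this attempt, then compare hourly and daily counts with the limits."""
        times = self.attempts.setdefault(ip, deque())
        while times and now - times[0] > 86400:  # drop attempts older than a day
            times.popleft()
        times.append(now)
        last_hour = sum(1 for t in times if now - t <= 3600)
        return last_hour > HOURLY_LIMIT or len(times) > DAILY_LIMIT

    def repeated_links(self, message, now):
        """Count distinct link hosts in this message already seen in the past day."""
        hosts = {urlparse(u).hostname or "" for u in URL_RE.findall(message)}
        repeated = 0
        for host in hosts:
            seen = [t for t in self.recent_hosts.get(host, []) if now - t <= 86400]
            if seen:
                repeated += 1
            self.recent_hosts[host] = seen + [now]
        return repeated

    def keyword_hits(self, message):
        words = set(re.findall(r"[a-z']+", message.lower()))
        return len(words & SPAM_KEYWORDS)

    def needs_captcha(self, ip, message, token, now):
        loaded_at = loaded_at_from_token(token)
        # Evaluate every check (no short-circuit) so rate and link history stay current.
        flags = [
            loaded_at is None or self.too_fast(loaded_at, now),
            self.over_rate_limit(ip, now),
            self.repeated_links(message, now) > 0,
            self.keyword_hits(message) > 0,
        ]
        return any(flags)
```

In PHP, `ip` would typically come from `$_SERVER['REMOTE_ADDR']` and the signing from `hash_hmac`. Any single flag only triggers the CAPTCHA, so a false positive costs a legitimate user one extra step rather than a lost message.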


I will add useful community solutions here as they come:

  • use a "honeypot" (info at http://haacked.com/archive/2007/09/11/honeypot-captcha.aspx)

  • check the referring URL for an entrance from an outside site


What other behaviors would indicate a robot that PHP could help detect without the help of a CAPTCHA? (I don't want to use JavaScript, because it can be switched off.)


A very simple one (some more advanced bots won't fall for this, but many basic bots will) - put a bogus field in the form that isn't visible to a regular user (and as a backup, perhaps with a normally invisible label "don't type anything here"). If there's content in the field when submitted, chances are it's a bot.
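A minimal sketch of that honeypot, written in Python for illustration; the field name `website` and the helper names are hypothetical. The `tabindex="-1"` and `autocomplete="off"` attributes are additions, intended to keep keyboard users and browser autofill from stumbling into the trap.

```python
HONEYPOT_FIELD = "website"  # hypothetical name; pick something a bot would want to fill

# Markup for the form: hidden with CSS, with a warning label as a fallback if CSS is off.
HONEYPOT_HTML = f"""
<div style="display:none">
  <label for="{HONEYPOT_FIELD}">Don't type anything here</label>
  <input type="text" id="{HONEYPOT_FIELD}" name="{HONEYPOT_FIELD}"
         tabindex="-1" autocomplete="off">
</div>
"""


def honeypot_tripped(form):
    """form: the submitted fields (PHP's $_POST). Any non-blank value suggests a bot."""
    return bool(form.get(HONEYPOT_FIELD, "").strip())
```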


I believe you could coordinate this with your robots.txt file and determine whether it was requested by the same client: log the IP address and timestamp of every request for robots.txt, since it is unlikely that a normal user would ever look at your robots.txt file.

Most bots, on the other hand, will check your robots.txt file (perhaps for the directory structure, etc.).
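A minimal sketch of that idea, assuming every request passes through code that can see the path and client IP (for example a PHP front controller, or a script that reads the web server's access log). `RobotsTxtLog` and its one-day window are hypothetical. Because IPs can be shared or changed, a hit here is best treated as one signal among several rather than proof.

```python
class RobotsTxtLog:
    """Remembers which client IPs requested /robots.txt (in memory, for the sketch)."""

    def __init__(self, window_seconds=86400):
        self.window = window_seconds
        self.last_fetch = {}  # ip -> time of that IP's most recent robots.txt request

    def record_request(self, ip, path, now):
        if path == "/robots.txt":
            self.last_fetch[ip] = now

    def fetched_recently(self, ip, now):
        """True if this IP asked for robots.txt within the window."""
        ts = self.last_fetch.get(ip)
        return ts is not None and now - ts <= self.window


# Usage: call record_request() for every request; check when the form is submitted.
log = RobotsTxtLog()
log.record_request("84.19.186.171", "/robots.txt", now=1_000)
log.record_request("203.0.113.7", "/contact", now=1_000)
print(log.fetched_recently("84.19.186.171", now=2_000))  # True
print(log.fetched_recently("203.0.113.7", now=2_000))    # False
```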


An interesting factor could be typing frequency and mouse movements. They are fairly easy to capture via JavaScript. Analyzing them is a different matter, although I imagine it would be fairly easy to calculate averages and deviations that give a good idea of how "organic" the movements are.

On the other hand, this is extremely expensive on the client side and could be perceived as snooping or spying if noticed. Perhaps use it as an extra layer of security only for clients that are already suspected of being bots?
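The server-side half of the "averages and deviations" idea could look like this, assuming some client-side script has already posted keystroke timestamps in milliseconds. Since that script cannot run with JavaScript switched off, missing data returns `None` ("unknown") rather than "bot". The thresholds `min_cv` and `min_mean_ms` are hypothetical and would need tuning against real traffic.

```python
import statistics


def keystroke_gaps(key_times_ms):
    """Intervals between consecutive keystrokes, in milliseconds."""
    times = sorted(key_times_ms)
    return [b - a for a, b in zip(times, times[1:])]


def looks_scripted(key_times_ms, min_cv=0.2, min_mean_ms=30):
    """True/False verdict, or None when there is too little data to judge
    (JavaScript disabled, text pasted in, very short message)."""
    gaps = keystroke_gaps(key_times_ms)
    if len(gaps) < 2:
        return None
    mean = statistics.mean(gaps)
    if mean < min_mean_ms:
        return True  # average gap below the (hypothetical) human minimum
    cv = statistics.stdev(gaps) / mean  # coefficient of variation of the rhythm
    return cv < min_cv  # suspiciously regular rhythm
```

The coefficient of variation is used instead of the raw standard deviation so that fast and slow typists are judged on how irregular their rhythm is, not on their speed.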


I added a field hidden by CSS (display:none) with name="email" to the form; when it is filled in, it's a robot ;)


Perhaps check the referring URL? I can hardly imagine many people ending up at a contact form without first going through several other pages on the website; the same goes for order forms, ...
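A minimal sketch of a referrer check, with `SITE_HOSTS` as a hypothetical set of your own host names (in PHP the header arrives as `$_SERVER['HTTP_REFERER']`). The header is sent by the client, so bots can fake it and some browsers or privacy settings omit it; a missing or outside referrer is therefore better used to raise suspicion than to reject a submission outright.

```python
from urllib.parse import urlparse

SITE_HOSTS = {"example.com", "www.example.com"}  # hypothetical: replace with your hosts


def referrer_is_suspicious(referer_header):
    """True when the Referer header is absent or points at a page on another site."""
    if not referer_header:
        return True
    host = urlparse(referer_header).hostname  # lowercased by urlparse; None if unparsable
    return host not in SITE_HOSTS
```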


I'd suggest forgetting about trying to guess the signs... they are always changing.

I'd tokenize every imaginable 'feature' of the behavior and automatically score the features as 'ok', 'spam', or 'unsure'. Then 'train on error' (make a record of the cases where the guess was wrong). After a while you could have 99.7% accuracy.
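A minimal sketch of the tokenize-and-count part, loosely following Paul Graham's essay linked at the end of this answer. The feature names imitate the log lines in this answer; `extract_features`, `FeatureStats`, and the form keys `message` and `tel` are hypothetical. Per-feature probabilities are clamped to [0.01, 0.99] and unseen features get 0.4, as in the essay, while its other refinements (such as weighting non-spam counts) are left out.

```python
import re
from collections import Counter


def extract_features(form):
    """Turn a submission into string features, named like the log lines in this answer."""
    text = re.sub(r"<[^>]*>", "", form.get("message", ""))  # crude HTML stripping
    features = {f'mssg txt - "{w}"' for w in re.findall(r"[a-z0-9']+", text.lower())}
    features.add(f'mssg maxlen - "{len(text)}"')
    if form.get("tel"):
        features.add(f'tel number - "{form["tel"]}"')
    return features


class FeatureStats:
    def __init__(self):
        self.counts = {"spam": Counter(), "ok": Counter()}
        self.totals = {"spam": 0, "ok": 0}

    def train_on_error(self, features, guessed, actual):
        """Learn only from submissions the filter got wrong or was 'unsure' about."""
        if guessed != actual:
            self.counts[actual].update(features)
            self.totals[actual] += 1

    def spam_probability(self, feature, unseen=0.4):
        spam = self.counts["spam"][feature] / max(self.totals["spam"], 1)
        ok = self.counts["ok"][feature] / max(self.totals["ok"], 1)
        if spam + ok == 0:
            return unseen
        return min(0.99, max(0.01, spam / (spam + ok)))
```

Learning only from mistakes keeps the counts focused on the submissions the filter finds hard, which is the point of "train on error".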

Here is an example of the 7 most interesting features of a submission to my site that was scored as 89.9771% spam. It is spam.

Each of these keywords found in the post is a feature that is 98.9% likely to be spam:

mssg txt - "tours" || Prob 0.98993 
mssg txt - "cruises" || Prob 0.98993
mssg txt - "agencies" || Prob 0.98993
mssg txt - "choice" || Prob 0.98991 

The telephone number '123456' is 95% likely to be spam:

tel number - "123456" || Prob 0.95440 Delta 0.45440

The total length of the message being 30 characters (after HTML is removed) is a feature that indicates 94.6% spam:

mssg maxlen - "30" || Prob 0.94600 

(There was another feature, scored Prob 0.01011, which offset the combined score and knocked it down a bit. But I'm not going to say what that feature was ;o)
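The answer doesn't say how the per-feature probabilities are combined into one score, so what follows is only one common choice and not necessarily the author's (it does not reproduce the 89.9771% figure). Features are ranked by their distance from 0.5 (the `Delta` value, e.g. |0.95440 - 0.5| = 0.45440), the `n` most interesting are kept, and they are combined with the rule from Graham's essay, P = ∏p / (∏p + ∏(1 - p)). `n = 7` mirrors the seven features mentioned here; the function names are hypothetical.

```python
from math import prod


def most_interesting(probs, n=7):
    """The n feature probabilities furthest from a neutral 0.5."""
    return sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:n]


def combined_spam_probability(probs, n=7):
    """Graham-style combination: prod(p) / (prod(p) + prod(1 - p))."""
    chosen = most_interesting(probs, n)
    spam = prod(chosen)
    ham = prod(1 - p for p in chosen)
    return spam / (spam + ham)


listed = [0.98993, 0.98993, 0.98993, 0.98991, 0.95440, 0.94600]
# A strongly "ok" feature (like the undisclosed 0.01011 one) pulls the score down.
print(combined_spam_probability(listed) > combined_spam_probability(listed + [0.01011]))  # True
```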


It was submitted from a well-known spam IP (http://www.projecthoneypot.org/ip_84.19.186.171), but there was no need to use that particular knowledge to mark it as spam. I gather all sorts of info, like IPs, submission rates, etc., but as you can see, the most glaring signs of bot-like behavior are not what you might guess.
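If you do want to use Project Honey Pot's data, its http:BL service is queried over DNS. This sketch follows our understanding of that API: query `<access key>.<reversed IPv4>.dnsbl.httpbl.org`; a listed IP resolves to `127.<days>.<threat score>.<type>`, and an unlisted one does not resolve. Check the details against Project Honey Pot's own documentation before relying on it. You need an access key from a Project Honey Pot account; the function names are hypothetical.

```python
import socket


def httpbl_query_name(access_key, ip):
    """<access key>.<IPv4 octets in reverse order>.dnsbl.httpbl.org"""
    reversed_ip = ".".join(reversed(ip.split(".")))
    return f"{access_key}.{reversed_ip}.dnsbl.httpbl.org"


def parse_httpbl_answer(answer):
    """Decode 127.<days>.<threat>.<type>; None if the answer is not in that form."""
    parts = answer.split(".")
    if len(parts) != 4 or parts[0] != "127" or not all(p.isdigit() for p in parts):
        return None
    days, threat, visitor_type = (int(p) for p in parts[1:])
    return {
        "days_since_last_activity": days,
        # For type 0 (search engine) this octet identifies the engine instead.
        "threat_score": threat,
        "suspicious": bool(visitor_type & 1),
        "harvester": bool(visitor_type & 2),
        "comment_spammer": bool(visitor_type & 4),
    }


def httpbl_lookup(access_key, ip):
    """Query http:BL; None means the IP is not listed (or the lookup failed)."""
    try:
        answer = socket.gethostbyname(httpbl_query_name(access_key, ip))
    except OSError:  # socket.gaierror, raised for a non-existent name, subclasses OSError
        return None
    return parse_httpbl_answer(answer)
```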

To build one of these yourself, read this: http://www.paulgraham.com/spam.html
