I am working on a project with a group, and we are making an experimental site that involves heavy user interaction. In a nutshell, the nature of the site involves heavy user posting and commenting. Based on the theme of our site, we are expecting to get controversial posts and most likely offensive material.
My question is: what algorithms, methods, etc. can we use to monitor and handle these "bad user" interactions with our website?
Right now, we have really only come up with checking the posts against a database of people, college, and business names. This would make the posts somewhat anonymous and take some of the sting out of them. What else should/can we implement into our design to accomplish this?
Solution:
Everybody had really good suggestions that I'm going to research a little more. In reference to making a list, I have been experimenting with a small script I wrote that takes a collection of websites containing directories of names with a substantial amount of data (3000-4000 names), parses the HTML, and stores each value in a database to be run against the user posts. This is a little makeshift, but it will serve as a good tester for the time being.
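A minimal sketch of the post-checking step, assuming the scraped names are already loaded into a plain list (the `anonymizePost` helper and the `[redacted]` placeholder are made up for illustration):

```javascript
// Escape regex metacharacters so names like "A&M" don't break the pattern.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Returns the post text with any known name replaced by a placeholder.
function anonymizePost(postText, knownNames) {
  let result = postText;
  for (const name of knownNames) {
    const pattern = new RegExp('\\b' + escapeRegExp(name) + '\\b', 'gi');
    result = result.replace(pattern, '[redacted]');
  }
  return result;
}

// Example usage:
// const names = ['Acme University', 'John Smith'];
// anonymizePost('John Smith works at Acme University', names);
// -> '[redacted] works at [redacted]'
```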
For some good background to the problem, with some general suggestions, check out this transcript of a speech by Clay Shirky: A group is its own worst enemy
To steal directly from the StackOverflow podcast, rate limiting is one of the most effective methods. Put a reasonable lower limit on how much time must elapse between comments, and if a user posts faster than that, put them into a temporary "cool-off" period where they can't interact for a few minutes. If they keep bouncing against this limit, you may have a pathological abuser, and might cool them off for longer, ask them nicely to refrain, etc.
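A bare-bones sketch of that idea with in-memory state and made-up limits (a 30-second minimum interval and a 5-minute cool-off); a real site would keep this state in the database or a cache:

```javascript
const lastCommentAt = new Map();   // userId -> timestamp of last accepted comment
const cooledOffUntil = new Map();  // userId -> timestamp when the cool-off ends

const MIN_INTERVAL_MS = 30 * 1000;     // minimum time between comments
const COOL_OFF_MS = 5 * 60 * 1000;     // penalty for posting too fast

function canComment(userId, now = Date.now()) {
  if ((cooledOffUntil.get(userId) || 0) > now) {
    return false; // still cooling off
  }
  const last = lastCommentAt.get(userId) || 0;
  if (now - last < MIN_INTERVAL_MS) {
    cooledOffUntil.set(userId, now + COOL_OFF_MS); // limit exceeded: cool off
    return false;
  }
  lastCommentAt.set(userId, now);
  return true;
}
```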
Rate limiting also reduces flaming: one of the primary contributors to flame wars is people getting angry and posting personal attacks rather than rational arguments, and forcing a pause curbs that behavior somewhat.
Allowing people to flag offensive material is also valuable (and only allow each user to flag an item once), but I would only show flagged items to moderators where there is a fairly high rate of flagging. You need to filter out the "background noise" because almost anything you post is going to offend someone.
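Something like the following captures both points: each user can flag an item only once, and moderators only see items past a threshold. The in-memory structures and the threshold of 3 flags are placeholders:

```javascript
const flags = new Map(); // postId -> Set of userIds who flagged it
const FLAG_THRESHOLD = 3;

function flagPost(postId, userId) {
  if (!flags.has(postId)) flags.set(postId, new Set());
  flags.get(postId).add(userId); // a Set means each user counts only once
}

function needsModeration(postId) {
  const flaggers = flags.get(postId);
  return flaggers !== undefined && flaggers.size >= FLAG_THRESHOLD;
}
```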
It depends on how many users you have, how much tolerance there is for silliness (is it OK for an offensive post to stay up for a little while?), etc.
One possibility would be to require users to create user accounts (suitably CAPTCHAed to prevent automated account creation) before they can post. Then delete offensive posts (and the corresponding accounts) as necessary.
There are different ways to identify offensive posts. One standard Web 2.0 technique is to let users flag each other's posts as offensive. This makes it easier for admins to catch problem posts.
To stop angry people, I'm a huge fan of the "Flag this post" link. Your community will do most of the moderation for you.
To stop reasonable people who wrote something inflammatory, you can try being clever. Make a long list of really strong words (curse words being the strongest, obviously) and score each appropriately. If a post's word-strength score (adjusted for post word count) crosses a threshold, display a big red warning and suggest that the poster consider rewording. And if they hit submit anyway, go ahead and put the post into the moderation queue instead of publishing it immediately.
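A sketch of that scoring scheme; the word list, weights, and threshold below are placeholders to be tuned against real posts:

```javascript
// Placeholder word list; curse words would get the highest weights.
const STRONG_WORDS = new Map([
  ['hate', 2],
  ['idiot', 3],
  ['stupid', 2],
]);
const WARN_THRESHOLD = 0.05; // strength per word

function wordStrength(postText) {
  const words = postText.toLowerCase().match(/[a-z']+/g) || [];
  if (words.length === 0) return 0;
  const score = words.reduce((sum, w) => sum + (STRONG_WORDS.get(w) || 0), 0);
  return score / words.length; // adjust for post length
}

function shouldWarn(postText) {
  return wordStrength(postText) > WARN_THRESHOLD;
}
```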
To stop spammers, I'm a huge fan of the cryptographic nonce + hashing function performed in javascript + cookie replay technique. No visual space for an ugly captcha is required, and it performs equivalently in practice. I've yet to see a spammer go through the hurdles required to defeat it in an automated way. I have, however, seen confused spammers enter spam manually by hand after their automated submissions were rejected with 100.0% accuracy.
And totally read that Clay Shirky link from the other answer. Understanding community dynamics is key.
Addendum: Implementing a non-interactive CAPTCHA.
Make an AJAX request to the server for a nonce. The server sends back a JSON response containing the nonce and also sets a cookie containing the nonce value. Calculate the SHA1 hash of the nonce in javascript and copy the value into a hidden field. When the user POSTs the form, they send the cookie back along with that hidden field. Calculate the SHA1 hash of the nonce from the cookie, compare it to the value in the hidden field, and verify that you generated that nonce in the last 15 minutes (memcached is good for this). If all those checks pass, post the comment.
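A rough sketch of both ends of that exchange. The route names, cookie handling, and the in-memory nonce store are assumptions (the answer suggests memcached for tracking issued nonces), and the server side assumes Express-style handlers with cookie and body parsing already wired up:

```javascript
// --- server side (Node; Express-style handlers with cookie/body parsing assumed) ---
const nodeCrypto = require('crypto');
const issuedNonces = new Map(); // nonce -> issue timestamp (memcached in production)

function issueNonce(req, res) {
  const nonce = nodeCrypto.randomBytes(16).toString('hex');
  issuedNonces.set(nonce, Date.now());
  res.cookie('comment_nonce', nonce); // cookie carries the raw nonce
  res.json({ nonce });                // JSON response carries it too
}

function verifyComment(req, res) {
  const nonce = req.cookies.comment_nonce || '';
  const issuedAt = issuedNonces.get(nonce);
  const fresh = issuedAt !== undefined && Date.now() - issuedAt < 15 * 60 * 1000;
  const expected = nodeCrypto.createHash('sha1').update(nonce).digest('hex');
  if (fresh && req.body.nonce_hash === expected) {
    issuedNonces.delete(nonce); // single use
    // ...save the comment...
    res.sendStatus(200);
  } else {
    // ...drop it into the moderation queue instead...
    res.sendStatus(403);
  }
}

// --- client side (runs in the browser, not in the Node process above) ---
async function prepareCommentForm(form) {
  const { nonce } = await (await fetch('/comment-nonce', { credentials: 'same-origin' })).json();
  const digest = await window.crypto.subtle.digest('SHA-1', new TextEncoder().encode(nonce));
  form.elements.nonce_hash.value = Array.from(new Uint8Array(digest))
    .map(b => b.toString(16).padStart(2, '0')).join(''); // hex hash into the hidden field
}
```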
This technique requires that the spammer sit down and figure out what's going on, and once they do, they still have to fire off multiple requests and maintain state to get a comment through. This is far, far more work than most spammers are willing to go through, especially since the work only applies to a single site. The biggest downside is that anyone with javascript off or cookies disabled gets marked as potential spam, which means that moderation queues are still a good idea.
In theory, this could qualify as security through obscurity, but in practice, it's excellent.