A lot of sites implement URL rewriting by taking a URL similar to
/news/ArticleID/Some-Title-Text-Goes-Here/
and applying a rewrite rule along the lines of
Rewrite /news/([0-9]*)/.* /news/article.lang?ArticleID=$1
So
/news/123/Lorem-Ipsum/
is rewritten to
/news/article.lang?ArticleID=123
As all this cares about is the article id, the title text can be anything.
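To make the behaviour concrete, here is a minimal Python sketch of the same rule. The function name `rewrite` is made up for illustration, and a real site would normally express this in the web server's rewrite configuration rather than in application code.

```python
import re

# Python equivalent of the rule above: only the numeric ID is captured,
# and whatever follows the second slash is matched but thrown away.
RULE = re.compile(r"^/news/([0-9]*)/.*$")

def rewrite(path):
    """Return the internal URL for a pretty path, or None if the rule does not match."""
    match = RULE.match(path)
    if match is None:
        return None
    return "/news/article.lang?ArticleID=" + match.group(1)

# Both requests end up at the same internal URL.
print(rewrite("/news/123/Lorem-Ipsum/"))
print(rewrite("/news/123/Anything-At-All/"))
```

Because the title segment never reaches the article script, nothing downstream can notice that it was changed.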
I've written plenty of rules like this in the past without seeing it as a potential problem, until this morning, when a major UK newspaper was embarrassed by exactly this behaviour.
The article here
http://www.independent.co.uk/life-style/food-and-drink/kate-middleton-jelly-bean-expected-to-fetch-500-2269573.html
had its URL modified to
http://www.independent.co.uk/life-style/food-and-drink/utter-PR-fiction-but-people-love-this-shit-so-fuck-it-lets-just-print-2269573.html
This modified URL was posted to Twitter and promptly went viral, causing a lot of embarrassment for the newspaper in question.
What is the best way to prevent this happening/mitigate the effects without losing the benefits of the url rewrite?
(I note that Stack Overflow issues a 301 to the correct URL if you modify a question's URL. Is that obvious enough for most users, or should we keep a current canonical URL and a list of prior URLs, with the priors 301'ing to the canonical and all others 404'ing?)
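The canonical-plus-priors policy from the parenthetical could be sketched roughly like this in Python. The `ARTICLES` store, the `resolve` function and the slugs in it are hypothetical; in practice the slugs would live in the article database.

```python
# Hypothetical store: article ID -> current slug plus slugs it used to have.
ARTICLES = {
    123: {"canonical": "Lorem-Ipsum", "priors": ["Lorem-Ipsum-Draft"]},
}

def resolve(article_id, slug):
    """Return (status, redirect_target) for a requested ID and slug."""
    article = ARTICLES.get(article_id)
    if article is None:
        return 404, None
    if slug == article["canonical"]:
        return 200, None
    if slug in article["priors"]:
        # An old but genuine URL: send the visitor to the current one.
        return 301, "/news/%d/%s/" % (article_id, article["canonical"])
    # A slug the site never published is treated as not found.
    return 404, None
```

With this policy, links that were once valid keep working after a retitle, while an invented title never resolves to the article at all.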
In the script that displays the article, check that the requested URI matches the pretty hyphenated title as computed from the article title in the database. If it doesn't match, respond with something like a 404.
For example, if you have the article's real title in $article['title'], have the title part of the requested URI parsed as $requested_title, and pretty_for_uri($input) turns a string into a URI-friendly, hyphenated string, you'd want to check that
$requested_title == pretty_for_uri($article['title'])
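A minimal Python sketch of that check follows. The slug rules inside `pretty_for_uri` (collapse every run of non-alphanumeric characters into one hyphen, keep the original case) are an assumption, as is the `status_for` helper; use whatever rules your site already applies when it generates its links.

```python
import re

def pretty_for_uri(title):
    """Turn a title into a hyphenated, URI-friendly string (assumed slug rules)."""
    return re.sub(r"[^A-Za-z0-9]+", "-", title).strip("-")

def status_for(requested_title, article):
    """Return 200 if the slug in the URI matches the stored title, otherwise 404."""
    if requested_title == pretty_for_uri(article["title"]):
        return 200
    return 404

article = {"title": "Lorem Ipsum"}
print(status_for("Lorem-Ipsum", article))       # slug matches the stored title
print(status_for("utter-PR-fiction", article))  # slug was tampered with
```

The comparison only works if the links on the site are generated with the same function, so that the slug in every published URL is exactly what the check recomputes.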
The trick above works because the URL ends in <article-id>.html, and everything between the section of the paper and the ID is ignored. Just try
http://www.independent.co.uk/life-style/food-and-drink/foo-2269573.html
which works as well.
The bad URL did not come from a URL shortener, but rather from a brain-dead URL expander and the URL scheme that The Independent uses.
A real URL shortener should create something like (what you wrote) /news/article.lang?ArticleID=123
and then check that the entered URL follows that form.