URL Rewriting, how to avoid URL embarrassment?

Developer (devze.com) https://www.devze.com 2023-02-27 02:18 Source: web

Many sites implement URL rewriting by taking a URL similar to

/news/ArticleID/Some-Title-Text-Goes-Here/

and applying a rewrite rule along the lines of

Rewrite /news/([0-9]+)/.* /news/article.lang?ArticleID=$1

so that

/news/123/Lorem-Ipsum/

is rewritten to

/news/article.lang?ArticleID=123

Because all the rule cares about is the article ID, the title text can be anything.
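To make the problem concrete, here is a minimal Python sketch that emulates the rewrite rule with a regular expression. It assumes the rule is applied to the request path only; the function name `rewrite` is hypothetical and stands in for whatever the web server's rewrite engine does.

```python
import re
from typing import Optional

# Emulates the rewrite rule above. [0-9]+ (rather than [0-9]*) is used so an
# empty ID such as /news//foo/ does not match. Everything after the ID is discarded.
RULE = re.compile(r"^/news/([0-9]+)/.*$")


def rewrite(path: str) -> Optional[str]:
    """Map a pretty path to the internal article URL, or None if the rule doesn't match."""
    match = RULE.match(path)
    if match is None:
        return None
    return f"/news/article.lang?ArticleID={match.group(1)}"


if __name__ == "__main__":
    for path in ("/news/123/Lorem-Ipsum/", "/news/123/anything-at-all/"):
        print(path, "->", rewrite(path))
```

Both paths print /news/article.lang?ArticleID=123, which is exactly why an edited title still resolves to the real article.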

I've written plenty of rules like this in the past without considering this a potential problem, until this morning, when a major UK newspaper was embarrassed by exactly this behaviour.

The article here

http://www.independent.co.uk/life-style/food-and-drink/kate-middleton-jelly-bean-expected-to-fetch-500-2269573.html

had its URL modified to

http://www.independent.co.uk/life-style/food-and-drink/utter-PR-fiction-but-people-love-this-shit-so-fuck-it-lets-just-print-2269573.html

This modified URL was posted to Twitter and promptly went viral, causing a lot of embarrassment for the newspaper in question.

What is the best way to prevent this from happening, or to mitigate the effects, without losing the benefits of URL rewriting?

(I note that Stack Overflow questions return a 301 redirect to the correct URL if you modify the URL. Is that obvious enough for most users, or should we keep a current canonical URL plus a list of prior URLs, with the priors 301-redirecting to the canonical URL and everything else returning 404?)
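The canonical-plus-priors scheme from the question can be sketched as a small lookup. This is a minimal Python illustration under stated assumptions, not a description of how Stack Overflow does it; the `Article` record, its `prior_slugs` field, and the `resolve` and `canonical_path` functions are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Article:
    article_id: int
    slug: str  # current canonical title slug
    # Slugs the article was previously published under (hypothetical field).
    prior_slugs: List[str] = field(default_factory=list)


def canonical_path(article: Article) -> str:
    return f"/news/{article.article_id}/{article.slug}/"


def resolve(article: Optional[Article], requested_slug: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, Location header or None) for a requested slug."""
    if article is None:
        return 404, None  # no such article ID
    if requested_slug == article.slug:
        return 200, None  # canonical URL: serve the page
    if requested_slug in article.prior_slugs:
        return 301, canonical_path(article)  # known old title: redirect
    return 404, None  # invented title text: refuse


if __name__ == "__main__":
    art = Article(123, "lorem-ipsum", prior_slugs=["lorem-ipsum-draft"])
    print(resolve(art, "lorem-ipsum"))        # canonical
    print(resolve(art, "lorem-ipsum-draft"))  # prior slug
    print(resolve(art, "utter-pr-fiction"))   # anything else
```

Redirecting every non-canonical slug, as the question says Stack Overflow does, drops the third branch and is simpler, but it keeps any invented URL working; it merely redirects it.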

In the script that displays the article, check that the requested URI matches the pretty hyphenated title computed from the article title in the database. If it doesn't match, respond with something like a 404.

For example, if the article's real title is in $article['title'], the title part of the requested URI has been parsed into $requested_title, and pretty_for_uri($input) turns a string into a URI-friendly, hyphenated string, you'd want to check that

$requested_title === pretty_for_uri($article['title'])
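A Python version of that check might look like the sketch below. The answer doesn't define pretty_for_uri, so the slug rules here (lower-case, collapse runs of non-alphanumeric characters into one hyphen, trim hyphens at the ends) are an assumption, and `title_matches` and `status_for` are hypothetical helpers. Whatever function generates your links must be the same one used for the comparison, otherwise legitimate URLs will fail the check.

```python
import re
from typing import Optional


def pretty_for_uri(title: str) -> str:
    """Turn a title into a URI-friendly, hyphenated slug (assumed rules)."""
    # Any run of characters outside a-z/0-9 (spaces, punctuation, non-ASCII)
    # becomes a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


def title_matches(requested_title: str, article_title: str) -> bool:
    """True only if the URL's title segment is exactly the slug of the real title."""
    return requested_title == pretty_for_uri(article_title)


def status_for(requested_title: str, article_title: Optional[str]) -> int:
    # Hypothetical handler logic: unknown article or wrong title text -> 404.
    if article_title is None or not title_matches(requested_title, article_title):
        return 404
    return 200
```

For instance, pretty_for_uri("Lorem Ipsum: Dolor Sit!") gives "lorem-ipsum-dolor-sit", so only that exact title segment is served. Non-ASCII characters are treated as separators by this sketch; a real site may prefer to transliterate them.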

The trick in the question works because the Independent's URLs end in <article-id>.html, and everything between the section of the paper and the ID is ignored. Just try

http://www.independent.co.uk/life-style/food-and-drink/foo-2269573.html

which works as well.
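That behaviour can be imitated with a pattern that reads only the trailing number. In this minimal Python sketch the regular expression is a guess at the behaviour described in this answer, not the Independent's actual configuration, and `article_id_from_path` is a hypothetical name. Both the genuine path and the "foo" path resolve to the same article ID.

```python
import re
from typing import Optional

# Guess at the scheme: ".../<anything>-<article-id>.html", where only the ID is used.
ID_AT_END = re.compile(r"-(\d+)\.html$")


def article_id_from_path(path: str) -> Optional[int]:
    """Return the trailing article ID, ignoring all text before it."""
    match = ID_AT_END.search(path)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    for p in (
        "/life-style/food-and-drink/kate-middleton-jelly-bean-expected-to-fetch-500-2269573.html",
        "/life-style/food-and-drink/foo-2269573.html",
    ):
        print(p, "->", article_id_from_path(p))
```

Combining this extraction with the title comparison from the first answer would make the "foo" variant fail instead of serving the article.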

The bad URL did not come from a URL shortener, but from the brain-dead URL expander and URL scheme that the Independent uses.

A real URL shortener should create something like (what you wrote) /news/article.lang?ArticleID=123 and then check that the entered URL follows that form.
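A strict check of that form could look like the following Python sketch. It accepts only the exact path /news/article.lang with a single numeric ArticleID parameter and rejects everything else; the function name `short_url_article_id` and the decision to reject extra parameters are assumptions, not part of the answer.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlsplit

EXPECTED_PATH = "/news/article.lang"  # the form from the question


def short_url_article_id(url: str) -> Optional[int]:
    """Return the ArticleID if `url` has exactly the expected form, else None."""
    parts = urlsplit(url)
    if parts.path != EXPECTED_PATH:
        return None
    params = parse_qs(parts.query, keep_blank_values=True)
    # Require exactly one parameter, ArticleID, supplied exactly once.
    if list(params) != ["ArticleID"] or len(params["ArticleID"]) != 1:
        return None
    value = params["ArticleID"][0]
    # ASCII digits only, so values like "12a" or "" are rejected.
    if re.fullmatch(r"[0-9]+", value) is None:
        return None
    return int(value)
```

Rejecting extra query parameters is stricter than most servers need; it is included here only to mirror the advice to check that the entered URL follows that form.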