What prevents the fixing of the Insert-whatever-you-want-into-the-URL issue that some content management systems have?

I don't understand what real-world issues prevent a system from disallowing URLs like this one:

http://www.washingtonpost.com/hey-this-url-doesn't-mean-a-damn-thing/gIQAocHrpJ_story.html

I understand what's going on. The routing system looks for the key after the final slash, then parses whatever follows the underscore to decide which version of the page to build.

So:

- washingtonpost.com/whatever/gIQAocHrpJ_story.html brings up the normal story version
- washingtonpost.com/whatever/gIQAocHrpJ_print.html brings up the normal print version
- washingtonpost.com/whatever/gIQAocHrpJ_mobile.html brings up the mobile XML version

Strangely, changing that .html to another common extension, such as .js or .xml, or dropping the extension entirely, still brings back the same page. Changing it to something non-standard, like .fffuuu, returns either a human-friendly 404 page or a completely blank page. It's like the CMS programmer just whitelisted the first few file types that came to mind and had the system treat them all the same.
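A minimal Python sketch of the routing behavior described above, assuming a router that inspects only the final path segment. The names `ARTICLES`, `ALLOWED_EXTENSIONS` and `resolve`, and the exact extension whitelist, are hypothetical and inferred from the observed behavior, not taken from the Post's actual CMS.

```python
import re
from typing import Optional

# Hypothetical article store keyed by the unique ID seen in the URLs.
ARTICLES = {"gIQAocHrpJ": "story record"}

# Assumed whitelist, based only on the extensions observed to work ("" = none).
ALLOWED_EXTENSIONS = {"html", "js", "xml", ""}
VERSIONS = {"story", "print", "mobile"}

# Shape of the final segment: "<key>_<version>" plus an optional ".<ext>".
FINAL_SEGMENT = re.compile(r"(?P<key>[A-Za-z0-9]+)_(?P<version>[a-z]+)(?:\.(?P<ext>[a-z]*))?")


def resolve(path: str) -> Optional[tuple[str, str]]:
    """Return (article_key, version), or None to signal a 404.

    Everything before the last slash is ignored, which is why any
    section or slug text in the URL still reaches the same story.
    """
    last = path.rsplit("/", 1)[-1]
    match = FINAL_SEGMENT.fullmatch(last)
    if match is None:
        return None
    ext = match.group("ext") or ""  # no extension at all counts as ""
    if ext not in ALLOWED_EXTENSIONS or match.group("version") not in VERSIONS:
        return None
    key = match.group("key")
    if key not in ARTICLES:
        return None
    return key, match.group("version")
```

With this sketch, `/politics/...`, `/sex/...` and `/whatever/...` prefixes all resolve to the same key, while `gIQAocHrpJ_story.fffuuu` falls through to a 404, matching what the question reports.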

I've only built simple sites in Rails and WordPress, so I understand simple concepts about URL patterns, such as how prefix constants can affect lookup speed... but am I wrong in thinking that there is no rhyme or reason to the design pattern above?

Mind you, the Washington Post just recently completed a major redesign. This isn't about making do with a legacy system; their CMS designers apparently had the freedom to adopt modern best practices. I just don't see the advantages of the URL design pattern they've adopted, except that the CMS designer doesn't know any better.

How is their current system any faster than a database model that has a unique key and then a human-readable field?

http://www.washingtonpost.com/HUMAN-READABLE-KEY/UNIQUE_KEY.html

The part between the slash after the domain and the final slash is the human-readable key. The system finds the record with the UNIQUE_KEY and then checks whether the human-readable key matches what the DB has for that record.
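A minimal sketch of that proposal in Python, assuming that a mismatched human-readable key is rejected with a 404 (which is what "disallowing" these URLs would mean). `Story`, `STORIES` and `lookup` are hypothetical names; the dict stands in for a database table indexed on the unique key.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Story:
    unique_key: str
    slug: str  # the human-readable key stored for this record


# Hypothetical table: one row per story, indexed by its unique key.
STORIES = {
    "gIQAocHrpJ": Story("gIQAocHrpJ", "mitt-romney-debates-us-economy"),
}

# Pattern from the question: /HUMAN-READABLE-KEY/UNIQUE_KEY.html
URL_PATTERN = re.compile(r"/(?P<slug>[^/]+)/(?P<key>[^/]+)\.html")


def lookup(path: str) -> Optional[Story]:
    """Return the story, or None (a 404) for an unknown key or a wrong slug."""
    match = URL_PATTERN.fullmatch(path)
    if match is None:
        return None
    story = STORIES.get(match.group("key"))  # single lookup by unique key
    if story is None or story.slug != match.group("slug"):
        return None  # strict: the human-readable key must match the record
    return story
```

In this sketch the slug check is one string comparison on a row that has already been fetched by its key, so it adds essentially nothing to the lookup cost.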

I noticed that the official versions of the links, as generated from the homepage, include year/month/day information. Again, it's meaningless, since you can alter those values and get the same page (thankfully, no JS seems to depend on parsing them).

I'm guessing the CMS designer didn't want to be bound by dates, since a news story could break on 8/20/2011 while the print version goes live on 8/21/2011... Sure, but then just don't put dates in the URL at all. If the URL can be changed to anything, don't train the user to expect document-specific information in it.

Not even the first path segment after the domain means anything. Therefore:

http://www.washingtonpost.com/politics/mitt-romney-debates-us-economy/gIQAocHrpJ_story.html

Goes to the same story as

http://www.washingtonpost.com/sex/mitt-romney-debates-us-economy/gIQAocHrpJ_story.html

And finally, doesn't this play havoc with Google and other search engines?

The key reason this is done is to make sure readers can still get to the story if the headline changes. The "slug" (what you call the human-readable key: mitt-romney-debates-us-economy) is usually auto-generated from the page's headline or title text. In some older CMSes, where this wasn't well thought out, changing the headline often left the URL the same (with the old slug in it). As you can imagine, when the original headline was ill-chosen, this could be quite embarrassing.
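For illustration, here is one way a slug can be derived from a headline, as a minimal Python sketch using only the standard library. The `slugify` function and the sample headlines are hypothetical; Django ships a comparable helper, `django.utils.text.slugify`, whose exact rules may differ from this one.

```python
import re
import unicodedata


def slugify(headline: str) -> str:
    """Turn a headline into a lowercase, hyphen-separated URL slug."""
    # Decompose accented characters, then drop anything that is not ASCII.
    text = unicodedata.normalize("NFKD", headline).encode("ascii", "ignore").decode("ascii")
    # Remove apostrophes so "doesn't" becomes "doesnt" rather than "doesn-t".
    text = text.replace("'", "")
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    text = re.sub(r"[^A-Za-z0-9]+", "-", text)
    return text.strip("-").lower()
```

For a hypothetical headline "Mitt Romney debates US economy" this yields `mitt-romney-debates-us-economy`. Because the slug is recomputed whenever the headline is edited, it cannot serve as a stable lookup key on its own.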

As a result, most CMS developers switched to looking up the story by an ID, which is much easier to keep from changing. But then what do you do with the slug? Some CMSes just ignore it; that's the Washington Post's approach.

Another (pretty easy and probably better) solution: when you find your story in the database, check that the URL's slug matches the story's current slug in the database (based on the current headline). If it doesn't, redirect the user to the correct URL. From the end user's perspective it's seamless: you type in http://www.washingtonpost.com/hey-this-url-doesn't-mean-a-damn-thing/gIQAocHrpJ_story.html and, when the page finishes loading, you're at http://www.washingtonpost.com/politics/mitt-romney-debates-us-economy/gIQAocHrpJ_story.html
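A framework-free Python sketch of that redirect approach, not the Django code mentioned in this answer. It assumes the canonical URL is built from a stored section, the current slug, the ID and the version; `Story`, `STORIES`, `canonical_path` and `handle` are hypothetical names, and `handle` returns a status code and location instead of a real HTTP response.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Story:
    story_id: str
    section: str
    slug: str  # regenerated from the current headline whenever it changes


# Hypothetical store; in a real CMS this would be a database query by ID.
STORIES = {"gIQAocHrpJ": Story("gIQAocHrpJ", "politics", "mitt-romney-debates-us-economy")}

# Anything before the final "<id>_<version>.html" segment is only used for comparison.
PATH = re.compile(r"/(?:.*/)?(?P<id>[A-Za-z0-9]+)_(?P<version>story|print|mobile)\.html")


def canonical_path(story: Story, version: str) -> str:
    return f"/{story.section}/{story.slug}/{story.story_id}_{version}.html"


def handle(path: str) -> tuple[int, Optional[str]]:
    """Return (200, None), (301, canonical_path) or (404, None)."""
    match = PATH.fullmatch(path)
    if match is None:
        return 404, None
    story = STORIES.get(match.group("id"))  # look up by the stable ID only
    if story is None:
        return 404, None
    canonical = canonical_path(story, match.group("version"))
    if path != canonical:
        return 301, canonical  # permanent redirect to the current URL
    return 200, None
```

Old or mangled URLs keep working because the ID still finds the story, and a permanent (301) redirect generally leaves search engines with a single URL per story instead of many duplicates.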

Why the Washington Post isn't doing that, I'm not sure; they have lots of smart people there, so there's probably some excellent technical reason linked to their particular CMS (which I would guess is based on something they bought from a vendor). In other systems, the solution I've described can be done very easily (in Django, I've done it in three lines).