Just as a fun project I wanted to try and make a simple URL shortener for my own personal use but I wanted to try and incorporate things that I liked from other shorteners like bit.ly and such. So I've come to a snag when it comes to assigning short URL IDs.
Right now I just manually assign the code but I would like to automate it. I could do it the easy way by just assigning incrementing IDs (I thought this could be done using an assigned auto increment value on the MySQL database and just use the PHP dechex()
function for the URL) but it seems that other shorteners are random.
I know that I won't get an absurd num开发者_如何学Pythonber of URLs in the database but I still want to keep the process efficient which makes creating random unique IDs rather taxing with many URLs in the database. I don't really have any idea about how to go about making a system to make the IDs that doesn't make duplicates and doesn't run slowly.
See: PHP short hash like URL-shortening websites and the answer you might want: http://blog.kevburnsjr.com/php-unique-hash
The second link might be particularly useful, just short-hash the current ID.
Use one of the common hash functions, like MD5 or SHA-1 to take the hash of your URL, print it as hexidecimal format, and take the last 8 characters (or the first 8 characters). This has the advantage that you can always determine whether the URL has already been submitted.
You can always generate random IDs, check if they have already been assigned, and draw a new one in the unlikely event where you hit one that had already been used. The lookup to see if they were already assigned should not be really slow since you're going to do it each time someone queries one of your URLs anyway.
If you want random hex strings, a quick and dirty way is to generate a random large number, hash it using sha1 or any other hash function, and take the first 8 chars. I don't really see why you would want hex rather than random base64 though, as base64 would allow you to pack more URLs into less characters. [Actually, you might want to generate IDs by hashing the URLs--should be as good as hashing random values if you use a secure crypto hash, and it would ensure that the same URL always gets the same key, preventing duplicates.]
Don't forget to start generating longer IDs once you hit a predefined number (or collide too often), as you wouldn't like to have the thing go slow as you run out of IDs and get a lot of collisions.
If you want nice theoretical guarantees about collision probabilities and all this sort of stuff, there are a lot of it, depending on the hashing scheme that you use.
Oh, and just on a side note, there are indeed some URL shorteners using sequential IDs, like http://lilurl.sourceforge.net/. I think the main reason that it's usually avoided is to prevent people with a good sense of timing to associate offensive IDs to URLs of their choosing...
精彩评论