How do I make a short unique ID for each post in my database?

Developer https://www.devze.com 2023-01-27 13:40 Source: Internet
mydomain.com/show/?id=sf32JFSVANMfaskjfh

Usually I just generate a random string that's 25 characters long and access my post that way. But in today's world, short URLs are necessary.

If I want 3-5 letters for the ID, I can't just generate random characters. It'll conflict eventually.

What do I do?


PHP:

Running:

rand_uniqid(9007199254740989);

will return 'PpQXn7COf' and:

rand_uniqid('PpQXn7COf', true);

will return '9007199254740989'


If you want the rand_uniqid to be at least 6 letters long, use the $pad_up = 6 argument.


You can support even more characters (making the resulting rand_uniqid even shorter) by adding characters to the $index var at the top of the function body.


<?php
function rand_uniqid($in, $to_num = false, $pad_up = false, $passKey = null)
{
    $index = "abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    if ($passKey !== null) {
        // Although this function's purpose is to just make the
        // ID short - and not so much secure,
        // you can optionally supply a password to make it harder
        // to calculate the corresponding numeric ID

        for ($n = 0; $n<strlen($index); $n++) {
            $i[] = substr( $index,$n ,1);
        }

        $passhash = hash('sha256',$passKey);
        $passhash = (strlen($passhash) < strlen($index))
            ? hash('sha512',$passKey)
            : $passhash;

        for ($n=0; $n < strlen($index); $n++) {
            $p[] =  substr($passhash, $n ,1);
        }

        array_multisort($p,  SORT_DESC, $i);
        $index = implode($i);
    }

    $base  = strlen($index);

    if ($to_num) {
        // Digital number  <<--  alphabet letter code
        $in  = strrev($in);
        $out = 0;
        $len = strlen($in) - 1;
        for ($t = 0; $t <= $len; $t++) {
            $bcpow = bcpow($base, $len - $t);
            $out   = $out + strpos($index, substr($in, $t, 1)) * $bcpow;
        }

        if (is_numeric($pad_up)) {
            $pad_up--;
            if ($pad_up > 0) {
                $out -= pow($base, $pad_up);
            }
        }
        $out = sprintf('%F', $out);
        $out = substr($out, 0, strpos($out, '.'));
    } else {
        // Digital number  -->>  alphabet letter code
        if (is_numeric($pad_up)) {
            $pad_up--;
            if ($pad_up > 0) {
                $in += pow($base, $pad_up);
            }
        }

        $out = "";
        for ($t = floor(log($in, $base)); $t >= 0; $t--) {
            $bcp = bcpow($base, $t);
            $a   = floor($in / $bcp) % $base;
            $out = $out . substr($index, $a, 1);
            $in  = $in - ($a * $bcp);
        }
        $out = strrev($out); // reverse
    }

    return $out;
}

echo rand_uniqid(1);
?>

PostgreSQL:

CREATE OR REPLACE FUNCTION string_to_bits(input_text TEXT)
RETURNS TEXT AS $$
DECLARE
    output_text TEXT;
    i INTEGER;
BEGIN
    output_text := '';

    -- Append the 8-bit pattern of each character's code.
    FOR i IN 1..char_length(input_text) LOOP
        output_text := output_text || ascii(substring(input_text FROM i FOR 1))::bit(8);
    END LOOP;

    RETURN output_text;
END;
$$ LANGUAGE plpgsql;


CREATE OR REPLACE FUNCTION id_to_sid(id INTEGER)
RETURNS TEXT AS $$
DECLARE
    output_text TEXT;
    i INTEGER;
    index TEXT[];
    bits TEXT;
    bit_array TEXT[];
    input_text TEXT;
BEGIN
    input_text := id::TEXT;
    output_text := '';
    index := string_to_array('0,d,A,3,E,z,W,m,D,S,Q,l,K,s,P,b,N,c,f,j,5,I,t,C,i,y,o,G,2,r,x,h,V,J,k,-,T,w,H,L,9,e,u,X,p,U,a,O,v,4,R,B,q,M,n,g,1,F,6,Y,_,8,7,Z', ',');

    bits := string_to_bits(input_text);

    -- Pad with zeros so the bit string splits evenly into 6-bit groups.
    IF length(bits) % 6 <> 0 THEN
        bits := rpad(bits, length(bits) + 6 - (length(bits) % 6), '0');
    END IF;

    -- Map each 6-bit group (0..63) to one character of the index.
    FOR i IN 1..((length(bits) / 6)) LOOP
        bit_array[i] := substring(bits FROM 1 + (i - 1) * 6 FOR 6);
        output_text := output_text || index[bit_array[i]::bit(6)::integer + 1];
    END LOOP;

    RETURN output_text;
END;
$$ LANGUAGE plpgsql;

JavaScript:

<script>
/*jslint white: true, browser: true, onevar: true, undef: true, nomen: true, eqeqeq: true, newcap: true, immed: true */

/*global Crypto:true */

if (typeof Crypto === 'undefined') {
    Crypto = {};
}

Crypto.random = (function () {
    var index = [
        'b', 'c', 'd', 'f', 'g', 'h', 'j', 'k', 'l', 'm',
        'n', 'p', 'q', 'r', 't', 'v', 'w', 'x', 'y', 'z',
        '_', '-', '0', '1', '2', '3', '4', '5', '6', '7',
        '8', '9', 'B', 'C', 'D', 'F', 'G', 'H', 'J', 'K',
        'L', 'M', 'N', 'P', 'Q', 'R', 'T', 'V', 'W', 'X',
        'Y', 'Z'
    ], base = index.length;

    return {
        encode: function (i) {
            var out = [],
                t = Math.floor(Math.log(i) / Math.log(base)),
                bcp,
                a;

            while (t >= 0) {
                bcp = Math.pow(base, t);
                a = Math.floor(i / bcp) % base;
                out[out.length] = index[a];
                i -= a * bcp;

                t -= 1;
            }

            return out.reverse().join('');
        },
        decode: function (i) {
            var chars = i.split(''),
                out = 0,
                el;

            while (typeof (el = chars.pop()) !== 'undefined') {
                out += index.indexOf(el) * Math.pow(base, chars.length);
            }

            return out;
        }
    };
}());
</script>

Example:

<script>
alert(Crypto.random.encode(101010101));
alert(Crypto.random.decode('XMzNr'));
</script>


Why not just use an autoincremented number such as 5836? Every time a new row is inserted, that column will be incremented by one. For example, if the newest row is 5836, the next row will be 5837 and so on.

Just use an INT type column (in MySQL, the "length" of 15 in INT(15) is only a display width and does not change the range). Or, if it's not the kind of row that a regular member will add, use a SMALLINT or MEDIUMINT type.


If each of your posts already has a numeric ID, you can just encode it using an arbitrary base. Think of it as something like hex, but with a larger base.

Check out this post by Leah Culver for some more ideas:

http://blog.leahculver.com/2008/06/tiny-urls-based-on-pk.html

I've used this in the past and it works well. In Leah's post it is base 56, so just take your primary key (an integer), encode it in base 56, and you are all set.
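
To make the idea concrete, here is a minimal sketch of base-56 encoding and decoding of an integer primary key. The alphabet below is an assumption (digits and letters with the lookalikes 0, 1, l, o, I and O removed) and is not necessarily the one used in the linked post; encodeId and decodeId are hypothetical names.

```javascript
// Assumed 56-character alphabet: digits and letters minus the lookalikes
// 0, 1, l, o, I and O. Not necessarily the alphabet from the linked post.
const BASE56_ALPHABET = "23456789abcdefghijkmnpqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ";
const BASE56 = BigInt(BASE56_ALPHABET.length); // 56n

// Turn a non-negative integer primary key into a short string.
function encodeId(id) {
  let n = BigInt(id);
  if (n < 0n) throw new RangeError("id must be non-negative");
  if (n === 0n) return BASE56_ALPHABET[0];
  let out = "";
  while (n > 0n) {
    // Prepend so the most significant digit ends up first.
    out = BASE56_ALPHABET[Number(n % BASE56)] + out;
    n /= BASE56; // BigInt division truncates
  }
  return out;
}

// Turn the short string back into the primary key (returned as a BigInt).
function decodeId(str) {
  let n = 0n;
  for (const ch of str) {
    const digit = BASE56_ALPHABET.indexOf(ch);
    if (digit === -1) throw new Error("invalid character: " + ch);
    n = n * BASE56 + BigInt(digit);
  }
  return n;
}
```

With this alphabet, a post ID such as 5836 fits in three characters, because 56^2 = 3136 is at most 5836 and 56^3 = 175616 is larger.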


As a general rule, shorter hashes lead to a higher rate of conflicts. Longer hashes can also conflict, but the probability drops as the length grows.

If you want shorter hashes, you should implement a conflict resolution strategy.
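
One possible conflict resolution strategy is to retry on collision and to lengthen the ID when retries keep failing. The sketch below assumes a 36-character alphabet and uses a Set as a stand-in for a UNIQUE column in the database; randomId and mintUniqueId are hypothetical names.

```javascript
const RANDOM_ID_CHARS = "abcdefghijklmnopqrstuvwxyz0123456789"; // assumed alphabet

// Build one random candidate of the requested length.
// Math.random() is fine for a sketch but is not cryptographically secure.
function randomId(length) {
  let id = "";
  for (let i = 0; i < length; i++) {
    id += RANDOM_ID_CHARS[Math.floor(Math.random() * RANDOM_ID_CHARS.length)];
  }
  return id;
}

// Draw candidates until one is unused. After maxTries collisions in a row,
// grow the ID by one character so a crowded namespace cannot loop forever.
// `taken` is a Set standing in for a UNIQUE column in the real database.
function mintUniqueId(taken, length = 4, maxTries = 10) {
  for (let tries = 0; tries < maxTries; tries++) {
    const candidate = randomId(length);
    if (!taken.has(candidate)) {
      taken.add(candidate);
      return candidate;
    }
  }
  return mintUniqueId(taken, length + 1, maxTries);
}
```

Against a real database, checking first and inserting afterwards can race with other requests, so it is probably safer to attempt the insert and retry when the unique constraint rejects it.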


"short URLs are necessary"

Really? Why? Are you planning to have your users type them in manually?

I would think that, since they're almost certainly going to be links from somewhere else, the size of the URL is mostly irrelevant.

Users who worry about the size of the URL that was used to get to a post are wasting their time. By all means, encode a large integer into base64 or something like that if you wish, but I personally believe it's a waste of time, as in "there are probably things you could be doing that would have a greater return on investment".


Re your comment on Twitter, I'd just allocate sequential 25-character (or shorter if you wish) IDs for the actual posts as you do now, then use a shortened version for where you need less of a URL. For example, the last 4 characters of that 25-character ID.

Then map that 4-character ID to the most recent equivalent 25-character ID (meaning there are two URLs (short and long) which can get to that post). That means your messages will be valid until you roll over but that still gives you a huge number of active ones (at base64, 64^4 or over sixteen million messages).

And the full-size URL will be able to get to the post (effectively, since 25 characters gives you about 10^45 messages) forever.
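
The numbers above can be checked with a few lines of code, and the short-to-long mapping can be sketched with a plain Map. This is only an illustration: idCapacity, registerPost and shortToLong are hypothetical names, and in practice the Map would be a database table.

```javascript
// Number of distinct IDs of a given length over a given alphabet size.
// BigInt keeps the result exact even when it exceeds 2^53.
function idCapacity(alphabetSize, length) {
  return BigInt(alphabetSize) ** BigInt(length);
}

// Short ID -> most recent long ID that ends with it.
const shortToLong = new Map();

// Register a post under its long ID and return the derived short ID
// (the last 4 characters). A later post with the same suffix overwrites
// the earlier mapping, which is the "roll over" described above.
function registerPost(longId) {
  const shortId = longId.slice(-4);
  shortToLong.set(shortId, longId);
  return shortId;
}

console.log(idCapacity(64, 4).toString());         // 16777216
console.log(idCapacity(64, 25).toString().length); // 46 digits, roughly 1.4e45
```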


NOID: Nice Opaque Identifier (Minter and Name Resolver) was meant to do this for library systems.

Its basic design is that it mints an ID and checks whether it has already been used; if not, the ID is available for use. Generating IDs in the background and distributing them to users avoids the overhead of creating them on demand.
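
The mint-then-check idea with background generation could look roughly like the following. This is not NOID's actual implementation or API, only a sketch of the same pattern; IdPool and its methods are hypothetical, and the Set stands in for persistent storage.

```javascript
// A pool that mints IDs ahead of time and hands them out on request.
class IdPool {
  // `mint` is any function returning a candidate ID string;
  // `targetSize` is how many checked IDs to keep ready.
  constructor(mint, targetSize) {
    this.mint = mint;
    this.targetSize = targetSize;
    this.seen = new Set(); // every ID ever minted (stand-in for a database)
    this.ready = [];       // checked, unused IDs waiting to be handed out
  }

  // Meant to run in the background, for example from a timer or cron job.
  // Loops until the pool is full, so `mint` must be able to produce new IDs.
  refill() {
    while (this.ready.length < this.targetSize) {
      const candidate = this.mint();
      if (!this.seen.has(candidate)) {
        this.seen.add(candidate);
        this.ready.push(candidate);
      }
    }
  }

  // Hand out one ready-made ID, refilling first if the pool ran dry.
  take() {
    if (this.ready.length === 0) this.refill();
    return this.ready.shift();
  }
}
```

Because the uniqueness check happens during refill, handing out an ID is just an array shift and costs nothing at request time.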


I'd say you have to look at the external side and the internal side separately. You cannot have unique IDs with "just 3" letters, or at least only for a short time. So internally you can use long identifiers; one option that comes to mind is UUIDs. Then you have to figure out how to make URLs for those UUIDs pretty and/or readable. For posts, for example, something like YYYY.mm.dd.nnn may do. I suggest checking out the REST approach...
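
A readable external ID of the YYYY.mm.dd.nnn kind could be built like this. The sketch assumes nnn is a per-day sequence number and that UTC dates are acceptable; readableId is a hypothetical name, the Map stands in for a lookup table, and the UUID is only a placeholder value.

```javascript
// Build a readable public ID such as "2023.01.27.004" from a post's date and
// its sequence number on that day. UTC is used so the result does not depend
// on the server's time zone.
function readableId(date, dailySequence) {
  const yyyy = String(date.getUTCFullYear());
  const mm = String(date.getUTCMonth() + 1).padStart(2, "0");
  const dd = String(date.getUTCDate()).padStart(2, "0");
  const nnn = String(dailySequence).padStart(3, "0");
  return `${yyyy}.${mm}.${dd}.${nnn}`;
}

// Public ID -> internal UUID. The UUID here is a placeholder, not real data.
const publicToInternal = new Map();
const publicId = readableId(new Date(Date.UTC(2023, 0, 27)), 4);
publicToInternal.set(publicId, "123e4567-e89b-12d3-a456-426614174000");

console.log(publicId); // 2023.01.27.004
```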


AFAIK most languages have their own method of doing this. For instance, in PHP you could use the built-in function uniqid(). With this method, however, you also have to save the generated uniqid in your database so you can use it in your "where" clause in SQL. As far as I am aware there is no way of decrypting these so-called GUIDs; note, though, that uniqid() is derived from the current time in microseconds, so the value is time-based rather than random.

<?php
// For a basic unique ID based on the current microtime:
$uniq_id = uniqid();
// To make collisions even less likely, prepend a short random prefix.
function rlyUniqId(){
    $letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890";
    $letter_count = strlen($letters);
    $prefix_letter_count = rand(1,4);
    $prefix = "";
    for($i=1;$i<=$prefix_letter_count;$i++){
        // Pick a valid zero-based position, then take exactly one character.
        $letter_pos = rand(0,$letter_count-1);
        $prefix .= substr($letters,$letter_pos,1);
    }
    return uniqid($prefix);
}
$rly_uniq_id = rlyUniqId();
?>

Another suggestion, mentioned earlier here, is to use the standard auto-incremented ID and base64-encode it in your URL so it looks fancy. It can be decoded again, which means you just decode the ID in the URL to look up the row in your database. On the other hand, anyone can do this, so if the goal is to hide the actual ID, this approach is by no means ideal. It is also a bit annoying when you have to hand-code a link. On the upside, it gives you much shorter URLs than the uniqid() mentioned earlier.

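<?php
$data_from_db = array("id"=>14,"title"=>"Some data we got here, huh?");
// base64_encode() expects a string, so cast the numeric ID first.
$url_id = base64_encode((string)$data_from_db["id"]);
// urlencode() protects the "=", "+" and "/" that base64 output can contain.
echo '<a href="readmore.php?id='.urlencode($url_id).'">'.$data_from_db["title"].'</a>';
// When reading the link, simply do:
$original_id = base64_decode($url_id);
?>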
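
For comparison, the same round trip can be sketched in JavaScript with a URL-safe variant of base64. This assumes an environment where btoa and atob are available (browsers and recent Node.js versions); idToToken and tokenToId are hypothetical names.

```javascript
// Encode a numeric ID as URL-safe base64 of its decimal text:
// "+" becomes "-", "/" becomes "_", and trailing "=" padding is dropped.
function idToToken(id) {
  return btoa(String(id))
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "");
}

// Reverse the substitutions, restore the padding, and decode back to a number.
function tokenToId(token) {
  const b64 = token.replace(/-/g, "+").replace(/_/g, "/");
  const padded = b64 + "=".repeat((4 - (b64.length % 4)) % 4);
  return Number(atob(padded));
}

console.log(idToToken(14));    // MTQ
console.log(tokenToId("MTQ")); // 14
```

Note that the token for 14 is longer than "14" itself: base64 of the decimal text disguises the ID rather than shortening it, and, as said above, anyone can decode it.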

I know this is a bit late, but I hope this will some day help someone somewhere :P GL HF!
