
How to generate an 8-byte unique ID from a GUID?

Developer https://www.devze.com 2023-02-26 02:11 Source: the Internet
I am trying to use a long as a unique ID for our events within our C# application (not globally unique, only for one session). Do you know whether the following will generate a unique long ID?

public long GenerateId()
{
    // Keeps only the first 8 of the GUID's 16 bytes.
    byte[] buffer = Guid.NewGuid().ToByteArray();
    return BitConverter.ToInt64(buffer, 0);
}

Why don't we use the GUID directly? We think an 8-byte long is good enough.


No, it won't. As Raymond Chen has pointed out many times on his blog, a GUID is designed to be unique as a whole. If you cut out just a piece of it (e.g. taking only 64 bits out of its 128), it loses its (pseudo-)uniqueness guarantees.


Here is the relevant part of one of those posts:

A customer needed to generate an 8-byte unique value, and their initial idea was to generate a GUID and throw away the second half, keeping the first eight bytes. They wanted to know if this was a good idea.

No, it's not a good idea. (...) Once you see how it all works, it's clear that you can't just throw away part of the GUID since all the parts (well, except for the fixed parts) work together to establish the uniqueness. If you take any of the three parts away, the algorithm falls apart. In particular, keeping just the first eight bytes (64 bits) gives you the timestamp and four constant bits; in other words, all you have is a timestamp, not a GUID.

Since it's just a timestamp, you can have collisions. If two computers generate one of these "truncated GUIDs" at the same time, they will generate the same result. Or if the system clock goes backward in time due to a clock reset, you'll start regenerating GUIDs that you had generated the first time it was that time.
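The quoted post describes timestamp-based (version 1) GUIDs, where the first eight bytes hold the timestamp. As I understand it, `Guid.NewGuid` on current .NET returns random version-4 GUIDs instead. In that case the eight bytes the question keeps are mostly random, with four fixed version bits. The sketch below reads that version field. `GuidInspection` and `GetVersion` are hypothetical names. The conclusion is the same either way: a 64-bit slice does not carry the uniqueness of the whole 128-bit value.

```csharp
using System;

public static class GuidInspection
{
    // The version is the high nibble of the Data3 field. Data3 is stored little-endian
    // in ToByteArray(), so its high byte (and the version nibble) ends up in byte 7.
    public static int GetVersion(Guid guid) => guid.ToByteArray()[7] >> 4;
}
```

For example, `GetVersion(Guid.Parse("00000000-0000-4000-8000-000000000000"))` reads the version nibble as 4.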


I am trying to use a long as a unique ID for our events within our C# application (not globally unique, only for one session). Do you know whether the following will generate a unique long ID?

Why don't you just use a counter?
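Here is a minimal sketch of the counter suggestion. It assumes IDs only need to be unique within one running process, which is the scope the question describes. `EventIdGenerator` is a hypothetical name. Because each call hands out the next value, uniqueness holds by construction rather than by probability. A 64-bit counter will not realistically overflow within a session.

```csharp
using System.Threading;

// Hypothetical helper: hands out session-unique event IDs from an in-memory counter.
public static class EventIdGenerator
{
    private static long _lastId;

    // Interlocked.Increment is atomic, so concurrent callers never receive the same value.
    public static long Next() => Interlocked.Increment(ref _lastId);
}
```

The IDs start again from 1 in the next session. That matches the "only for one session" requirement, but it rules this approach out if IDs are ever persisted across sessions.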


You cannot distill a 16-byte (128-bit) value down to an 8-byte (64-bit) value and still keep the same degree of uniqueness. If uniqueness is critical, don't "roll your own" anything. Stick with GUIDs unless you really know what you're doing.

If a relatively naive implementation of uniqueness is sufficient, it's still better to generate your own IDs than to derive them from GUIDs. The following code snippet is extracted from a "Locally Unique Identifier" class I find myself using fairly often. It makes it easy to define both the length and the set of characters in the output.

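using System.Security.Cryptography;
using System.Text;

public class LUID
{
    // RandomNumberGenerator.Create() replaces the now-obsolete RNGCryptoServiceProvider.
    private static readonly RandomNumberGenerator RandomGenerator = RandomNumberGenerator.Create();
    // Excludes 1, I, 0 and O so the output is unambiguous to human readers.
    private static readonly char[] ValidCharacters = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789".ToCharArray();
    public const int DefaultLength = 6;
    // Running offset into ValidCharacters, carried over between calls.
    private static int counter = 0;
    private static readonly object Sync = new object();

    public static string Generate(int length = DefaultLength)
    {
        var randomData = new byte[length];
        var result = new StringBuilder(length);

        lock (Sync) // counter is shared static state, so guard it against concurrent callers
        {
            RandomGenerator.GetNonZeroBytes(randomData);
            foreach (var value in randomData)
            {
                // Modulo the full alphabet length; "Length - 1" would never pick the last character.
                counter = (counter + value) % ValidCharacters.Length;
                result.Append(ValidCharacters[counter]);
            }
        }
        return result.ToString();
    }
}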

The character set leaves out 1 (one), I (capital i), 0 (zero) and O (capital o) so that the output is unambiguous to human readers.

The math for how "unique" a given combination of valid characters and ID length is turns out to be simple enough. It's still nice to have a "code proof" of sorts (xUnit):

using System.Collections.Generic;
using Xunit;

public class LuidTests
{
    [Fact]
    public void Does_not_generate_collisions_within_reasonable_number_of_iterations()
    {
        var ids = new HashSet<string>();
        const int minimumAcceptableIterations = 10000;
        for (int i = 0; i < minimumAcceptableIterations; i++)
        {
            var result = LUID.Generate();
            // HashSet.Add returns false if this ID was already generated.
            Assert.True(ids.Add(result), $"Collision on run {i} with ID '{result}'");
        }
    }
}
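The math behind that test is the birthday bound. With N possible IDs and n IDs generated, the chance of at least one collision is about 1 - exp(-n(n-1)/(2N)). The sketch below computes it, assuming the IDs are close to uniformly distributed over the space. The LUID output is only approximately uniform. `CollisionOdds` is a hypothetical name.

```csharp
using System;

public static class CollisionOdds
{
    // Birthday-bound approximation: probability of at least one collision when drawing
    // `count` IDs uniformly at random from a space of alphabetSize^length values.
    public static double CollisionProbability(int alphabetSize, int length, int count)
    {
        double space = Math.Pow(alphabetSize, length);
        double n = count; // use double so n * (n - 1) cannot overflow
        return 1.0 - Math.Exp(-n * (n - 1) / (2.0 * space));
    }
}
```

With 32 characters and a length of 6, there are about 1.07 billion possible IDs. `CollisionProbability(32, 6, 10000)` comes out at roughly 0.045, so the 10,000-iteration test above should be expected to fail about once in 22 runs. For comparison, a million uniformly random 64-bit values, `CollisionProbability(2, 64, 1000000)`, collide with probability of about 2.7e-8.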


No, it won't. A GUID is 128 bits long and a long only 64 bits. You throw away 64 bits of information, so two different GUIDs can produce the same long. The chance is pretty slim, but it is there.
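Here is a minimal demonstration of the lost information, using the same conversion as the question's `GenerateId`. Two GUIDs that differ only in their last eight bytes are distinct, yet they map to the same long. `GuidTruncation` is a hypothetical name.

```csharp
using System;

public static class GuidTruncation
{
    // Same conversion as the question: keep only the first 8 of the GUID's 16 bytes.
    public static long ToInt64(Guid guid) => BitConverter.ToInt64(guid.ToByteArray(), 0);
}
```

For example, `new Guid(new byte[16])` and a GUID built from 16 bytes whose last byte is 1 are different GUIDs, but both truncate to 0.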


Per the Guid.NewGuid MSDN page,

The chance that the value of the new Guid will be all zeros or equal to any other Guid is very low.

So, your method may produce a unique ID, but it's not guaranteed.


Yes, this will most likely be unique. Because it has fewer bits than a GUID, though, the chance of a duplicate is higher than with a full GUID, although still negligible.

In any case, a GUID itself does not guarantee uniqueness.


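var s = Guid.NewGuid().ToString();
var h1 = s.Substring(0, s.Length / 2).GetHashCode(); // 32-bit hash of the first half of the GUID string
var h2 = s.Substring(s.Length / 2).GetHashCode();    // 32-bit hash of the second half
// Pack both 32-bit hashes into one 8-byte value. Not guaranteed unique: each half is reduced
// to a 32-bit hash, and string hash codes are not stable across processes in .NET Core.
var result = (uint)h1 | ((ulong)(uint)h2 << 32);
var bytes = BitConverter.GetBytes(result);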

P.S. It's great that you're chatting with the topic starter here, guys. But what about answers for other users who need them, like me?


As a few others have said, taking only part of the GUID is a good way to ruin its uniqueness. Try something like this:

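using System;
using System.Security.Cryptography;

var bytes = new byte[8];
// RandomNumberGenerator.Create() is the non-obsolete replacement for RNGCryptoServiceProvider.
using (var rng = RandomNumberGenerator.Create())
{
    rng.GetBytes(bytes);
}

Console.WriteLine(BitConverter.ToInt64(bytes, 0));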


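This generates an 8-character (8-byte) Ascii85 identifier from the current timestamp in seconds plus 2 random bytes. IDs generated in different seconds can never collide, as long as the clock does not go backward. Within the same second, only the 2 random bytes (65,536 values) tell IDs apart. The chance of no collision is about 99.98% for 5 IDs and falls to about 85% at roughly 145 IDs.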

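private static readonly Random Random = new Random();
private static readonly object RandomLock = new object();

// Ascii85 is not part of the .NET base class library; this assumes a separate Ascii85 helper class.
public static string GenerateIdentifier()
{
    // Seconds since the Unix epoch in UTC (DateTime.Now would measure from local time).
    var seconds = (int)DateTimeOffset.UtcNow.ToUnixTimeSeconds();
    var timeBytes = BitConverter.GetBytes(seconds);
    var randomBytes = new byte[2];
    lock (RandomLock) // System.Random is not thread-safe
    {
        Random.NextBytes(randomBytes);
    }
    var bytes = new byte[timeBytes.Length + randomBytes.Length];
    System.Buffer.BlockCopy(timeBytes, 0, bytes, 0, timeBytes.Length);
    System.Buffer.BlockCopy(randomBytes, 0, bytes, timeBytes.Length, randomBytes.Length);
    return Ascii85.Encode(bytes); // 6 bytes encode to 8 Ascii85 characters
}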
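The answer relies on an `Ascii85.Encode` helper that .NET does not ship. Below is a minimal sketch of such a helper, written for illustration rather than taken from the answer. It assumes the plain Ascii85 alphabet (`!` through `u`). It leaves out the `<~ ~>` delimiters and the `z` shorthand for all-zero groups, so every full 4-byte group becomes exactly 5 characters.

```csharp
using System;
using System.Text;

// Hypothetical minimal Ascii85 encoder (not part of .NET).
public static class Ascii85
{
    public static string Encode(byte[] data)
    {
        var sb = new StringBuilder();
        var digits = new char[5];
        for (int offset = 0; offset < data.Length; offset += 4)
        {
            int count = Math.Min(4, data.Length - offset);

            // Read up to 4 bytes big-endian, zero-padding a final partial group.
            uint value = 0;
            for (int i = 0; i < 4; i++)
            {
                value <<= 8;
                if (i < count) value |= data[offset + i];
            }

            // Convert to 5 base-85 digits, most significant first, offset by '!' (33).
            for (int i = 4; i >= 0; i--)
            {
                digits[i] = (char)('!' + value % 85);
                value /= 85;
            }

            // A partial group of n bytes contributes only its first n + 1 characters.
            sb.Append(digits, 0, count + 1);
        }
        return sb.ToString();
    }
}
```

With this helper, the 6 bytes built in `GenerateIdentifier` (4 timestamp bytes and 2 random bytes) encode to 5 + 3 = 8 characters. `Encode(Encoding.ASCII.GetBytes("Man "))` gives the classic `9jqo^`.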


As already said in most of the other answers: No, you can not just take a part of a GUID without losing the uniqueness.

If you need something that's shorter and still unique, read this blog post by Jeff Atwood:
Equipping our ASCII Armor

He shows several ways to shorten a GUID's text representation without losing information. The shortest is 20 characters (20 bytes), using Ascii85 encoding.

Yes, this is much longer than the 8 bytes you wanted, but it's a "real" unique GUID. Any attempt to cram something into 8 bytes most likely won't be truly unique.
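One lossless shortening that ships with .NET is Base64. The 16 GUID bytes become 24 characters, or 22 once the `==` padding is dropped. Ascii85 gets down to 20 but needs a separate encoder. `GuidText` and its methods are hypothetical names. Note that standard Base64 uses `+` and `/`, so this form is not URL-safe as-is.

```csharp
using System;

public static class GuidText
{
    // Lossless 22-character form: Base64 of the 16 GUID bytes without the trailing "==".
    public static string ToShortString(Guid guid) =>
        Convert.ToBase64String(guid.ToByteArray()).TrimEnd('=');

    // Restores the padding and decodes back to the original 16 bytes.
    public static Guid FromShortString(string text) =>
        new Guid(Convert.FromBase64String(text + "=="));
}
```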


In most cases, a bitwise XOR of the two halves is enough.
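Here is a minimal sketch of that XOR folding. `GuidFolding` and `FoldGuid` are hypothetical names. Unlike plain truncation, every byte of the GUID influences the result. The result is still a 64-bit value, though, so the usual birthday-bound collision odds apply and uniqueness is not guaranteed.

```csharp
using System;

public static class GuidFolding
{
    // Folds a 128-bit GUID into 64 bits by XOR-ing its first and second 8-byte halves.
    public static long FoldGuid(Guid guid)
    {
        byte[] bytes = guid.ToByteArray();
        long firstHalf = BitConverter.ToInt64(bytes, 0);
        long secondHalf = BitConverter.ToInt64(bytes, 8);
        return firstHalf ^ secondHalf;
    }
}
```

A GUID whose two halves are identical folds to 0. That is one concrete example of distinct GUIDs that would collide under this scheme.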


Everyone in here is making this way more complicated than it needs to be. This is a terrible idea.

GUID 1: AAAA-BBBB-CCCC-DDDD
GUID 2: AAAA-BBBB-EEEE-FFFF

Throw away the second half of each GUID, and now you have a duplicate identifier. GUIDs are not guaranteed to be unique, and it's extremely awful to rely on that. You shouldn't rely on any guarantee about what's generated, and it's not hard to get around this.

If you need unique identifiers for an object, entity or whatever, take a database as an example, since that's the most common case. Generate an ID, check whether it already exists, and insert it only if it doesn't. This is fast in databases because most tables ("most", not all) are indexed by ID. If you have some kind of small list of objects in memory, you'd probably store the entities in a hash table of some kind. Then you can simply look up whether the generated GUID already exists.

All in all, it really depends on your use case. With a database, look up the GUID first and regenerate as needed until you can insert the new item. This really only matters in relational databases that don't automatically generate IDs for the rows in their tables. NoSQL databases usually generate a unique identifier for you.
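For the in-memory case described above, here is a minimal sketch of "generate, check, regenerate". It assumes session-scoped IDs held in a hash set, and `SessionIdGenerator` is a hypothetical name. Random 64-bit candidates are retried until one has not been issued before, so uniqueness within the session is enforced rather than merely likely. The trade-off is that the set grows with every ID handed out.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

// Hypothetical session-scoped generator: random 64-bit candidates, checked against
// every ID already issued and regenerated on a clash.
public sealed class SessionIdGenerator
{
    private readonly HashSet<long> _issued = new HashSet<long>();
    private readonly RandomNumberGenerator _rng = RandomNumberGenerator.Create();
    private readonly byte[] _buffer = new byte[8];
    private readonly object _sync = new object();

    public long Next()
    {
        lock (_sync)
        {
            while (true)
            {
                _rng.GetBytes(_buffer);
                long candidate = BitConverter.ToInt64(_buffer, 0);
                // HashSet.Add returns false if the value was already issued; then try again.
                if (_issued.Add(candidate))
                {
                    return candidate;
                }
            }
        }
    }
}
```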
