
Sequential UID set generation for MySQL Char() or other Field

Developer (https://www.devze.com) 2023-01-25 15:04 Source: web

I tried Googling this without finding a clear answer.

Question: What is the best way to externally generate sequential UID values for a MySQL field, where the values must be representable as a string?

Reason:

I want generic, sequential, UUID-like values so that inserts append in on-disk/page order (for write performance), with a date prefix for read speed when searching an index of the field from char[0] forward. The column will be indexed, but I am looking for the best data to increase index read and table write performance, rather than a plain old UUID.

My initial thought is a date at some granularity (possibly a padded epoch) appended to, or replacing some portion of, a UUIDv4-generated string, i.e. [Unix epoch][remaining UUID4] in a fixed-width char field. I am unsure whether this would give the desired in-page/on-disk ordering and index-searching results. An example would be:

12904645950049bceba1cc24e80806dd
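
A minimal sketch of how a value shaped like the one above could be generated outside MySQL. It assumes a 10-digit, zero-padded Unix epoch in seconds followed by the last 22 hex characters of a UUIDv4; the function name `time_prefixed_uid` and the exact split are illustrative choices, not something fixed by the question.

```python
import time
import uuid


def time_prefixed_uid(now=None):
    """Return a 32-character ID: 10-digit Unix epoch + 22 hex chars of a UUIDv4."""
    seconds = int(time.time() if now is None else now)
    # Zero-pad so that plain string comparison follows time order.
    prefix = f"{seconds:010d}"
    # Drop the first 10 hex characters of the UUID so the total width stays at 32.
    suffix = uuid.uuid4().hex[10:]
    return prefix + suffix


if __name__ == "__main__":
    print(time_prefixed_uid())
```

Values generated in different seconds compare in time order as strings; values generated within the same second are ordered randomly relative to each other, so the result is only roughly sequential. A 10-digit seconds prefix stays fixed-width until the year 2286.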

The values must be independent of MySQL itself, hence using UUIDs and timestamps rather than some variation of auto-incrementing.

Does anyone who knows the internals of MySQL indexes have any suggestions (for InnoDB tables)?

Aiden


This might be a bit off-topic, but have a look at Twitter's snowflake. They say it's:

  • (Roughly) Time Ordered (helps a lot to avoid expensive random primary key BTREE updates)
  • Directly Sortable
  • Compact

Not to mention other features (HA, etc.). You can either nick their algorithm or just use it as it stands.

The whole UID uses at most 64 bits, so I would guess it is quite efficient to index. For a counterexample (long primary keys in InnoDB tables), see http://www.mysqlperformanceblog.com/2006/10/03/long-primary-key-for-innodb-tables/.
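
To make the idea concrete, here is a rough sketch of a snowflake-style generator. It is not Twitter's code; it assumes the commonly described layout of 41 bits of millisecond timestamp, 10 bits of worker ID and 12 bits of per-millisecond sequence. The class name, the `clock` parameter and the custom epoch are hypothetical.

```python
import threading
import time

# Hypothetical custom epoch (2010-01-01T00:00:00Z) in milliseconds.
CUSTOM_EPOCH_MS = 1_262_304_000_000


class SnowflakeLikeGenerator:
    """Toy 64-bit ID generator: timestamp | worker ID | sequence."""

    def __init__(self, worker_id, clock=None):
        if not 0 <= worker_id < 1024:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self._clock = clock or (lambda: int(time.time() * 1000))
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & 0xFFF
                if self._sequence == 0:
                    # 4096 IDs already issued this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            return ((now - CUSTOM_EPOCH_MS) << 22) | (self.worker_id << 12) | self._sequence
```

IDs from a single worker are strictly increasing, and IDs from different workers are ordered by millisecond, which is where the "roughly time ordered" property comes from. The result fits in a signed BIGINT column.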


I think you may need to be more specific about what you are trying to solve (what is the actual problem, why not auto_increment, what is your proposed schema, etc.). To answer your internals question:

  • InnoDB stores table data in an index (the clustered index, ordered by primary key), in 16 KB pages by default.

The risks of not inserting sequentially are at least twofold:

  1. If the data does not fit in memory, you may need random I/O to load a page from disk before you can insert the value into that page.

  2. There might not be space remaining in the page (InnoDB fills pages to about 93% and leaves a small gap for updates), which could result in the page needing to be split. More split pages means more fragmentation and less efficient use of resources such as memory.

So I think that as long as your values are at least approximately sequential, (1) isn't a concern for the primary key index (it could still be an issue for any unique indexes). You just need to worry about (2).
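
As a toy illustration of the point about sequential inserts, the sketch below uses a sorted Python list as a stand-in for the key order of a clustered index (a big simplification; it models neither pages nor B-tree structure) and counts how many inserts land somewhere other than the current end. The function name is made up for this example.

```python
import bisect
import uuid


def count_non_append_inserts(keys):
    """Count inserts that land before the current end of a sorted key list."""
    ordered = []
    in_middle = 0
    for key in keys:
        pos = bisect.bisect_right(ordered, key)
        if pos != len(ordered):
            in_middle += 1  # would touch an "older" part of the index
        ordered.insert(pos, key)
    return in_middle


# Keys with an increasing, zero-padded prefix versus plain random UUIDs.
sequential_keys = [f"{n:010d}{uuid.uuid4().hex[10:]}" for n in range(1000)]
random_keys = [uuid.uuid4().hex for _ in range(1000)]

if __name__ == "__main__":
    print("prefixed:", count_non_append_inserts(sequential_keys))
    print("random:  ", count_non_append_inserts(random_keys))
```

With the increasing prefix every insert is an append, whereas almost every random UUID lands in the middle of the existing keys, which is the access pattern that leads to random page loads and page splits.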


The reason I said that understanding the problem is important is that there are so many ways to do this besides long GUIDs. For one, a BIGINT in MySQL (8 bytes) is smaller than any data type you will probably be using, but an unsigned one has a range of about 18 quintillion. You could allocate "chunks" of key space, N thousand at a time, to worker nodes and guarantee no duplicates. If a worker node crashes without using all of the chunk it was allocated, so what? It doesn't matter.
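
The chunk idea can be sketched roughly as follows. This assumes a single central allocator held in memory; in practice the "next chunk start" counter would have to live somewhere durable and shared. `ChunkAllocator` and `Worker` are hypothetical names for illustration.

```python
import threading


class ChunkAllocator:
    """Central allocator that hands out non-overlapping ranges of IDs."""

    def __init__(self, chunk_size=1000, start=1):
        self._next_start = start
        self._chunk_size = chunk_size
        self._lock = threading.Lock()

    def allocate(self):
        with self._lock:
            first = self._next_start
            self._next_start += self._chunk_size
        return range(first, first + self._chunk_size)


class Worker:
    """Worker node that draws IDs from its current chunk and refills when empty."""

    def __init__(self, allocator):
        self._allocator = allocator
        self._ids = iter(())

    def next_id(self):
        try:
            return next(self._ids)
        except StopIteration:
            # Current chunk exhausted (or none yet): fetch a fresh one.
            self._ids = iter(self._allocator.allocate())
            return next(self._ids)
```

A worker that crashes simply leaves a gap in the key space. IDs are increasing within a worker but only roughly increasing across workers, so a smaller chunk size keeps inserts closer to sequential.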


Check out this question. It perhaps doesn't detail the specific uses of MySQL indices, but it does give you some performance data, and the code to generate the sequential UIDs.

It seems MySQL indexing benefits greatly from sequential IDs, and according to the MySQL documentation the index relies on ordering to find the relevant results (see the section "B-Tree Index Characteristics").

From memory, MySQL indexing (for string indices at least) relies first on the alphanumeric ordering of the field, i.e. "Oh, it begins with an A? I have data that begins with an A, I'll fetch it for you..." and so on, rather than scanning the full contents of every field.

Inserting the UIDs sequentially means the index does not have to reorder the entries 'alphabetically' first, or at least spends far less time doing so, hence the performance benefits mentioned above.

(Not really a solution, but an answer at least.)


What I do is use a fixed-width character field and prepend the current time (seconds plus microseconds) to a generated unique string. This is nice because even if your server is accessed twice in the same millisecond, the ID will still (likely) be unique. I assume that under a massive server load this could produce duplicate IDs, but if that worries you, you could check whether a row with this UUID has already been created.

PHP:

$date = new DateTime();
// 'U' = Unix time in seconds, 'u' = microseconds; used as a time-ordered prefix.
// Use one of the two calls below, not both:
$UUID = uniqid($date->format('Uu'), false);  // For less length (prefix + 13 characters)
$UUID = uniqid($date->format('Uu'), true);   // For more length (prefix + 23 characters)

This is what I use on my (rarely used) server, but it should hold up under bigger loads. As I said, to cover the slight chance that two identical keys are created, check whether the key has already been used and assign a new one if so. (This shouldn't happen too often.)
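
One possible way to do that check, sketched in Python rather than PHP: instead of selecting first, let the primary key reject a duplicate and retry with a fresh ID. The example uses an in-memory SQLite database purely as a stand-in for MySQL, and the table, column and function names are made up for illustration.

```python
import sqlite3
import time
import uuid

# Hypothetical schema: a fixed-width string key plus some payload.
SCHEMA = "CREATE TABLE items (uid CHAR(32) PRIMARY KEY, payload TEXT)"


def new_uid():
    """16-digit zero-padded microsecond timestamp + 16 hex chars from a UUIDv4."""
    micros = time.time_ns() // 1000
    return f"{micros:016d}{uuid.uuid4().hex[:16]}"


def insert_with_retry(conn, payload, make_uid=new_uid, attempts=3):
    """Insert a row, generating a new ID if the previous one already exists."""
    for _ in range(attempts):
        uid = make_uid()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO items (uid, payload) VALUES (?, ?)", (uid, payload)
                )
            return uid
        except sqlite3.IntegrityError:
            continue  # duplicate key: try again with a fresh ID
    raise RuntimeError("could not generate a unique ID")
```

Relying on the key constraint avoids the gap between "check" and "insert" in which another request could claim the same ID.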
