
Need to store a 128-*bit* primary key: should I use SQL Azure or Azure Table? Or just use a linked list in Azure Blob?

Source: https://www.devze.com, 2023-01-10 19:58 (origin: web)

I need to store a large (128-bit) PK. Each key will have some corresponding columns. No schema is defined yet, and I want the schema to stay flexible in the future. (I only need conservative flexibility, e.g. adding new columns from time to time.)

At this point I'm not too concerned with the ability to do joins and such. I mostly want to pick a random PK and search up or down to the next 10 records. Since there can be a lot of white space in the search the cost of the upward and downward search may vary.

What is the best technology to handle this request? I'm interested in something that will save me money (per transaction), and storage space. I'm also interested in performance.

What do you recommend?

Update

OK, so what is this for? I want to create a history of data for IPv6 addresses. Of course this will be a very sparse table... but I do need to track certain things regarding seen IP's.


To clarify, I think you need a key of 128 bits (not 2^128 bits).

I'm taking this as a question about DB key type selection; I'm not sure what consequences the Azure angle has. AFAIK SQL Azure is built on top of MS-SQL.

128 bits (16 bytes) is the same size as a Guid (UniqueIdentifier), but I don't think you want to use that, although it is supported as a key.

A direct choice would be something like binary(16) but I don't know how well suited that is as a PK.

You can code it as a char(32) hex string; that is not too excessive.
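As a rough illustration of the binary(16) and char(32) options, here is a small Python sketch using the standard `ipaddress` module. The function names are hypothetical and only for illustration. Both encodings are fixed width and big-endian, so in Python they sort in the same order as the numeric address; whether a given database column compares them the same way depends on its type and collation, which is an assumption you would need to check.

```python
import ipaddress

def ipv6_to_binary16(addr: str) -> bytes:
    """Return the 16-byte big-endian form (candidate value for a binary(16) column)."""
    return ipaddress.IPv6Address(addr).packed

def ipv6_to_hex32(addr: str) -> str:
    """Return a fixed-width, 32-character lowercase hex string (candidate for char(32))."""
    return ipaddress.IPv6Address(addr).packed.hex()

# Example addresses (documentation prefix 2001:db8::/32 and loopback).
samples = ["2001:db8::ff", "::1", "2001:db8::1"]

# Sorting by either encoding gives the same order as sorting by numeric value.
by_number = sorted(samples, key=lambda a: int(ipaddress.IPv6Address(a)))
by_binary = sorted(samples, key=ipv6_to_binary16)
by_hex = sorted(samples, key=ipv6_to_hex32)

for a in by_number:
    print(a, ipv6_to_hex32(a))
```

The hex form doubles the storage per key (32 bytes instead of 16) but is easier to read and to paste into ad hoc queries.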

For practicality estimates, the key factor is how sparse your data is, or better: how many addresses do you expect to have to store?


Windows Azure Tables would be my recommendation, but there's only one sort order defined, so it will be hard to search both forward and backward. You may end up having to store each key twice, once in normal order, and once reversed (0xFFF...F - key) to support both scan directions efficiently.
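The reversed-key idea can be sketched in Python as follows. This assumes the key is stored as a fixed-width hex string and that the table sorts keys as strings in ascending order; the helper names are hypothetical. Subtracting the address from 0xFFF...F flips the ordering, so an ascending scan over the reversed keys walks the addresses downward.

```python
import ipaddress

MAX_KEY = (1 << 128) - 1  # 0xFFFF...F, the largest 128-bit value

def forward_key(addr: str) -> str:
    """Fixed-width hex key; ascending string order equals ascending address order."""
    return format(int(ipaddress.IPv6Address(addr)), "032x")

def reversed_key(addr: str) -> str:
    """Fixed-width hex key of (MAX_KEY - address); ascending string order
    equals descending address order."""
    return format(MAX_KEY - int(ipaddress.IPv6Address(addr)), "032x")

addresses = ["2001:db8::2", "2001:db8::1", "2001:db8::3"]

ascending = sorted(addresses, key=forward_key)    # what a scan over forward keys returns
descending = sorted(addresses, key=reversed_key)  # what a scan over reversed keys returns

print(ascending)
print(descending)
```

To find the 10 records below a given address, you would compute its reversed key and scan upward from there in the reversed-key copy. The cost is storing every row twice and keeping both copies in sync.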


First of all, your premise of 2^128 integer keys is wrong, since you mentioned you want to store IPv6 addresses. An IPv6 address is 128 bits long. To store it as integers you need 128/32 = 4 32-bit integers per address. So the correct estimate is 2^128 possible addresses * 4 integers, for a total of 2^128 * 4 keys of 32-bit integers....

Anyway I want that in bytes so we'll just go 2^128 possible addresses * 4 integers * 4 bytes per integer = 5.44 * 10^39 bytes. After that just follow Andreas' calculation and you'll end up with more....
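The arithmetic can be checked with a few lines of Python, which handles integers of this size exactly. The variable names are only illustrative.

```python
# Upper bound on key storage if every possible IPv6 address were stored.
possible_addresses = 2 ** 128   # size of the IPv6 address space
ints_per_address = 4            # 128 bits / 32 bits per integer
bytes_per_int = 4               # a 32-bit integer is 4 bytes

total_bytes = possible_addresses * ints_per_address * bytes_per_int

print(total_bytes == 2 ** 132)  # 16 bytes per address, so 2^128 * 2^4
print(f"{total_bytes:.2e}")     # scientific notation, about 5.44e+39
```

This is only a theoretical ceiling for the keys alone; the real table size depends on how many addresses are actually seen.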

That being said, the idea of IPv6 is that we have more addresses than we'll ever need to use. So I highly doubt anywhere near 2^128 will be assigned for many years. At most, if we moved to IPv6 right now, we'd have the IPv4 address space assigned and nothing else, and though the number of IP addresses increases every year, it is not by that much.

Anyway, it seems like you don't yet know what you are storing, since the schema is not defined, so Azure Table may be what you want. It is mostly key/value: for each IP address you could store totally different properties, and it is really easy to add or remove a property using the update/insert/merge operations.

But if you want some uniformity applied to your data, then use SQL. It's true that you will have to modify the schema as changes happen, but this enforces that every row (and hence every IP address) has the same data. Otherwise it is easy to leave out "required" columns/properties, or to misspell them if you have multiple applications. And even though the schema does need to be changed, there are commands to add and remove columns.

So it really depends on what you want to do: do you value data integrity, or the flexibility of properties? Should every IP address store the same properties, or can each have different ones? I believe the Azure Table way probably takes less storage per address than the SQL way if you are not using most of the properties for a given IP address.
