How would you serialize a large array with 10^9 values?

Developer https://www.devze.com 2023-01-17 05:59 Source: web
This throws an out of memory exception when using 10^8 items but not with 10^7. How would you serialize an array with 10^9 values in it so that it could be stored in a database?

Dim List((10 ^ 9) - 1) As Int64
For i = 1 To (10 ^ 9)
    List(i - 1) = i
Next
Dim Format As New Runtime.Serialization.Formatters.Binary.BinaryFormatter
Dim Writer As New System.IO.MemoryStream
Format.Serialize(Writer, List)

[EDIT]

This is on a 64-bit machine with more memory than one could ever ask for: 8GB, and it can page up to 15GB.


An Int64 is 8 bytes, so 10^9 of them take 8GB. To serialize the array this way you must have the 8GB array in memory plus roughly another 8GB for the MemoryStream, which comes to about 16GB of memory. It's not clear how you are going to store 8GB in your database, but to complete the immediate task you need to get more memory, make the numbers smaller (e.g. Int32), or stream to disk instead of memory.

Exactly how do you intend to store 8GB of data in your DB? Most databases that I know of allow a single value to be at most 2GB or 4GB.
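
To make the arithmetic in this answer concrete, here is a rough sketch of the estimate. It assumes the serialized copy is about the same size as the array and ignores any formatter overhead; the module and function names are made up for illustration.

```vbnet
Imports System

Module MemoryEstimate
    ' Bytes needed to hold the array plus a same-sized serialized copy.
    ' Assumption: the serialized form is roughly as large as the array itself.
    Function BytesNeeded(count As Long, bytesPerValue As Long) As Long
        Dim arrayBytes As Long = count * bytesPerValue
        Return arrayBytes * 2L
    End Function

    Sub Main()
        Dim count As Long = 1000000000L ' 10^9 values
        Console.WriteLine(BytesNeeded(count, 8L)) ' Int64: 16000000000
        Console.WriteLine(BytesNeeded(count, 4L)) ' Int32: 8000000000
    End Sub
End Module
```

Switching to Int32 halves the total, but that is still more than the 8GB of physical memory mentioned in the question.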


Use System.IO.BinaryWriter instead to do your own serialization: just call Write on each value. However, a MemoryStream cannot hold more than about 2GB (2^31 bytes), so you'll need to write to some other kind of stream. UnmanagedMemoryStream is a possibility, or your database client may provide something specifically for storing large binary values. (I don't know what kind of database you're writing to.)
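
A minimal sketch of this approach, assuming a FileStream as the "other kind of stream" (the answer does not say which one to use) and the same 1..N values as in the question. The module and method names are hypothetical.

```vbnet
Imports System
Imports System.IO

Module StreamingSerializer
    ' Writes the values 1..count as raw 8-byte integers, one at a time,
    ' so neither a large array nor a MemoryStream is ever allocated.
    Sub WriteSequence(fileName As String, count As Long)
        Using stream As New FileStream(fileName, FileMode.Create, FileAccess.Write)
            Using writer As New BinaryWriter(stream)
                For i As Long = 1 To count
                    writer.Write(i) ' Long argument selects the Write(Int64) overload
                Next
            End Using
        End Using
    End Sub

    Sub Main()
        Dim fileName As String = System.IO.Path.GetTempFileName()
        WriteSequence(fileName, 1000L) ' small count for the demo
        Console.WriteLine(New FileInfo(fileName).Length) ' 8000 bytes = 1000 * 8
        File.Delete(fileName)
    End Sub
End Module
```

Memory use stays constant regardless of count; only the size of the output file grows.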


Your example simply stores the index + 1 in each array element; you can get this via a calculation at runtime, there is no need to store or serialize anything.

Even if your example is made up and you are actually trying to store 10^9 arbitrary integers, you are probably going to have many duplicates. In this case you should be using a sparse data structure, not an array.
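
One possible shape for such a sparse structure, as a sketch only: it assumes most elements share a single default value and stores just the exceptions in a dictionary. SparseInt64Array is a hypothetical class, not something from the .NET libraries.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical sparse array: keeps only the entries that differ from a default.
Class SparseInt64Array
    Private ReadOnly _defaultValue As Long
    Private ReadOnly _overrides As New Dictionary(Of Long, Long)

    Sub New(defaultValue As Long)
        _defaultValue = defaultValue
    End Sub

    Default Property Item(index As Long) As Long
        Get
            Dim stored As Long
            If _overrides.TryGetValue(index, stored) Then Return stored
            Return _defaultValue
        End Get
        Set(value As Long)
            If value = _defaultValue Then
                _overrides.Remove(index) ' default values take no space
            Else
                _overrides(index) = value
            End If
        End Set
    End Property

    ReadOnly Property StoredCount As Integer
        Get
            Return _overrides.Count
        End Get
    End Property
End Class

Module SparseDemo
    Sub Main()
        Dim data As New SparseInt64Array(0L)
        data(5L) = 42L
        data(999999999L) = 7L
        Console.WriteLine(data(5L))         ' 42
        Console.WriteLine(data(123L))       ' 0 (the default)
        Console.WriteLine(data.StoredCount) ' 2
    End Sub
End Module
```

Only the stored entries would need to be serialized, so the payload scales with the number of non-default values rather than with 10^9.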

Addendum: If the values are primary keys and must be unique, you may be better off storing the numbers that are not used rather than the ones which are.
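
As a sketch of the addendum's idea, assuming the keys fall in a known range 1..maxKey and only a small fraction of that range is unused: keep the unused keys and derive membership from them. DenseKeySet is a hypothetical name.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical key set over 1..maxKey that records only the unused keys.
Class DenseKeySet
    Private ReadOnly _maxKey As Long
    Private ReadOnly _unused As New HashSet(Of Long)

    Sub New(maxKey As Long, unusedKeys As IEnumerable(Of Long))
        _maxKey = maxKey
        For Each candidate As Long In unusedKeys
            ' Ignore anything outside the range so Count stays accurate.
            If candidate >= 1L AndAlso candidate <= maxKey Then _unused.Add(candidate)
        Next
    End Sub

    ' A key is in use if it is in range and not listed as unused.
    Function Contains(candidate As Long) As Boolean
        Return candidate >= 1L AndAlso candidate <= _maxKey AndAlso Not _unused.Contains(candidate)
    End Function

    ReadOnly Property Count As Long
        Get
            Return _maxKey - _unused.Count
        End Get
    End Property
End Class

Module DenseKeyDemo
    Sub Main()
        Dim keys As New DenseKeySet(1000000000L, New Long() {3L, 10L})
        Console.WriteLine(keys.Contains(3L)) ' False
        Console.WriteLine(keys.Contains(4L)) ' True
        Console.WriteLine(keys.Count)        ' 999999998
    End Sub
End Module
```

Here only the two unused keys and the upper bound would have to be stored, instead of nearly 10^9 values.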
