开发者

Representing a very large array of bits in little memory

开发者 https://www.devze.com 2023-02-09 23:43 出处:网络
I would like to represent a structure containing 250 M s开发者_开发技巧tates(1 bit each) somehow into as less memory as possible (100 k maximum). The operations on it are set/get. I cold not say that

I would like to represent a structure containing 250 M s开发者_开发技巧tates(1 bit each) somehow into as less memory as possible (100 k maximum). The operations on it are set/get. I cold not say that it's dense or sparse, it may vary. The language I want to use is C.

I looked at other threads here to find something suitable also. A probabilistic structure like Bloom filter for example would not fit because of the possible false answers.

Any suggestions please?


If you know your data might be sparse, then you could use run-length encoding. But otherwise, there's no way you can compress it.


The size of the structure depends on the entropy of the information. You cannot squeeze information something in less than a given size if you have no repeated pattern. The worst case would still be about 32Mb of storage in your case. If you know something about the relation between the bits then it's maybe possible...


I don't think it's possible to do what you're asking. If you need to cover 250 million states of 1 bit each, you'd need 250Mbits/8 = 31.25MBytes. A far cry from 100KBytes.

You'd typically create a large array of bytes, and use functions to determine the byte (index >> 3) and bit position (index & 0x07) to set/clear/get.


250M bits will take 31.25 megabytes to store (assuming 8 bits/byte, of course), much much more than your 100k goal.

The only way to beat that is to start taking advantage of some sparseness or pattern in your data.


The max number of bits you can store in 100K of mem is 819,200 bits. This is assuming that 1 K = 1024 bytes, and 1 byte = 8 bits.


are files possible in your environment ? if so, you might swap, say for example 4k sized segmented bit buffer. your solution shoud access those bits in a serialized way to minimize disk load/save operation.

0

精彩评论

暂无评论...
验证码 换一张
取 消