开发者

HashSet of ByteBuffer(actually integers) to separate unique & non unique elements from a ByteBuffer array

开发者 https://www.devze.com 2023-02-18 04:42 出处:网络
I have an array of ByteBuffers(which actually represent integers). I want to the separate unique & non unique ByteBuffers (i.e integers) in the array. Thus I am using HashSet of this type:

I have an array of ByteBuffers(which actually represent integers). I want to the separate unique & non unique ByteBuffers (i.e integers) in the array. Thus I am using HashSet of this type:

HashSet<ByteBuffer> columnsSet = new HashSet<ByteBuffer>()

Just wanted to know if HashSe开发者_StackOverflow社区t is a good way to do so? And do I pay more costs, when doing so for a ByteBuffer then if I would have done it for a Integer?

(Actually I am reading serialized data from DB which needs to be written back after this operation thus I want to avoid serialization and deserialization between bytebuffer to Integer and back!)

Your thoughts upon this appreciated.


Creating a ByteBuffer is far more expensive than reading/writing from a reused ByteBuffer.

The most efficient way to store integers is to use int type. If you want a Set of these you can use TIntHashSet which uses int primitives. You can do multiple read/deserialize/store and reverse with O(1) pre-allocated objects.


First of all, it will work. The overhead of equals() on two ByteBuffers will definitely be higher, but perhaps not enough to offset the benefits of not having to deserialize (though, I'm not entirely sure if that would be such a big problem).

I'm pretty sure that the performance will asymptotically be the same, but a more memory-efficient solution is to sort your array, then step through it linearly and test successive elements for equality.

An example, suppose your buffers contain the following:

1 2 5 1

Sort it:

1 1 2 5

Once you start iterating, you get ar[0].equals(ar[1]) and you know these are duplicates. Just keep going like that till n-1.


Collections normally operate on the equals() and hashCode() methods, so performance implications would come through the implementation of the objects stored in the collection.

Looking at ByteBuffer and Integer one can see that the implementation of those methods in Integer are simpler (just one int comparison for equals() and return value; for hashCode()). Thus you could say the Set<ByteBuffer> has higher cost than a Set<Integer>.

However, I can't tell you right now if this cost is higher than the serialization and deserialization cost.

In fact, I'd just go for the more readable code unless you really have a performance problem. In that case I'd just try both methods and take the faster one.

0

精彩评论

暂无评论...
验证码 换一张
取 消