开发者

Data range filters in C++

开发者 https://www.devze.com 2023-01-09 09:41 出处:网络
I want to allow the user to be able to define ranges that will filter data. The ranges defined could be contiguous, over lapping, or separated (e.g. th开发者_如何学运维e user enters the following rang

I want to allow the user to be able to define ranges that will filter data. The ranges defined could be contiguous, over lapping, or separated (e.g. th开发者_如何学运维e user enters the following ranges: 1-10, 5-10, 10-12, 7-13, and 15-20).

I then want to filter the data so that the user is only displayed what is within those ranges.

I will probably create code on a different layer that will combine the ranges where appropriate (so the above example will become 1-13 and 15-20, but I don't want my data service to be concerned with that, so it must be able to handle the example above)

I have a lot of data and speed is a priority, so I don't want to have to iterate through a list of ranges for each data item to check if it should be displayed to the user or not.

Is there a data structure (or some algorithm) that could be used to achieve this?


You can use boost's filter_iterator to achieve this.


If you sort your list of ranges, you can then use binary search to minimize iteration. But really, unless you have an large number of ranges, iteration will be fastest.


You can use iterators into your containers. For example, std::vector provides the "at" method. These iterators can be contiguous, overlapping, or separated.


Make your list disjoint (as you suggested), combining ranges which overlap. Then sort the array of endpoints, and for each data element, do a binary search and determine whether it's within a range or outside it. Even elements will always begin a range, odd elements will always end a range.

HTH.


Solution in general depends on range bounds.

  1. If max-min is not so huge (for example, you define bounds in [1..1024]), you could use just an array, which point every X to the list of ranges. For your example, the array should be:
ranges=[0:(1,10), 1:(5,10), 2:(10,12), 3:(7,13), 4:(15-20)]
points=[1:[0],2:[0],3:[0],4:[0],5:[0,1],...,7:[0,1,3],...10:[0,1,2,3],...15:[4],...20:[4],21:[]...]

So, in this case you could quicly determine the ranges for particular X.

  1. You could use Interval tree - less efficient, but memory frendlier (of course more efficient than brute-force solution)


One approach would be to combine ranges you receive and map them to an underlying bitmap indicating in or not in the range.

A class-based design would allow you to overload operator += for syntactic sugar, but a bare bitmap would work just as well. For instance:

# original bitmap
bits = [ 0,0,0,0,0,0,0,0,0,0 ]

# add 1-5
bits = [ 0,1,1,1,1,1,0,0,0,0 ]

# add 4 - 6
bits = [ 0,1,1,1,1,1,1,0,0,0 ]

# Look for 3
bits[3] == 1 ?


I think what you want to do is known as Range Minimum Query.


It's not too hard if your data is allready sorted. Use a combination of

  • lower bound
  • upper bound

For each of your subranges [min, max] you can find the iterators i_min and i_max and use them as

std::make_pair(i_min, i_max)

to make it "range" compatible. Then use boost::join to concatenate all the sub ranges into a single range ( lazily of course ) and then use this range in down stream processing.

Obviously you should pre-process all your ranges to make sure they are non overlapping.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号