Improving stepping through an array twice (Nested loop on same array)

I have a large set of data that I want to cycle through in order to determine various statistics on the data set from a point in time 'D1' to a point in time in the future 'D2'. Basically, I want to add to a database each time the difference between the values is larger than 10. For example:

Datum[] data = x;
for( Datum d1 : data ){
    Datum[] tail = y; //From d1 up to 10 elements ahead
    for( Datum d2 : tail ){
        //Calculate difference
        if( (d2.val - d1.val) > 10 ){
            //Insert into database
        }
    }
}

My question is, is there a better algorithm/method of doing this? Since 9 elements of tail are reused in the next iteration of the outer loop, can I benefit somehow from that? My goal was to get this down to much less than O(n^2) (Big-O notation), but I can't wrap my head around it. Typically, finding a D1, D2 pair that satisfies the criteria means that the next D1 element will have a greater chance of matching as well. Can I use that to my advantage?

I'm trying to get this to be as efficient as possible because the data set is just so large.


An index-based for loop might perform much better than an iterator, since you can index the original array directly and avoid copying to a new array. You'd have much better memory locality, less chance of false sharing, etc.
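A minimal sketch of what that could look like, assuming Datum has an int field val as in the question; the window is just a pair of indices into the original array, so no sub-array is ever allocated or copied:

Datum[] data = x;
for (int i = 0; i < data.length; i++) {
    int end = Math.min(i + 10, data.length - 1); // up to 10 elements ahead of data[i]
    for (int j = i + 1; j <= end; j++) {
        if (data[j].val - data[i].val > 10) {
            // insert the (data[i], data[j]) pair into the database
        }
    }
}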


What you have is a classic sweep-line algorithm, which is O(k*n), with k being the "overlap", i.e. the portion the inner loop runs over. In your case k is at most 10 no matter what n is:

Datum[] data = x;
for (int i = 0; i < data.length; i++) {
    Datum d1 = data[i];
    // scan up to 10 elements ahead of d1, indexing the original array directly
    int end = Math.min(i + 10, data.length - 1);
    for (int j = i + 1; j <= end; j++) {
        Datum d2 = data[j];
        // calculate difference
        if ((d2.val - d1.val) > 10) {
            // insert into database

            break; // inner loop: stop at the first match for this d1
        }
    }
}


In your shoes, the first thing I would do is profile a typical dataset and find out where the time is going. This should give some hints as to where to focus your optimization efforts.

Assuming the calculation is as simple as the subtraction/comparison, and the arrays are quickly accessed, then your suggestion of looking to optimize saving to the database should be the next priority. For example, writing out a text file and using a bulk insert can give very fast performance compared to individual insert statements. If you stick to using separate inserts, and are using JDBC, then batch updates will be a great help, since they avoid the latency in communicating with the database.
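For instance, with plain JDBC a batched PreparedStatement looks roughly like this (a sketch only; the table and column names pairs, d1_val and d2_val are made up for illustration):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

void insertPairs(Connection conn, List<int[]> pairs) throws SQLException {
    // hypothetical table "pairs(d1_val, d2_val)"
    String sql = "INSERT INTO pairs (d1_val, d2_val) VALUES (?, ?)";
    conn.setAutoCommit(false); // commit once for the whole batch
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
        for (int[] p : pairs) {
            ps.setInt(1, p[0]);
            ps.setInt(2, p[1]);
            ps.addBatch();
        }
        ps.executeBatch(); // one round trip instead of one per row
    }
    conn.commit();
}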

If that is still not fast enough, consider partitioning the array into N partitions, and have each partition processed by a separate thread. This will be particularly effective if processing is CPU bound.
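A minimal sketch of that idea using a parallel stream, which lets the fork/join pool do the partitioning; collecting matches into a concurrent queue and deferring the database writes to a single thread afterwards is an assumption on my part, not something from the question:

import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.stream.IntStream;

ConcurrentLinkedQueue<int[]> matches = new ConcurrentLinkedQueue<>();
IntStream.range(0, data.length).parallel().forEach(i -> {
    int end = Math.min(i + 10, data.length - 1); // up to 10 elements ahead
    for (int j = i + 1; j <= end; j++) {
        if (data[j].val - data[i].val > 10) {
            matches.add(new int[]{i, j}); // defer the database write
        }
    }
});
// a single thread can then batch-insert the collected (i, j) pairs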

Finally, look for code-level optimizations, such as avoiding iterators by using an index. If the number of items written to the database is small compared to the number of elements iterated, then the iterator creation may be a bottleneck.

If the size of the window is much larger than 10, and critically, larger than what fits in the CPU cache, it will be more efficient to break the scanning up into smaller blocks. For example, rather than scanning 1000 elements ahead of each d1 in one pass, break the tail into blocks of (say) 100 and run each block against every d1 whose window overlaps it while the block is still in cache. This is similar to how matrix multiplication is implemented in a block fashion and makes better use of the CPU caches.
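A rough sketch of such a block-wise scan, assuming a hypothetical window size K of 1000 and a block size of 100 (both illustrative values, not from the question):

int K = 1000;    // window size: how far ahead of each d1 we look
int BLOCK = 100; // block size, chosen so a block fits comfortably in cache
for (int blockStart = 1; blockStart < data.length; blockStart += BLOCK) {
    int blockEnd = Math.min(blockStart + BLOCK, data.length);
    // every d1 whose window [i+1, i+K] reaches into this block
    for (int i = Math.max(0, blockStart - K); i < blockEnd - 1; i++) {
        int from = Math.max(i + 1, blockStart);
        int to = Math.min(i + K, blockEnd - 1);
        for (int j = from; j <= to; j++) {
            if (data[j].val - data[i].val > 10) {
                // record the (data[i], data[j]) pair for insertion
            }
        }
    }
}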

Although you are using two loops, which is typically an O(N^2) algorithm, the second loop has a fixed size of 10 elements, so this reduces to a simple constant factor: you are doing roughly a factor of 10 more work than a single linear pass.


There is an asymptotically faster way to solve this problem, but I have serious doubts as to whether it would run faster in practice because your window size (10) is so small. If you want to increase this size - which I'll call k - to be larger, then you might want to consider opting for an approach like the following.

With this approach, you maintain a window of k elements that supports two operations:

  1. Insert a new element, evicting the oldest.
  2. Return all elements greater than some value.

One way to do this would be to store all of your elements in a data structure combining a balanced binary search tree and a queue. The queue contains all k elements stored in the order in which they appear in the original sequence, and is used so that we can remember which element to evict when we need to add a new element. The balanced BST stores a copy of each of those elements stored in sorted order. This means that you can implement the above operations like this:

  1. Insert a new element, evicting the oldest: Add the new element to the queue and BST. Then, dequeue from the queue to get the oldest element, then remove it from the BST. Runtime: O(log k), since the BST has k elements in it.
  2. Return all elements greater than some value: Using the BST, find the smallest element at least as large as that value in O(log k) time. Then, scan across the BST and list all elements at least as large as that element. This takes O(z) time, where z is the total number of matches found.
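A minimal sketch of this structure in Java, using a TreeMap as the balanced BST (value -> occurrence count, so duplicate values are handled) and an ArrayDeque as the queue; it assumes Datum.val is an int, as in the question, and a window size k:

import java.util.ArrayDeque;
import java.util.Map;
import java.util.TreeMap;

void findPairs(Datum[] data, int k) {
    ArrayDeque<Integer> window = new ArrayDeque<>();    // window values, oldest first
    TreeMap<Integer, Integer> sorted = new TreeMap<>(); // value -> count

    // pre-load the window with the k elements that follow data[0]
    for (int j = 1; j <= k && j < data.length; j++) {
        window.addLast(data[j].val);
        sorted.merge(data[j].val, 1, Integer::sum);
    }

    for (int i = 0; i < data.length; i++) {
        // Operation 2: every window value strictly greater than data[i].val + 10
        // forms a qualifying (d1, d2) pair.
        for (Map.Entry<Integer, Integer> e : sorted.tailMap(data[i].val + 10, false).entrySet()) {
            for (int c = 0; c < e.getValue(); c++) {
                // insert the pair (data[i].val, e.getKey()) into the database
            }
        }
        // Operation 1: slide the window one position to the right.
        if (!window.isEmpty()) {
            int oldest = window.removeFirst();           // this is data[i + 1].val
            sorted.merge(oldest, -1, Integer::sum);
            if (sorted.get(oldest) == 0) sorted.remove(oldest);
        }
        int next = i + k + 1;
        if (next < data.length) {
            window.addLast(data[next].val);
            sorted.merge(data[next].val, 1, Integer::sum);
        }
    }
}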

Collectively, if you have n elements and a total of z pairs that you need to insert into the database, this algorithm will take O(n log k + z) time. To see this, note that we do a total of n copies of operation (1), which takes O(log k) time each. We also do n copies of operation (2), each taking O(log k) time to find the successor (O(n log k) in total), plus O(z) total time across all iterations to list all the matching pairs.

The asymptotic runtime of this algorithm is good compared to the O(nk) algorithm you've originally posted. Assuming that the number of matches isn't "really huge" (say, on the order of nk), this will be much faster as you increase n and k.

If the values you're storing are integers in a small range (say, 0 - 10,000), you can speed this up even further by replacing the balanced BST with a data structure optimized for integers, like a van Emde Boas tree, which reduces this to O(n log log U + z), where U is the size of the value range. Again, this is only faster asymptotically, and if you are holding k constant at 10 this is almost certainly not worth it.

Hope this helps!
