I have m arrays, every array is of length n. Each array is sorted. I want to create a single array of length m*n, containing all the values of the previous arrays (including repeating values), sorted. I have to merge these arrays..
I think the optimum time complexity is m*n*log(m)
Here's the sketch of the algorithm..
I create a support array H of lenth m, containing all the values of the first element of each array.
I then sort this array (m log m), and move the min value to the output array.
I then replace the moved value with the next on开发者_JAVA技巧e, from the array it was taken. Actually I don't replace it, but I insert it in the right (sorted) position. This take log m I think.
And I repeat this for all m*n values... therefore m*n*log m
My question.. can you think of a more efficient algorithm? If mnlogm is actually optimum, can you at least think of a simpler, more elegant algorith?
The complexity is right! However, there's a small flaw in your algorithm idea: You cannot insert an item in a sorted array in log m
. You can find its position using binary search in that complexity, but you might have to move elements around to actually place it there. To fix this problem, you can use a heap data-structure instead!
Multi-way merge (which is the common name of your algorithm) is usually implemented with yet another 'merging' data-structure: the tournament-tree. You can find a description in Knuth's "The Art of Computer Programming" (Chapter on Sorting, iirc). It has a lower constant factor in theory and in practice when compared to heaps in this specific case.
If you want to look implementations, I'm pretty sure that the parallel multi-way merge in the GNU C++ Standard library parallel-extensions is implemented this way.
Edit: I referenced the wrong book, which is fixed now.
Best you can do is O(m*n + d). Similar to counting sort: http://en.wikipedia.org/wiki/Counting_sort If you know the range of values possible (d, say) you can initialize an array of length d, and then scan through each of the m arrays adding 1 to each 'bin' in d for each value corresponding to that bin. Then in your new array of length m*n for each value in d you add however many counts that bin has.
精彩评论