I have 2 lists that have dates and data. Each list is in the proper order as noted by the sequence number. Now I need to merge the 2 lists together and keep everythi开发者_JAVA技巧ng in the correct order.
For example:
List A
20101001 A data 1 seq1 20101001 A data 2 seq2 20101005 A data 3 seq3List B
20101001 B data 1 seq1 20101003 B data 2 seq2etc...
I need the new list to look like this:
20101001 A data 1 seq1
20101001 A data 2 seq2 20101001 B data 1 seq3 20101003 B data 2 seq4 20101005 A data 3 seq5
2 things that I thought of is merging the lists together and applying the sequence number prior to inserting them into a db or I can insert them into the db with the current sequence and pull them back out again to merge them together, but that seems like an extra step and kludgy.
Any ideas on the best way to go about this?
Assuming your lists are in Ruby Arrays, and the objects in the lists have attributes defined (such as obj.sequence_number), one way to merge and sort the lists would be:
First merge the lists as a union:
@merged_list = @list_a | @list_b
Then sort the merged_list with the appropriate sorting rule:
@merged_list.sort! {|a, b| a.date <=> b.date # or whatever your sorting rule is... }
Edit:
Once the merged array is sorted, you can re-define the sequence_number:
@merged_list.each_with_index {|obj, index| obj.sequence_number = "seq#{index+1}"}
Edit:
Same thing applies if your objects in the lists are themselves just simple arrays:
@merged_list.sort! {|a, b| a[0] <=> b[0] # or whatever your sorting rule is... }
@merged_list.each_with_index {|obj, index| obj[2] = "seq#{index+1}"}
Try this:
(listA + listB).sort!{|a, b| a.sequence_no <=> b.sequence_no}
This is an algorithm for merging an arbitrary number of sorted lists in more or less linear time:
def merge_sorted(*lists)
# the lists will be modified, so make (shallow) copies
lists = lists.map(&:dup)
result = []
loop do
# ignore lists that have been exhausted
lists = lists.reject(&:empty?)
# we're done if all lists have been exhausted
break if lists.empty?
# find the list with the smallest first element
top = lists.inject do |candidate, other|
candidate.first < other.first ? candidate : other
end
result << top.shift
end
result
end
list1 = [1, 2, 5, 6, 9]
list2 = [2, 3, 4, 11, 13]
list3 = [1, 2, 2, 2, 3]
p merge_sorted(list1, list2, list3)
# => [1, 1, 2, 2, 2, 2, 2, 3, 3, 4, 5, 6, 9, 11, 13]
For each iteration it finds the list with the smallest first element, and shifts this element off of it onto the results list. It does this until all lists are empty.
I say more or less linear time since it's actually O(n × m) where n is the number of lists and m is the total number of elements in the lists but I think this can safely be simplified to O(m) for most cases since n will be small in comparison to m.
This uses with_index
which is a nice way to add an index value to an iterator:
result = (list_a + list_b).sort_by { |a| a[0 .. -2] }.map.with_index { |a, i| a[0 .. -2] + (1 + i).to_s }
puts result
# >> 20101001 A data 1 seq1
# >> 20101001 A data 2 seq2
# >> 20101001 B data 1 seq3
# >> 20101003 B data 2 seq4
# >> 20101005 A data 3 seq5
Here's some variations with benchmarks:
require 'benchmark'
list_a = [
'20101001 A data 1 seq1',
'20101001 A data 2 seq2',
'20101005 A data 3 seq3'
]
list_b = [
'20101001 B data 1 seq1',
'20101003 B data 2 seq2'
]
# #1
result = (list_a + list_b).sort_by { |a| a[0 .. -2] }.map.with_index { |a, i| a[0 .. -2] + (1 + i).to_s }
result # => ["20101001 A data 1 seq1", "20101001 A data 2 seq2", "20101001 B data 1 seq3", "20101003 B data 2 seq4", "20101005 A data 3 seq5"]
# #2
result = (list_a + list_b).map{ |r| r[0 .. -2] }.sort.map.with_index { |a, i| a + (1 + i).to_s }
result # => ["20101001 A data 1 seq1", "20101001 A data 2 seq2", "20101001 B data 1 seq3", "20101003 B data 2 seq4", "20101005 A data 3 seq5"]
# #3
i = 0
result = (list_a + list_b).map{ |r| r[0 .. -2] }.sort.map { |a| i += 1; a + i.to_s }
result # => ["20101001 A data 1 seq1", "20101001 A data 2 seq2", "20101001 B data 1 seq3", "20101003 B data 2 seq4", "20101005 A data 3 seq5"]
# #4
i = 0; result = (list_a + list_b).sort.map { |a| i += 1; a[-1] = i.to_s; a }
result # => ["20101001 A data 1 seq1", "20101001 A data 2 seq2", "20101001 B data 1 seq3", "20101003 B data 2 seq4", "20101005 A data 3 seq5"]
n = 75000
Benchmark.bm(7) do |x|
x.report('#1') { n.times { (list_a + list_b).sort_by { |a| a[0 .. -2] }.map.with_index { |a, i| a[0 .. -2] + (1 + i).to_s } } }
x.report('#2') { n.times { (list_a + list_b).map{ |r| r[0 .. -2] }.sort.map.with_index { |a, i| a + (1 + i).to_s } } }
x.report('#3') { n.times { i = 0; (list_a + list_b).map{ |r| r[0 .. -2] }.sort.map { |a| i += 1; a + i.to_s } } }
x.report('#4') { n.times { i = 0; (list_a + list_b).sort.map { |a| i += 1; a[-1] = i.to_s } } }
end
# >> user system total real
# >> #1 1.150000 0.000000 1.150000 ( 1.147090)
# >> #2 0.880000 0.000000 0.880000 ( 0.880038)
# >> #3 0.720000 0.000000 0.720000 ( 0.727135)
# >> #4 0.580000 0.000000 0.580000 ( 0.572688)
It's good to benchmark.
精彩评论