开发者

merging n sorted lists of tuples in python

开发者 https://www.devze.com 2023-02-11 14:49 出处:网络
I have n lists (n<10) of tuples in the format [(ListID, [(index,value),(index, value),...)] and want to sort them by index to get to following outcome

I have n lists (n<10) of tuples in the format [(ListID, [(index,value),(index, value),...)] and want to sort them by index to get to following outcome

Example Input:
[('A',[(0.12, 'how'),(0.26,'are'),(0.7, 'you'),(0.9,'mike'),(1.9, "I'm fine too")]),
('B',[(1.23, 'fine'),(1.50, 'thanks'),(1.6,'and you')]),
('C',[(2.12,'good'),(2.24,'morning'),(3.13,'guys')])]

Desired Output:
[('A', ( 0.12, 'how')),
('A', ( 0.26, 'are')),
('A', ( 0.7, 'you')),
('A', ( 0.9, 'mike')),
('B',(1.23, 'fine')),
('B',(1.50, 'thanks')),
('B',(1.6,'and you')),
('A', (1.9, "I'm fine too")),
('C',(2.12,'good')),
('C',(2.24,'morning')),
('C',(3.13,'guys'))]   

I know the code is ugly, especially those indices item[0][-1][1], but can somebody tell me what am I doing wrong?

content = []    
max = 0.0
first = True 
Done = False
finished = []
while not Done:
    for item in flow:
        if len(finished) == 4:
            Done = True
            break
        if开发者_StackOverflow社区 len(item[1]) == 0:
            if item[0] not in finished:
                finished.append(item[0])
            continue
        if first == True:
            max = item[1][-1][0]
            content.append((item[0], item[1].pop()))
            first = False 
            continue
        if item[1][-1][0] > max:
            max = item[1][-1][0]
            content.append((item[0], item[1].pop()))
            content = sorted(content, key=itemgetter(1))    

    first = True    

UPDATE: thank you everybody


>>> from operator import itemgetter
>>> import pprint
>>> pprint.pprint(sorted(((i,k) for i,j in INPUT for k in j), key=itemgetter(1)))
[('A', (0.12, 'how')),
 ('A', (0.26000000000000001, 'are')),
 ('A', (0.69999999999999996, 'you')),
 ('A', (0.90000000000000002, 'mike')),
 ('B', (1.23, 'fine')),
 ('B', (1.5, 'thanks')),
 ('B', (1.6000000000000001, 'and you')),
 ('A', (1.8999999999999999, "I'm fine")),
 ('C', (2.1200000000000001, 'good')),
 ('C', (2.2400000000000002, 'morning')),
 ('C', (3.1299999999999999, 'guys'))]

There are two main things going on here

[(i,k) for i,j in INPUT for k in j]

takes transforms the INPUT to this struture

[('A', (0.12, 'how')),
 ('A', (0.26, 'are')),
 ('A', (0.7, 'you')),
 ('A', (0.9, 'mike')),
 ('A', (1.9, "I'm fine")),
 ('B', (1.23, 'fine')),
 ('B', (1.5, 'thanks')),
 ('B', (1.6, 'and you')),
 ('C', (2.12, 'good')),
 ('C', (2.24, 'morning')),
 ('C', (3.13, 'guys'))]

and

sorted(L, key=itemgetter(1))

sorts L buy item[1] of each element. This is actually (0.12, 'how'), (0.27, 'are') ... but the normal way python sorts tuples is from left to right, so we don't need to do extra work to strip the word from the tuple


(OK, the sample data makes the problem description much clearer. Answer revised accordingly)

Step 1: clarify your problem description by reverse engineering your current solution.

  1. There are 4 different data sets labelled A, B, C and D
  2. These data sets are contained in a series of 2-tuples of the form (ListID, elements)
  3. Each "elements" entry is itself a list of 2-tuples of the form (index, value)
  4. An empty elements entry indicates the end of a data set
  5. The goal is to merge these data sets into a single sorted list of 2-tuples (ListID, (index, value))

Step 2: transform the input data to create individual records of the desired form.

Generators are built for this kind of thing, so it makes sense to define one.

def get_data(flow, num_data_sets=4):
    finished = set()
    for list_id, elements in flow:
        if list_id in finished:
            continue
        if not elements:
            finished.add(list_id)
            if len(finished) == num_data_sets:
                break
            continue
        for element in elements:
            yield list_id, element

Step 3: use sorted to produce the desired ordered list

content = sorted(get_data(flow))

Sample usage:

# get_data defined via copy/paste of source code above
# ref_data taken from the revised question
>>> demo_data = [
...   ('A', [(1, 2), (3, 4)]),
...   ('B', [(7, 8), (9, 10)]),
...   ('A', [(0, 0)]),
...   ('C', []), # Finish early
...   ('C', [('ignored', 'entry')])
... ]
>>> content = sorted(get_data(demo_data))
>>> print '\n'.join(map(str, content))
('A', 0, 0)
('A', 1, 2)
('A', 3, 4)
('B', 7, 8)
('B', 9, 10)
>>> content = sorted(get_data(ref_data), key=itemgetter(1))
>>> print '\n'.join(map(str, content))
('A', 0.12, 'how')
('A', 0.26, 'are')
('A', 0.7, 'you')
('A', 0.9, 'mike')
('B', 1.23, 'fine')
('B', 1.5, 'thanks')
('B', 1.6, 'and you')
('A', 1.9, "I'm fine too")
('C', 2.12, 'good')
('C', 2.24, 'morning')
('C', 3.13, 'guys')

Your solution ends up being messy and hard to read for two main reasons:

  1. Failing to use a generator means you aren't gaining the full benefit of the builtin sorted function
  2. By using indexing instead of tuple unpacking you make it very hard to keep track of what is what


data = [(x,id) for (id, xs) in data for x in xs]
data.sort()
for xs,id in data:
    print id,xs


A (0.12, 'how')
A (0.26000000000000001, 'are')
A (0.69999999999999996, 'you')
A (0.90000000000000002, 'mike')
B (1.23, 'fine')
B (1.5, 'thanks')
B (1.6000000000000001, 'and you')
A (1.8999999999999999, "I'm fine too")
C (2.1200000000000001, 'good')
C (2.2400000000000002, 'morning')
C (3.1299999999999999, 'guys')


Your input:

l = [('A',
    [(0.12, 'how'),
    (0.26000000000000001, 'are'),
    (0.69999999999999996, 'you'),
    (0.90000000000000002, 'mike'),
    (1.8999999999999999, "I'm fine too")]),
    ('B', [(1.23, 'fine'), (1.5, 'thanks'), (1.6000000000000001, 'and you')]),
    ('C',
    [(2.1200000000000001, 'good'),
    (2.2400000000000002, 'morning'),
    (3.1299999999999999, 'guys')])]

Convert (and print):

newlist = []
for alpha, tuplelist in l:
    for tup in tuplelist:
        newlist.append((alpha,tup))

from operator import itemgetter
sorted(newlist,key=itemgetter(1))
print newlist

Check!

[('A', (0.12, 'how')),
 ('A', (0.26000000000000001, 'are')),
 ('A', (0.69999999999999996, 'you')),
 ('A', (0.90000000000000002, 'mike')),
 ('B', (1.23, 'fine')),
 ('B', (1.5, 'thanks')),
 ('B', (1.6000000000000001, 'and you')),
 ('A', (1.8999999999999999, "I'm fine too")),
 ('C', (2.1200000000000001, 'good')),
 ('C', (2.2400000000000002, 'morning')),
 ('C', (3.1299999999999999, 'guys'))]

You can of course do it within the list comprehension, but you still use 2 for loops and 1 inbuilt sorted function. Might as well make it verbose and readable then.

0

精彩评论

暂无评论...
验证码 换一张
取 消