What is the best way in python to get a denormalized array from this ordered array?

Developer https://www.devze.com 2023-02-06 09:08 Source: Internet

I have this array:

>>> print raw_data
['LEVEL 1',
'SUBJECT A',
'GROUP X',
'COMMENT i',
'COMMENT ii',
'COMMENT iii',
'GROUP Y',
'COMMENT iv',
'COMMENT v',
'COMMENT vi',
'LEVEL 2',
'SUBJECT B',
'GROUP Z',
'COMMENT vii',
'COMMENT viii',
'COMMENT ix',
'SUBJECT C',
'GROUP X2',
'COMMENT x',
'COMMENT xi',
'COMMENT xii',
'COMMENT xiii',
'GROUP Y2',
'COMMENT xiv',
'COMMENT xv',
'COMMENT xvi']

Where the obvious hierarchy is:

  1. Level
    1. Subject
      1. Group
        1. Comments

My objective is to turn it into a denormalized array that can be stored in a database:

>>> print result
[
    ['LEVEL 1', 'SUBJECT A', 'GROUP X', 'COMMENT i'],
    ['LEVEL 1', 'SUBJECT A', 'GROUP X', 'COMMENT ii'],
    ['LEVEL 1', 'SUBJECT A', 'GROUP X', 'COMMENT iii'],
    ['LEVEL 1', 'SUBJECT A', 'GROUP Y', 'COMMENT iv'],
    ['LEVEL 1', 'SUBJECT A', 'GROUP Y', 'COMMENT v'],
    ['LEVEL 1', 'SUBJECT A', 'GROUP Y', 'COMMENT vi'],
    ['LEVEL 2', 'SUBJECT B', 'GROUP Z', 'COMMENT vii'],
    ['LEVEL 2', 'SUBJECT B', 'GROUP Z', 'COMMENT viii'],
    ['LEVEL 2', 'SUBJECT B', 'GROUP Z', 'COMMENT ix'],
    ['LEVEL 2', 'SUBJECT C', 'GROUP X2', 'COMMENT x'],
    ['LEVEL 2', 'SUBJECT C', 'GROUP X2', 'COMMENT xi'],
    ['LEVEL 2', 'SUBJECT C', 'GROUP X2', 'COMMENT xii'],
    ['LEVEL 2', 'SUBJECT C', 'GROUP X2', 'COMMENT xiii'],
    ['LEVEL 2', 'SUBJECT C', 'GROUP Y2', 'COMMENT xiv'],
    ['LEVEL 2', 'SUBJECT C', 'GROUP Y2', 'COMMENT xv'],
    ['LEVEL 2', 'SUBJECT C', 'GROUP Y2', 'COMMENT xvi']
]

I have been trying to solve this, but I am quite lost. I assume this is a common problem, so I would like to know if someone has an efficient approach. It looks similar to nested sets, but I don't know much about handling them in Python. Getting the level of each item is easy, but going any further is giving me headaches:

>>> def addlevel(a):
    if a.startswith('LEVEL'):
        return [1, a]
    elif a.startswith('SUBJECT'):
        return [2, a]
    elif a.startswith('GROUP'):
        return [3, a]
    elif a.startswith('COMMENT'):
        return [4, a]
>>> map(addlevel, raw_data)
[[1, 'LEVEL 1'],
 [2, 'SUBJECT A'],
 [3, 'GROUP X'],
 [4, 'COMMENT i'],
 [4, 'COMMENT ii'],
 [4, 'COMMENT iii'],
 [3, 'GROUP Y'],
 [4, 'COMMENT iv'],
 [4, 'COMMENT v'],
 [4, 'COMMENT vi'],
 [1, 'LEVEL 2'],
 [2, 'SUBJECT B'],
 [3, 'GROUP Z'],
 [4, 'COMMENT vii'],
 [4, 'COMMENT viii'],
 [4, 'COMMENT ix'],
 [2, 'SUBJECT C'],
 [3, 'GROUP X2'],
 [4, 'COMMENT x'],
 [4, 'COMMENT xi'],
 [4, 'COMMENT xii'],
 [4, 'COMMENT xiii'],
 [3, 'GROUP Y2'],
 [4, 'COMMENT xiv'],
 [4, 'COMMENT xv'],
 [4, 'COMMENT xvi']]

I would appreciate any clues!


Pseudocode, since I don't have a Python interpreter handy right now:

Set LEVEL, SUBJECT, GROUP to None, results to []

Loop over the list
  if it's a 'LEVEL', set LEVEL to it
  if it's a 'SUBJECT', set SUBJECT to it
  if it's a 'GROUP', set GROUP to it
  if it's a 'COMMENT', append [LEVEL, SUBJECT, GROUP, COMMENT] to results
Ta-da.

It just relies on the ordering...
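The steps above can be sketched in Python roughly as follows. This is only one possible reading of the pseudocode: the names `PREFIXES` and `denormalize` are made up for illustration, and the shortened `raw_data` is sample input, not the full list from the question. One small addition not in the pseudocode: when a higher-level item arrives, the deeper ones are cleared, so a stale GROUP cannot leak into a new SUBJECT.

```python
# Hypothetical names: PREFIXES and denormalize are illustrative only.
PREFIXES = ('LEVEL', 'SUBJECT', 'GROUP', 'COMMENT')

def denormalize(items, prefixes=PREFIXES):
    """Return one [level, subject, group, comment] row per comment."""
    # Most recently seen ancestor at each depth (everything but the leaf).
    current = [None] * (len(prefixes) - 1)
    rows = []
    for item in items:
        # Find the depth of this item from its prefix.
        for depth, prefix in enumerate(prefixes):
            if item.startswith(prefix):
                break
        else:
            raise ValueError('unrecognized item: %r' % (item,))
        if depth == len(prefixes) - 1:
            # Leaf (a comment): emit a row with the ancestors seen so far.
            rows.append(current + [item])
        else:
            current[depth] = item
            # Clear deeper ancestors so stale values are not reused.
            for deeper in range(depth + 1, len(current)):
                current[deeper] = None
    return rows

# Shortened sample input in the same order as the question's list.
raw_data = ['LEVEL 1', 'SUBJECT A', 'GROUP X', 'COMMENT i', 'COMMENT ii',
            'GROUP Y', 'COMMENT iii',
            'LEVEL 2', 'SUBJECT B', 'GROUP Z', 'COMMENT iv']

for row in denormalize(raw_data):
    print(row)
```

If a comment shows up before its group has been seen, the row will contain `None` in that position, which makes malformed input easy to spot.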


You could try something like this:

raw_data = [ 'LEVEL 1', 'SUBJECT A', 'GROUP X', 'COMMENT i', 'COMMENT ii',
'COMMENT iii', 'GROUP Y', 'COMMENT iv', 'COMMENT v', 'COMMENT vi', 'LEVEL 2',
'SUBJECT B', 'GROUP Z', 'COMMENT vii', 'COMMENT viii', 'COMMENT ix', 
'SUBJECT C', 'GROUP X2', 'COMMENT x', 'COMMENT xi', 'COMMENT xii', 
'COMMENT xiii', 'GROUP Y2', 'COMMENT xiv', 'COMMENT xv', 'COMMENT xvi' ]

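import pprint

# Most recently seen item at each level of the hierarchy.
level, subject, group, comment = '', '', '', ''

result = []

for item in raw_data:

    if item.startswith('COMMENT'):
        comment = item
    elif item.startswith('GROUP'):
        group = item
        comment = ''    # a new group has no comment yet
    elif item.startswith('SUBJECT'):
        subject = item
        group = ''      # a new subject has no group yet
    elif item.startswith('LEVEL'):
        level = item
        subject = ''    # a new level has no subject yet

    # Only a COMMENT item leaves all four values set, so one row is added per comment.
    if level and subject and group and comment:
        result.append([level, subject, group, comment])

pprint.pprint(result)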

Which would yield:

[['LEVEL 1', 'SUBJECT A', 'GROUP X', 'COMMENT i'],
 ['LEVEL 1', 'SUBJECT A', 'GROUP X', 'COMMENT ii'],
 ['LEVEL 1', 'SUBJECT A', 'GROUP X', 'COMMENT iii'],
 ['LEVEL 1', 'SUBJECT A', 'GROUP Y', 'COMMENT iv'],
 ['LEVEL 1', 'SUBJECT A', 'GROUP Y', 'COMMENT v'],
 ['LEVEL 1', 'SUBJECT A', 'GROUP Y', 'COMMENT vi'],
 ['LEVEL 2', 'SUBJECT B', 'GROUP Z', 'COMMENT vii'],
 ['LEVEL 2', 'SUBJECT B', 'GROUP Z', 'COMMENT viii'],
 ['LEVEL 2', 'SUBJECT B', 'GROUP Z', 'COMMENT ix'],
 ['LEVEL 2', 'SUBJECT C', 'GROUP X2', 'COMMENT x'],
 ['LEVEL 2', 'SUBJECT C', 'GROUP X2', 'COMMENT xi'],
 ['LEVEL 2', 'SUBJECT C', 'GROUP X2', 'COMMENT xii'],
 ['LEVEL 2', 'SUBJECT C', 'GROUP X2', 'COMMENT xiii'],
 ['LEVEL 2', 'SUBJECT C', 'GROUP Y2', 'COMMENT xiv'],
 ['LEVEL 2', 'SUBJECT C', 'GROUP Y2', 'COMMENT xv'],
 ['LEVEL 2', 'SUBJECT C', 'GROUP Y2', 'COMMENT xvi']]
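
Since the stated goal is to store these rows in a database, here is a minimal sketch of that last step using the standard-library `sqlite3` module. The question does not say which database is used, so SQLite, the in-memory connection, the table name `comments`, and the column names are all assumptions made for illustration. Only three sample rows are used.

```python
import sqlite3

# Sample rows in the same shape as the result above (shortened).
rows = [
    ['LEVEL 1', 'SUBJECT A', 'GROUP X', 'COMMENT i'],
    ['LEVEL 1', 'SUBJECT A', 'GROUP X', 'COMMENT ii'],
    ['LEVEL 2', 'SUBJECT C', 'GROUP Y2', 'COMMENT xvi'],
]

# In-memory database for illustration; table and column names are hypothetical.
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE comments (level TEXT, subject TEXT, grp TEXT, comment TEXT)')

# executemany binds each 4-item row to the four placeholders.
conn.executemany('INSERT INTO comments VALUES (?, ?, ?, ?)', rows)
conn.commit()

stored = conn.execute(
    'SELECT level, subject, grp, comment FROM comments ORDER BY rowid').fetchall()
for record in stored:
    print(record)

conn.close()
```

The column is called `grp` rather than `group` because `GROUP` is an SQL keyword.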