Break a text file into chunks based on line like the string split operation?

Developer https://www.devze.com 2023-01-14 04:19 Source: Internet
I have text report files I need to "split()" like strings are split up into arrays.

So the file is like:

BOBO:12341234123412341234
1234123412341234123412341
123412341234
BOBO:12349087609812340-98
43690871234509875
45

BOBO:32498714235908713248
0987235

And I want to create 3 sub-files out of that splitting on lines that begin with "^BOBO:". I don't really want 3 physical files, I'd prefer 3 different file pointers.


Perhaps use itertools.groupby:

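import itertools

def bobo(x):
    # Key function: the counter goes up at every header line, so all
    # lines of one stanza share the same key.
    if x.startswith('BOBO:'):
        bobo.count += 1
    return bobo.count
bobo.count = 0

with open('a') as f:
    for key, grp in itertools.groupby(f, bobo):
        # Print a tuple so the output looks the same on Python 2 and 3.
        print((key, list(grp)))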

yields:

(1, ['BOBO:12341234123412341234\n', '1234123412341234123412341\n', '123412341234\n'])
(2, ['BOBO:12349087609812340-98\n', '43690871234509875\n', '45\n', '\n'])
(3, ['BOBO:32498714235908713248\n', '0987235\n'])

Since you say you don't want physical files, the whole file must be able to fit in memory. In that case, to create file-like objects, use io.StringIO (cStringIO.StringIO on Python 2):

import io
import itertools

# bobo is the key function defined above.
with open('a') as f:
    file_handles = []
    for key, grp in itertools.groupby(f, bobo):
        file_handles.append(io.StringIO(''.join(grp)))

file_handles will be a list of file-like objects, one for each "BOBO:" stanza.
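
To show what working with such file-like objects might look like, here is a self-contained sketch. It is only an illustration under a few assumptions: the helper name split_into_handles and the in-memory SAMPLE string are made up for this example (SAMPLE stands in for the file 'a'), and it uses io.StringIO from the standard library.

```python
import io
import itertools

# Hypothetical sample data standing in for the report file.
SAMPLE = (
    "BOBO:12341234123412341234\n"
    "1234123412341234123412341\n"
    "123412341234\n"
    "BOBO:12349087609812340-98\n"
    "43690871234509875\n"
    "45\n"
    "\n"
    "BOBO:32498714235908713248\n"
    "0987235\n"
)


def split_into_handles(lines, prefix="BOBO:"):
    """Return one io.StringIO per stanza that starts with `prefix`."""
    count = 0

    def key(line):
        # Bump the counter on each header line so that all lines of a
        # stanza share the same key.
        nonlocal count
        if line.startswith(prefix):
            count += 1
        return count

    # Each group is joined into one string before groupby moves on.
    return [io.StringIO("".join(grp)) for _, grp in itertools.groupby(lines, key)]


handles = split_into_handles(io.StringIO(SAMPLE))

# Each handle behaves like an open text file: readline(), read(), seek().
first_lines = [h.readline().rstrip("\n") for h in handles]
for h in handles:
    h.seek(0)  # rewind so each handle can be read again from the start
```

Keeping the counter inside the helper avoids the function attribute, so calling the helper twice starts counting from zero each time. Note that any lines before the first "BOBO:" line would end up in a group of their own.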


If you can keep the sections in memory while you work with them, something like this will probably work:

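subFileBlocks = []

with open('myReportFile.txt') as fh:
    for line in fh:
        if line.startswith('BOBO:'):
            # A header line starts a new block.
            subFileBlocks.append(line)
        elif subFileBlocks:
            # Any other line belongs to the most recent block.
            subFileBlocks[-1] += line
        # Lines before the first header are skipped (no block to attach to).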

When the loop finishes, subFileBlocks should contain your sections as strings.
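
If the file is large, the same line-by-line idea can be turned into a generator so that only one section is held in memory at a time. This is just a sketch: the name iter_blocks and the sample text are invented for the example, and here lines that come before the first "BOBO:" header are yielded as a block of their own, which may or may not be what you want.

```python
import io


def iter_blocks(lines, prefix="BOBO:"):
    """Yield each block as one string; a new block starts at a `prefix` line."""
    current = []
    for line in lines:
        if line.startswith(prefix) and current:
            # A header closes the block collected so far.
            yield "".join(current)
            current = []
        current.append(line)
    if current:
        # Emit whatever is left after the last header.
        yield "".join(current)


# Hypothetical sample; any iterable of lines (such as an open file) works.
sample = io.StringIO("preamble\nBOBO:1\n2\nBOBO:3\n")
blocks = list(iter_blocks(sample))
```

With a real file you would pass the open file object instead of the sample and loop over the generator directly rather than building a list.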
