Parsing a file into a list of lists

Developer https://www.devze.com 2023-04-03 19:27 Source: web

I have a text file in this format:

{ a 3 56 cd 8 }
{ 1 2 3 4 ab 546 }

I am currently using the following lines to parse it into a list of lists:

for line in filename.readlines():
    line = line.lstrip('{').rstrip('}\n').strip(' ').split(' ')

Is this the best way to do this? I have heard people say that the split function should seldom be used because it slows down the script considerably.

EDIT: I expect the output to be:

[['a', 3, 56, 'cd', 8], [1, 2, 3, 4, 'ab', 546]]


Assuming there is no whitespace before the opening brace and nothing but the newline after the closing brace:

li = [line.rstrip('\n')[1:-1].split() for line in file]

or if I can't assume that:

li = [line.strip()[1:-1].split() for line in file]
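
To check the strip-then-slice variant without a real file, one can feed it an in-memory file object. This is only a sketch: io.StringIO stands in for the opened file, and the sample text is the data from the question with some extra whitespace added around the second line as an assumption.

```python
import io

# Sample data from the question; the padding around the second line is
# added here only to exercise the strip() call.
sample = "{ a 3 56 cd 8 }\n  { 1 2 3 4 ab 546 }  \n"

with io.StringIO(sample) as f:
    # strip() removes surrounding whitespace (including the newline),
    # [1:-1] drops the braces, split() breaks on runs of whitespace.
    li = [line.strip()[1:-1].split() for line in f]

print(li)
# [['a', '3', '56', 'cd', '8'], ['1', '2', '3', '4', 'ab', '546']]
```

Note that every item is still a string at this point.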


It may be better to use a module such as csv to parse your file. Here is some sample code.

# Your file contents - test.csv
{ 1 2 3 asd 4 5 6 }
{ 5 6 7 8 def 8 9 }

>>> import csv
>>> all_lines = []
>>> # Python 3: open in text mode with newline='' (the 'rb' mode is Python 2 only)
>>> with open('test.csv', newline='') as f:
...     reader = csv.reader(f, delimiter=' ')
...     for line in reader:
...         # if the braces are always in the first and last positions
...         # you can just do this
...         all_lines.append(line[1:-1])
...
>>> all_lines
[['1', '2', '3', 'asd', '4', '5', '6'], ['5', '6', '7', '8', 'def', '8', '9']]

Note that the list will contain the numbers as strings. You can convert them to numerical format before appending if you want to.
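
One possible way to do that conversion is sketched below. The helper name to_number is hypothetical, and the sketch assumes that only whole numbers need converting; anything int() rejects is left as a string.

```python
def to_number(token):
    """Return int(token) if the token parses as an integer, else the token unchanged."""
    try:
        return int(token)
    except ValueError:
        return token

# The rows as produced by the csv example above.
rows = [['1', '2', '3', 'asd', '4', '5', '6'],
        ['5', '6', '7', '8', 'def', '8', '9']]

converted = [[to_number(tok) for tok in row] for row in rows]
print(converted)
# [[1, 2, 3, 'asd', 4, 5, 6], [5, 6, 7, 8, 'def', 8, 9]]
```

The same helper could be applied inside the loop, just before appending each row.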


Using a list comprehension:

[ [ c for c in l.split() if c not in ('{', '}') ] for l in filename.readlines() ]

If you wish to avoid split you could use a regex, though I don't know whether it would perform any better:

import re
[ re.findall(r"\w+", l) for l in filename.readlines() ]
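
One caveat with \w+ is that it only matches letters, digits and underscores, so a value such as -5 or 3.14 would be broken up. The sketch below compares it with a pattern that matches any run of characters other than whitespace and braces. The sample line is made up for illustration and is not taken from the question.

```python
import re

# Hypothetical line containing a negative number and a decimal.
line = "{ -5 3.14 ab }\n"

# \w+ drops the sign and splits the decimal at the dot.
word_tokens = re.findall(r"\w+", line)
print(word_tokens)   # ['5', '3', '14', 'ab']

# [^\s{}]+ keeps each whitespace-separated value intact.
value_tokens = re.findall(r"[^\s{}]+", line)
print(value_tokens)  # ['-5', '3.14', 'ab']
```

For the data shown in the question, both patterns give the same result.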


I would use a single strip call:

L = []
for line in file:
    values = line.strip('{}\n\r ').split(' ')
    L.append(values)

It assumes your values don't begin or end with '{' or '}'. It also works on Windows, where the line break contains \r in addition to \n.

If several string methods are chained, as in the question, each step creates a temporary object in memory, since strings are immutable.

I doubt there is any solution faster than using split.

Also, there is no need to clutter memory by loading the whole file with filename.readlines(); it can be read line by line using for line in file. It is also misleading to name a file object 'filename', since a file object and a file name are not the same thing.

Some of the other solutions use slicing (string[1:-1]). Some testing is required to determine whether that approach is faster than using strip alone.
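
Such a test could be set up with the standard timeit module, roughly as follows. This is a sketch only: the function names with_strip and with_slice, the sample line and the repetition count are arbitrary choices, and the timings will vary by machine and Python version.

```python
import timeit

line = "{ 1 2 3 4 ab 546 }\n"

def with_strip(s):
    # One strip call removing braces, line-break characters and spaces.
    return s.strip('{}\n\r ').split(' ')

def with_slice(s):
    # Strip surrounding whitespace first, then slice off the braces.
    return s.strip()[1:-1].split()

# Both approaches should agree before their speed is compared.
assert with_strip(line) == with_slice(line)

for func in (with_strip, with_slice):
    seconds = timeit.timeit(lambda: func(line), number=100_000)
    print(f"{func.__name__}: {seconds:.3f} s")
```

Running it against lines taken from the real file would give a more representative result than a single short sample.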
