I have a text file in this format:
{ a 3 56 cd 8 }
{ 1 2 3 4 ab 546 }
I am currently using the following lines to parse it into a list of lists:
for line in filename.readlines():
    line = line.lstrip('{').rstrip('}\n').strip(' ').split(' ')
Is this the best way to do this? I have heard people say that the split function should seldom be used because it slows down the script considerably.
EDIT: I expect the output to be:
[['a',3,56,'cd',8],[1,2,3,4,'ab',546]]
Assuming there is no whitespace before the opening bracket or after the closing bracket:
li = [line[1:-1].split() for line in file]
or if I can't assume that:
li = [line.strip()[1:-1].split() for line in file]
It may be better to use a module like csv to parse your file. Here is some sample code.
# Your file contents - test.csv
{ 1 2 3 asd 4 5 6 }
{ 5 6 7 8 def 8 9 }
>>> import csv
>>> reader = csv.reader(open('test.csv', newline=''), delimiter=' ')
>>> all_lines = []
>>> for line in reader:
...     # if the braces are always in the first and last positions
...     # you can just do this
...     all_lines.append(line[1:-1])
...
>>> all_lines
[['1', '2', '3', 'asd', '4', '5', '6'], ['5', '6', '7', '8', 'def', '8', '9']]
Note that the list will contain the numbers as strings. You can convert them to numerical format before appending if you want to.
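As a rough sketch of that conversion step, the snippet below turns each token into an int when possible and leaves it as a string otherwise. The helper name to_number is made up for illustration, and it assumes the numbers in the file are integers.

```python
def to_number(token):
    """Return token as an int if it parses as one, otherwise return it unchanged."""
    try:
        return int(token)
    except ValueError:
        return token

# Rows as produced by the csv example above (all values are strings).
rows = [['1', '2', '3', 'asd', '4', '5', '6'],
        ['5', '6', '7', '8', 'def', '8', '9']]

converted = [[to_number(tok) for tok in row] for row in rows]
print(converted)
# [[1, 2, 3, 'asd', 4, 5, 6], [5, 6, 7, 8, 'def', 8, 9]]
```

If the file can also contain decimals, trying float after int fails would be one way to extend it.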
Using a list comprehension:
[ [ c for c in l.split() if c not in ('{', '}') ] for l in filename.readlines() ]
If you wish to avoid split, you could use a regex, though I don't know whether it would perform any better:
import re
[ re.findall(r"\w+", l) for l in filename.readlines() ]
I would use a single strip call:
L = []
for line in file:
    values = line.strip('{}\n\r ').split(' ')
    L.append(values)
It assumes your values don't begin or end with '{' or '}'. It also works on Windows (where a line break contains \r in addition to \n).
If several split functions are used, a lot of temporary objects are created in memory at each step (since strings are immutable).
I doubt there is any faster solution than using split.
Also, there is no need to clutter memory with the whole file by calling filename.readlines(). The file can be read line by line using for line in file. It is also misleading to name a file object 'file_name', since a file object and a file name are not the same thing.
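Here is a minimal sketch of the line-by-line approach. The function name parse_lines is hypothetical, and io.StringIO stands in for a real file object so the snippet runs on its own; with a real file you would pass the object returned by open(path) instead.

```python
import io

def parse_lines(file_obj):
    """Parse an open text file (or any iterable of lines) into a list of token lists."""
    result = []
    for line in file_obj:  # the file object yields one line at a time
        result.append(line.strip('{}\n\r ').split())
    return result

# io.StringIO is only a stand-in for a file opened with open(path).
sample = io.StringIO("{ a 3 56 cd 8 }\n{ 1 2 3 4 ab 546 }\n")
parsed = parse_lines(sample)
print(parsed)
# [['a', '3', '56', 'cd', '8'], ['1', '2', '3', '4', 'ab', '546']]
```

Only the current line and the accumulated result are held in memory, rather than a second full copy of the file as a list of lines.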
There are some solutions with slicing (string[1:-1]). Some testing is required to determine whether this approach is faster than using strip alone.
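One way to run that test is with the standard timeit module. This is only a sketch: the function names with_strip and with_slice and the sample line are made up for illustration, and the timings will vary by machine and Python version, so no result is claimed here.

```python
import timeit

line = "{ 1 2 3 4 ab 546 }\n"

def with_strip(s):
    # Single strip call that removes braces, whitespace and line breaks.
    return s.strip('{}\n\r ').split()

def with_slice(s):
    # Strip whitespace, then slice off the first and last characters (the braces).
    return s.strip()[1:-1].split()

# Both approaches should give the same tokens before we compare speed.
assert with_strip(line) == with_slice(line)

strip_time = timeit.timeit(lambda: with_strip(line), number=100_000)
slice_time = timeit.timeit(lambda: with_slice(line), number=100_000)
print(f"strip: {strip_time:.4f}s  slice: {slice_time:.4f}s")
```

For a file of realistic size, the difference between the two is likely to be small compared with the cost of reading the file itself, so it is worth measuring on your own data.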