I have a large file comprising ~100,000 lines. Each line corresponds to a cluster, and each entry within a line is a reference ID for another file (a protein structure in this case), e.g.
1hgn 1dju 3nmj 8kfn
9opu 7gfb
4bui
I need to read in the file as a list of lists where each line is a sublist, thus preserving the integrity of the cluster, e.g.
nested_list = [['1hgn', '1dju', '3nmj', '8kfn'], ['9opu', '7gfb'], ['4bui']]
My current code creates a nested list, but each sublist holds the whole line as a single string rather than separate entries. Therefore, I cannot easily slice the lists by index.
Any help greatly appreciated.
Thanks, S :-)
Super simple:
with open('myfile', 'r') as f:
    data = [line.split() for line in f]
You'll want to investigate the str.split() method.
>>> '1hgn 1dju 3nmj 8kfn'.split()
['1hgn', '1dju', '3nmj', '8kfn']
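If the file might contain blank lines, a small variation can skip them so no empty sublists end up in the result. This is only a sketch: the helper name read_clusters is made up for illustration, and io.StringIO stands in for the real file so the example runs on its own.

```python
import io

def read_clusters(lines):
    """Return one list of IDs per non-empty line (hypothetical helper)."""
    # str.split() with no argument splits on any run of whitespace
    # and discards the trailing newline.
    return [line.split() for line in lines if line.strip()]

# io.StringIO stands in for an open file; note the blank line in the sample.
sample = io.StringIO("1hgn 1dju 3nmj 8kfn\n9opu 7gfb\n\n4bui\n")
nested_list = read_clusters(sample)

# Each cluster is now a real list, so indexing and slicing work as usual.
first_cluster = nested_list[0]
first_two_ids = nested_list[0][:2]
```

With a real file, the same call would be read_clusters(f) inside the with open(...) block.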