Python: Indexing a file that is tab delimited


Developer https://www.devze.com 2023-01-05 22:20 Source: Internet


I have a text file that is tab delimited and looks like:

1_0 NP_045689 100.00 279 0 0 18 296 18 296 3e-156 539

1_0 NP_045688 54.83 259 108 6 45 296 17 273 2e-61 224

I need to parse out specific columns such as column 2.

I've tried with the code below:

z = open('output.blast', 'r')
for line in z.readlines():
    for col in line:
        print col[1]
z.close()

But I get an index out of range error.

z = open('output.blast', 'r')
for line in z.readlines():
    # Split the line into a list of fields, then pick the second one.
    cols = line.split('\t')
    print cols[1]
z.close()

You need to split() the line on tab characters first.

Alternatively, you could use Python's csv module with a tab delimiter.

Check out the csv module. That should help you a lot if you plan on doing more stuff with your tab-delimited files, too. One nice thing is that you can assign names to the various columns.

import csv, StringIO
# Tabs are written as \t so the separators survive copy and paste.
text = ("1_0\tNP_045689\t100.00\t279\t0\t0\t18\t296\t18\t296\t3e-156\t539\n"
        "1_0\tNP_045688\t54.83\t259\t108\t6\t45\t296\t17\t273\t2e-61\t224")

f = csv.reader(StringIO.StringIO(text), delimiter='\t')
for row in f:
    print row[1]

Two things of note:

The delimiter argument to the reader function tells the csv module how to split each line of text. Check the other arguments to reader for further options (e.g. quotechar).

I use StringIO to wrap the example text as a file object. You don't need that if you already have an open file.

For example:

f=csv.reader(open('./test.csv'),delimiter='\t')
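
The remark above about assigning names to columns could look something like the following sketch, which uses csv.DictReader. It is written for Python 3 (the snippets in this thread use Python 2 print statements). The column names are an assumption: they follow BLAST's default tabular layout, which the 12-column sample appears to match, so adjust them to whatever your file really contains.

```python
import csv
import io

# Assumed column names (BLAST default tabular layout); not confirmed by the question.
FIELDNAMES = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]

# Sample rows from the question, with explicit tab separators.
text = (
    "1_0\tNP_045689\t100.00\t279\t0\t0\t18\t296\t18\t296\t3e-156\t539\n"
    "1_0\tNP_045688\t54.83\t259\t108\t6\t45\t296\t17\t273\t2e-61\t224\n"
)

# io.StringIO stands in for an open file here.
reader = csv.DictReader(io.StringIO(text), fieldnames=FIELDNAMES, delimiter="\t")
rows = list(reader)

for row in rows:
    # Each row is a dict keyed by column name; the values are still strings.
    print(row["sseqid"], row["evalue"])
```

With a real file you would pass the open file object instead of the StringIO wrapper.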

This has already been answered, but I thought I'd share the use of namedtuples for this sort of situation, since they allow pleasant object.attribute style access.

from collections import namedtuple
import csv
# One field name per column: _make() needs each row to have exactly this many fields.
rec = namedtuple('rec', 'col1, col2, col3, col4, col5')
for r in map(rec._make, csv.reader(open("myfile.tab", "rb"), delimiter='\t')):
    print r.col2, r.col5

See the Python collections documentation for more details.
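
For anyone on Python 3, here is a rough equivalent of the namedtuple approach. The three-column layout and the field names (query, subject, identity) are made up for illustration. With a real file in Python 3 you would open it in text mode with newline='' rather than "rb".

```python
import csv
import io
from collections import namedtuple

# Hypothetical three-column layout, used only for illustration.
Rec = namedtuple("Rec", ["query", "subject", "identity"])

text = "1_0\tNP_045689\t100.00\n1_0\tNP_045688\t54.83\n"

# Build one Rec per row; io.StringIO stands in for an open file.
records = [Rec._make(row) for row in csv.reader(io.StringIO(text), delimiter="\t")]
for r in records:
    print(r.subject, r.identity)

# _make() raises TypeError when a row has the wrong number of fields.
try:
    Rec._make(["only", "two"])
except TypeError as exc:
    print("bad row:", exc)
```

The field count has to match the file, so the 12-column sample in the question would need 12 field names.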

This is why your code is going wrong:

for col in line:

will iterate over every CHARACTER in the line.

    print col[1]

A character is a string of length 1, so col[1] is always going to give an index out of range error.
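
To see this for yourself, here is a small Python 3 sketch. The sample line is a shortened version of the data in the question.

```python
line = "1_0\tNP_045689\t100.00\n"

# Iterating over a string yields one-character strings.
chars = [ch for ch in line]
print(chars[:4])

# Indexing position 1 of a one-character string fails.
first = chars[0]
try:
    first[1]
    failed = False
except IndexError as exc:
    failed = True
    print("IndexError:", exc)

# Splitting first gives a list of fields, so index 1 is the second column.
cols = line.rstrip("\n").split("\t")
print(cols[1])  # NP_045689
```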

As others have said, you either need to split the line on the TAB character '\t', or use the csv module, which will correctly handle quoted fields that may contain tabs or newlines.
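
Here is a quick Python 3 illustration of that difference. The input line is invented for the example; the file in the question does not appear to contain quoted fields, in which case both approaches give the same result.

```python
import csv
import io

# A made-up line whose second field is quoted and contains a tab character.
line = 'id1\t"first\tsecond"\t42\n'

# Plain split() knows nothing about quotes, so it cuts the quoted field in two.
naive = line.rstrip("\n").split("\t")

# csv.reader treats the quoted text as a single field and strips the quotes.
parsed = next(csv.reader(io.StringIO(line), delimiter="\t"))

print(len(naive), naive)
print(len(parsed), parsed)
```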

I also recommend avoiding readlines(): it reads the entire file into memory, which may cause problems if the file is very large. You can iterate over the open file one line at a time instead:

z = open('output.blast', 'r')
for line in z:
    ...
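
Putting the pieces together, a fuller Python 3 version might look like the sketch below. The helper name second_column is my own, and the temporary file only stands in for output.blast so the example can run anywhere.

```python
import os
import tempfile


def second_column(path):
    """Yield the second tab-separated field of each non-empty line."""
    # The with block closes the file even if an error occurs.
    with open(path, "r") as handle:
        for line in handle:  # reads one line at a time, not the whole file
            line = line.rstrip("\n")
            if not line:
                continue
            yield line.split("\t")[1]


# Write a small stand-in for output.blast to a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".blast", delete=False) as tmp:
    tmp.write("1_0\tNP_045689\t100.00\n1_0\tNP_045688\t54.83\n")
    path = tmp.name

ids = list(second_column(path))
os.remove(path)
print(ids)
```

Using with also takes care of the close() call that the original loop needs.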
