I have a text file that is tab delimited and looks like:
1_0 NP_045689 100.00 279 0 0 18 296 18 296 3e-156 539
1_0 NP_045688 54.83 259 108 6 45 296 17 273 2e-61 224
I need to parse out specific columns such as column 2.
I've tried with the code below:
    z = open('output.blast', 'r')
    for line in z.readlines():
        for col in line:
            print col[1]
    z.close()
But I get an index out of range error.
    z = open('output.blast', 'r')
    for line in z.readlines():
        # Split each line into its tab-separated fields
        cols = line.split('\t')
        print(cols[1])
    z.close()
You need to split() the line on tab characters first.
Alternatively, you could use Python's csv module with a tab delimiter.
Check out the csv module. That should help you a lot if you plan on doing more stuff with your tab-delimited files, too. One nice thing is that you can assign names to the various columns.
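As a rough sketch of the "names for columns" idea, csv.DictReader accepts a fieldnames list, so each row comes back as a dictionary. The column names below are hypothetical (the question does not name them), and the sample is just the first four columns of the two rows from the question, written in Python 3 syntax.

```python
import csv
import io

# Hypothetical column names; substitute whatever your columns actually mean.
FIELDS = ["query", "subject", "identity", "length"]

# First four columns of the question's two rows, tab-separated.
sample = "1_0\tNP_045689\t100.00\t279\n1_0\tNP_045688\t54.83\t259\n"

reader = csv.DictReader(io.StringIO(sample), fieldnames=FIELDS, delimiter="\t")
subjects = [row["subject"] for row in reader]
print(subjects)
```

With a real file you would pass the open file object instead of the io.StringIO wrapper.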
    import csv
    import io

    # Sample rows; \t marks the tab characters between fields
    text = ("1_0\tNP_045689\t100.00\t279\t0\t0\t18\t296\t18\t296\t3e-156\t539\n"
            "1_0\tNP_045688\t54.83\t259\t108\t6\t45\t296\t17\t273\t2e-61\t224")
    f = csv.reader(io.StringIO(text), delimiter='\t')
    for row in f:
        print(row[1])
Two things of note:
The delimiter argument to the reader function tells the csv module how to split each line of text. Check the other arguments to reader (e.g. quotechar) for more control.
I use StringIO to wrap the example text as a file object; you don't need that if you are reading from an actual file.
For example:
    f = csv.reader(open('./test.csv'), delimiter='\t')
This has already been answered, but I thought I'd share the use of namedtuples for this sort of situation, as it allows pleasant object.attribute-style access to the fields.
    from collections import namedtuple
    import csv

    # One field name per column; each row must have exactly this many columns
    rec = namedtuple('rec', 'col1, col2, col3, col4, col5')
    with open("myfile.tab", newline="") as f:
        for r in map(rec._make, csv.reader(f, delimiter='\t')):
            print(r.col2, r.col5)
See the Python collections documentation for more details.
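For the 12-column rows in the question, a namedtuple needs 12 field names, because _make raises a TypeError when the field count does not match. The sketch below assumes the file is BLAST tabular output and borrows the conventional column names for that format; that is an assumption, since the question never names its columns. It uses Python 3 syntax.

```python
import csv
import io
from collections import namedtuple

# Assumed field names (conventional BLAST tabular columns); the question
# itself does not say what the columns mean.
Hit = namedtuple(
    "Hit",
    "qseqid sseqid pident length mismatch gapopen "
    "qstart qend sstart send evalue bitscore",
)

# The two rows from the question, with explicit tab separators.
sample = (
    "1_0\tNP_045689\t100.00\t279\t0\t0\t18\t296\t18\t296\t3e-156\t539\n"
    "1_0\tNP_045688\t54.83\t259\t108\t6\t45\t296\t17\t273\t2e-61\t224\n"
)

hits = [Hit._make(row) for row in csv.reader(io.StringIO(sample), delimiter="\t")]
for hit in hits:
    # csv yields every field as a string; convert where a number is needed.
    print(hit.sseqid, float(hit.pident))
```

Named access makes later filtering easier to read, for example keeping only hits whose float(hit.pident) exceeds some threshold.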
This is why your code is going wrong:
    for col in line:
will iterate over every CHARACTER in the line.
    print col[1]
A character is a string of length 1, so col[1] is always going to give an index out of range error.
As others have said, you either need to split the line on the TAB character '\t', or use the csv module, which will correctly handle quoted fields that may contain tabs or newlines.
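To illustrate the difference on quoted fields, here is a small Python 3 comparison of a plain split against csv.reader. The input line is made up for the demonstration; its second field is quoted and contains a tab.

```python
import csv
import io

# Made-up line: the second field is quoted and contains a tab character.
line = 'id1\t"two\twords"\t42\n'

# Plain split cuts the quoted field in two and keeps the quote characters.
naive = line.rstrip("\n").split("\t")

# csv.reader treats the quoted text as a single field and strips the quotes.
parsed = next(csv.reader(io.StringIO(line), delimiter="\t"))

print(naive)
print(parsed)
```

If your data never contains quoted fields, as appears to be the case for the rows in the question, a plain split gives the same result and is simpler.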
I also recommend avoiding readlines: it reads the entire file into memory, which may cause problems if the file is very large. You can iterate over the open file one line at a time instead:
    z = open('output.blast', 'r')
    for line in z:
        ...
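One possible way to flesh out the line-by-line loop, in Python 3 syntax, is a small generator that opens the file in a with block so it is closed automatically. The function name second_column and the temporary demo file are illustrative only and are not part of the original answers.

```python
import os
import tempfile


def second_column(path):
    """Yield column 2 of each tab-delimited line, reading one line at a time."""
    with open(path) as handle:  # the file is closed automatically on exit
        for line in handle:
            cols = line.rstrip("\n").split("\t")
            if len(cols) > 1:  # skip blank or too-short lines
                yield cols[1]


# Demo on a temporary file holding the start of the question's two rows.
with tempfile.NamedTemporaryFile("w", suffix=".blast", delete=False) as tmp:
    tmp.write("1_0\tNP_045689\t100.00\n1_0\tNP_045688\t54.83\n")
    path = tmp.name

result = list(second_column(path))
os.remove(path)
print(result)
```

Because the generator yields one value at a time, only a single line of the file is held in memory while iterating.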