A sample of the text file I have is:
> 1 -4.6 -4.6 -7.6
>
> 2 -1.7 -3.8 -3.1
>
> 3 -1.6 -1.6 -3.1
The data is separated by tabs in the text file, and the first column indicates the position.
I need to iterate through every value in the text file apart from column 0 and find the lowest value.
Once the lowest value has been found, that value needs to be written to a new text file along with the column name and position. Column 0 has the name "position", column 1 "fifteen", column 2 "sixteen" and column 3 "seventeen".
For example, the lowest value in the above data is "-7.6" and is in column 3, which has the name "seventeen". Therefore "7.6", "seventeen" and its position value, which in this case is 1, need to be written to the new text file.
I then need a number of rows deleted from the above text file.
E.g. the lowest value above is "-7.6", found at position "1" in column 3, which has the name "seventeen". I therefore need seventeen rows deleted from the text file, starting from and including position 1.
So the column in which the lowest value is found denotes the number of rows that need to be deleted, and the position it is found at states the start point of the deletion.
Open this file for reading, another file for writing, and copy all the lines that don't match the filter:
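# the with block closes both files, even if an error occurs part-way through
with open('somefile', 'r') as readfile, open('otherfile', 'w') as writefile:
    for line in readfile:
        # somepredicate is a placeholder for whatever test marks a line for deletion
        if not somepredicate(line):
            writefile.write(line)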
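As a rough illustration of what the filter might look like for this question, here is a minimal sketch. The name `make_position_filter` and its `start`/`count` parameters are my own invention, not something from the question, and it assumes each data line starts with an optional "> " marker followed by an integer position.

```python
def make_position_filter(start, count):
    """Build a predicate that is True for data lines whose position
    lies in the half-open range [start, start + count).
    (Hypothetical helper; the name and parameters are assumptions.)"""
    def somepredicate(line):
        # Drop an optional leading '>' marker and spaces, then split on whitespace.
        fields = line.lstrip('> ').split()
        if not fields:
            return False  # blank or marker-only lines are never filtered out
        return start <= int(fields[0]) < start + count
    return somepredicate


# Usage: filter out two rows starting at position 1 from the sample data.
sample = ['> 1\t-4.6\t-4.6\t-7.6\n', '>\n', '> 2\t-1.7\t-3.8\t-3.1\n',
          '>\n', '> 3\t-1.6\t-1.6\t-3.1\n']
somepredicate = make_position_filter(1, 2)
kept = [line for line in sample if not somepredicate(line)]
```

With the sample above, `kept` holds the two marker-only lines and the row for position 3. For the case in the question you would presumably call it with `start=1` and `count=17`.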
Here's a stab at what I think you wanted (though your requirements were kind of difficult to follow):
def extract_bio_data(input_path, output_path):
    # map column indexes (after popping the row number) to the number of rows to skip
    col_index = {0: 15,
                 1: 16,
                 2: 17}
    skip_to_position = 0

    # open both files; the with block closes them when the loop finishes
    with open(input_path, 'r') as input_file, open(output_path, 'w') as output_file:
        # write the output file's headers
        output_file.write('\t'.join(('position', 'min_value', 'rows_skipped')) + '\n')

        for line in input_file:
            # remove the '> ' from the beginning of the line and strip newline characters off the end
            line = line[2:].strip()

            # if the line contains no data, skip it
            if line == '':
                continue

            # split the columns on whitespace (change this to split('\t') for splitting only on tabs)
            columns = line.split()

            # extract the row number/position of this data
            position = int(columns.pop(0))

            # this is where we skip rows/positions
            if position < skip_to_position:
                continue

            # if two columns share the minimum value, this will be the first encountered in the list
            min_index = columns.index(min(columns, key=float))

            # this is an integer version of the 'column name' which corresponds to the number of rows that need to be skipped
            rows_to_skip = col_index[min_index]

            # write data to your new file (row number, minimum value, number of rows skipped)
            output_file.write('\t'.join(str(x) for x in (position, columns[min_index], rows_to_skip)) + '\n')

            # set the number of data rows to skip from this position
            skip_to_position = position + rows_to_skip


if __name__ == '__main__':
    in_path = r'c:\temp\test_input.txt'
    out_path = r'c:\temp\test_output.txt'
    extract_bio_data(in_path, out_path)
Things that weren't clear to me:
- Is there really "> " at the beginning of each line, or is that a copy/paste error?
  - I assumed it wasn't an error.
- Did you want "7.6" or "-7.6" written to the new file?
  - I assumed you wanted the original value.
- Did you want to skip rows in the file, or positions based on the first column?
  - I assumed you wanted to skip positions.
- You say you want to delete data from the original file.
  - I assumed that skipping positions was sufficient.
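If the question is read the other way on the second point (the unsigned value and the column name written out, rather than the original value and a row count), the per-row step might look something like the sketch below. The names `COLUMN_NAMES` and `describe_row_minimum` are made up for illustration, and it assumes every data line holds a position followed by exactly three numeric columns.

```python
# Column names as given in the question, in file order after the position column.
COLUMN_NAMES = ('fifteen', 'sixteen', 'seventeen')


def describe_row_minimum(line):
    """Return (position, unsigned lowest value, column name) for one data line.
    (Hypothetical helper; assumes a position plus three numeric columns.)"""
    # Drop an optional leading '>' marker, then split on tabs or spaces.
    fields = line.lstrip('> ').split()
    position = int(fields[0])
    values = [float(v) for v in fields[1:]]
    lowest = min(values)
    # index() returns the first match, so ties go to the leftmost column.
    name = COLUMN_NAMES[values.index(lowest)]
    return position, abs(lowest), name


print(describe_row_minimum('> 1\t-4.6\t-4.6\t-7.6'))  # (1, 7.6, 'seventeen')
```

Converting to `float` before comparing avoids comparing the values as strings, which would order them incorrectly.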