I have two files and the content is as follows:
alt text http://img144.imageshack.us/img144/4423/screencapture2b.png
alt text http://img229.imageshack.us/img229/9153/screencapture1c.png
Please only consider the bolded column and the red column. The remaining text is junk and unnecessary. As evident from the two files they are similar in many ways. I am trying to compare the bolded text in file_1 and file_2 (it is not bolded but hope you can make out it is the same column) and if they are different, I want to print out the red text from file_1. I achieved this by the following script:
import os
import codecs
import string
import itertools

chain_id = []
for file in os.listdir("."):
    basename = os.path.basename(file)
    if basename.startswith("d.complex"):
        chain_id.append(basename)
for i in chain_id:
    print i
    g = codecs.open(i, encoding='utf-8')
    f = codecs.open("ac_chain_dssp.dssp", encoding='utf-8')
    for (x, y) in itertools.izip(g, f):
        if y[11] == "C":
            if y[35:38] != "EN":
                if y[35:38] != "OTE":
                    if x[11] == "C":
                        if x[12] != "C":
                            if y[35:38] != x[35:38]:
                                print x[7:10]
    g.close()
    f.close()
But the results I got were not what I expected. Now I want to modify the above code so that, when I compare the bolded column, it prints the result only if the difference between the two values is more than 2. For example, row 1 of the bolded column is 83 in file_1 and 84 in file_2; since the difference between the two is less than two, I want it to be rejected.
Can someone help me in adding the remaining code? Cheers, Chavanak
PS: This is not homework :)
The direct answer to your question is to alter the last condition,
if y[35:38] !=x[35:38]:
so that the "field" at [35:38] is converted to int (or float...) and the difference between the two values can be computed, giving something like:
try:
    iy = int(y[35:38])
    ix = int(x[35:38])
except ValueError:
    # take whatever action is appropriate here, including silently ignoring the record
    print("Unexpected value for record # %s" % x[7:10])
else:
    # compare only when both fields parsed as integers
    if abs(ix - iy) > 2:
        print(x[7:10])
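For experimenting outside the main loop, the same check can be wrapped in a small function. This is only a sketch: the slice positions [35:38] and [7:10] come from the question, while the names exceeds_tolerance and make_line and the sample lines are made up for illustration. It avoids print statements so it should run under Python 2 or 3.

```python
def exceeds_tolerance(x, y, field=slice(35, 38), tolerance=2):
    """Return True if the integer fields of x and y differ by more than
    `tolerance`, False if they do not, and None if either field is not
    an integer."""
    try:
        ix = int(x[field])
        iy = int(y[field])
    except ValueError:
        return None
    return abs(ix - iy) > tolerance


def make_line(label, value):
    """Build a fake fixed-width line (hypothetical layout): the label sits
    at [7:10] and the value at [35:38]."""
    return " " * 7 + label.ljust(3) + " " * 25 + str(value).rjust(3)


close_pair = exceeds_tolerance(make_line("A", 83), make_line("A", 84))
far_pair = exceeds_tolerance(make_line("A", 83), make_line("A", 90))
bad_pair = exceeds_tolerance(make_line("A", 83), make_line("A", "OTE"))
```

Here close_pair is False (83 vs 84 is within the tolerance), far_pair is True, and bad_pair is None because "OTE" is not a number.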
More indirectly, the snippet in the question prompts the following remarks, which may in turn suggest different approaches to the problem.
- First off, if the files are strictly "fixed format", if they are very big, and/or if nothing else is done with the other "field" values found in the file, the current approach is valid and probably very efficient.
- Alternatively, the logic may be made more resilient to variations in the file structure by parsing the "fields" of each line rather than addressing them as slices of a long string. Look into the standard library's csv module for possible parser support.
- Some tests seem goofy or always true, such as comparing a 3-character slice to a 2-character string literal. Aside from being logically wrong, this too points to a more "parsed" solution, where such logical errors are more readily avoided or more obvious.
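To illustrate the "parsed" approach from the second remark, here is a rough sketch that splits each line into named fields instead of slicing by position. It assumes whitespace-delimited columns; the Record layout (label, chain, value) and the sample line are hypothetical, since the real column order is only visible in the screenshots.

```python
from collections import namedtuple

# Hypothetical record layout; adjust the names and order to the real files.
Record = namedtuple("Record", ["label", "chain", "value"])


def parse_record(line):
    """Split a whitespace-delimited line into a Record.

    Returns None when the line has too few fields or the value is not
    an integer, so callers can decide how to handle malformed rows.
    """
    fields = line.split()
    if len(fields) < 3:
        return None
    try:
        return Record(label=fields[0], chain=fields[1], value=int(fields[2]))
    except ValueError:
        return None


rec = parse_record("ALA C 83")
```

With named fields, a test such as rec.chain == "C" is harder to get wrong than y[11] == "C", and a non-numeric value shows up as None rather than as a silent string mismatch.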
Nothing to do with your problem, but this:
if y[11]=="C":
    if y[35:38]!= "EN":
        # I don't see any "EN" or "OTE" anywhere in your sample input.
        # In any case the above condition will always be true, because
        # y[35:38] appears to be a 3-byte string but "EN" is a 2-byte string.
        if y[35:38] != "OTE":
            if x[11]=="C":
                if x[12] != "C":
                    if y[35:38] !=x[35:38]:
                        print x[7:10]
is ummmmm ...
You may wish to consider an alternative way of expression e.g.
if (x[11] == "C" == y[11]
        and x[12] != "C"
        and y[35:38] not in ("EN?", "OTE")
        and y[35:38] != x[35:38]):
    print x[7:10]
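In case the chained comparison looks unfamiliar: a == "C" == b means a == "C" and "C" == b. The sketch below shows that, and also why a 3-character slice can never equal "EN". The helper names and the strip() normalisation are my own assumptions, not something taken from your data.

```python
def both_are_c(a, b):
    """Chained comparison: equivalent to (a == "C") and ("C" == b)."""
    return a == "C" == b


def is_excluded(field, excluded=("EN", "OTE")):
    """Compare after stripping padding, so a 3-character slice such as
    ' EN' can match the 2-character marker 'EN' (an assumed convention)."""
    return field.strip() in excluded


padded = " EN"                 # what a 3-character slice might contain
always_true = padded != "EN"   # lengths differ, so the strings never match
```

So both_are_c("C", "C") is True and both_are_c("C", "D") is False, while always_true is True no matter what the file contains, which is why the original test filters nothing.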
I haven't fully understood your problem, but suppose you have
File 1
100 C 20.2
300 B 33.3
File 2
110 C 20.23
320 B 33.34
and you want to compare the 3rd column of the two files:
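# file1 and file2 are assumed to be already-opened file objects
lines1 = file1.readlines()
list1 = [float(line.split()[2]) for line in lines1]  # list of 3rd column values
lines2 = file2.readlines()
list2 = [float(line.split()[2]) for line in lines2]
# True where the two values differ by more than 2
result = map(lambda x, y: abs(x - y) > 2, list1, list2)
OR
# the differences themselves, keeping only those larger than 2
result = [abs(a - b) for a, b in zip(list1, list2) if abs(a - b) > 2]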
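For what it's worth, here is a self-contained sketch of the same idea that also reports which rows were flagged. It uses the two sample files above as plain lists of lines; the function name flag_rows and its column and tolerance parameters are made up for illustration.

```python
def flag_rows(lines1, lines2, column=2, tolerance=2):
    """Return (row_number, value1, value2) for every pair of rows whose
    values in `column` differ by more than `tolerance`."""
    flagged = []
    for row, (line1, line2) in enumerate(zip(lines1, lines2), start=1):
        v1 = float(line1.split()[column])
        v2 = float(line2.split()[column])
        if abs(v1 - v2) > tolerance:
            flagged.append((row, v1, v2))
    return flagged


lines1 = ["100 C 20.2", "300 B 33.3"]
lines2 = ["110 C 20.23", "320 B 33.34"]

third_column = flag_rows(lines1, lines2, column=2)
first_column = flag_rows(lines1, lines2, column=0)
```

With this sample data third_column is empty, because the 3rd-column values differ by far less than 2, while first_column flags both rows (100 vs 110 and 300 vs 320). An open file object can be passed in place of each list, since iterating a file yields its lines.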
Is this what you want??