
How do I remove the second and all following digits after a period from column one of each line?

开发者 https://www.devze.com 2023-02-21 02:34 Source: Internet

How do I remove the second and all following digits after the period in column one?

For example,

HP_000083.21423    N  -1  NO  99.8951%    0.000524499999999983
NP_075561.1_1908    N   -1  NO  99.9697%    0.000151499999999971

I would like to remove "_1908" from "NP_075561.1_1908"

and "1423" from "HP_000083.21423"

without removing other items from the subsequent columns.

Expected row would be:

HP_000083.2         N          -1       NO        99.8951%  0.000524499999999983
NP_075561.1             N           -1      NO        99.9697%  0.000151499999999971

Here's my code (some of you provided part of this solution in the past):

    for line in fname:
        line = re.sub('[\(\)\{\}\'\'\,<>]','', line)
        line = re.sub(r"(\.\d+)_\d+", r"\1", line) 
        fields = line.rstrip("\n").split()
        outfile.write('%s  %s  %s  %s  %s  %s\n' % (fields[0],fields[1],fields[2],fields[3],fields[4],(fields[5])))

Thanks in advance guys, Cheers,


I'd avoid using regular expressions in this case. You can easily make do with standard string methods:

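for line in infile:
    # Split off the first column; the rest of the line is left untouched.
    first_col, rest = line.split(" ", 1)
    # Keep everything up to and including the first digit after the period.
    first_col = first_col[:first_col.index(".") + 2]
    outfile.write(" ".join((first_col, rest)))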
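
If it helps, here is a rough sketch of the same string-method idea wrapped in a function. The name `trim_first_column` is made up for illustration, and two things are my assumptions rather than anything stated in the question: columns may be separated by any whitespace (spaces or tabs), and a line whose first column has no period should pass through unchanged.

```python
def trim_first_column(line):
    """Keep one digit after the first period in column one (hypothetical helper)."""
    fields = line.split(None, 1)  # first column plus the untouched remainder
    if not fields:
        return line  # blank line: nothing to trim
    head = fields[0]
    dot = head.find(".")
    if dot == -1:
        return line  # assumption: no period means leave the line alone
    trimmed = head[:dot + 2]  # up to and including the first digit after "."
    # Replace only the first occurrence, so spacing and later columns stay as they were.
    return line.replace(head, trimmed, 1)


if __name__ == "__main__":
    print(trim_first_column("NP_075561.1_1908    N   -1  NO  99.9697%    0.000151499999999971"))
```

Because only the first column is rewritten in place, the percentages and floats in the later columns keep all their digits and the original spacing.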


Here is a solution with a pretty minimal change to the code you provided:

for line in fname:
    line = re.sub('[\(\)\{\}\'\'\,<>]','', line)
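    # Keep one digit after the period, drop further digits and an optional _suffix;
    # count=1 limits the substitution to the first match, which is in column one.
    line = re.sub(r"(\.\d)\d*_?\d*", r"\1", line, count=1)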
    fields = line.rstrip("\n").split()
    outfile.write('%s  %s  %s  %s  %s  %s\n' % (fields[0],fields[1],fields[2],fields[3],fields[4],(fields[5])))
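
For what it's worth, the pattern in the question, `(\.\d+)_\d+`, only matches when an underscore follows the digits, which is why "HP_000083.21423" was left alone. A slightly more defensive sketch anchors the pattern to the start of the line, so the later columns cannot be touched even if column one has no period at all. The names `FIRST_COL` and `trim_with_regex` are made up here, and the sketch assumes column one starts at the very beginning of the line.

```python
import re

# ^        start of the line, so only column one can match
# \S*?\.\d the shortest run of non-space characters ending in ".<digit>"
# \S*      whatever else is left in column one (dropped)
FIRST_COL = re.compile(r"^(\S*?\.\d)\S*")


def trim_with_regex(line):
    """Trim column one to a single digit after its first period (hypothetical helper)."""
    return FIRST_COL.sub(r"\1", line, count=1)


if __name__ == "__main__":
    for sample in (
        "HP_000083.21423  N  -1  NO  99.8951%    0.000524499999999983",
        "NP_075561.1_1908    N   -1  NO  99.9697%    0.000151499999999971",
    ):
        print(trim_with_regex(sample))
```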