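This should work:

    import re  # a regex is probably the easiest way to split that line

    # infile and outfile are the paths of your input and output files
    with open(infile) as in_f, open(outfile, 'w') as out_f:
        f = (i for i in in_f if i.rstrip())  # iterate over non-empty lines
        for line in f:
            _, k = line.split('\t', 1)  # drop the first tab-separated column
            # dots are escaped so they only match literal '.' characters;
            # '.*' (not '.+') so a line ending right after the last number
            # does not lose its final digit
            x = re.findall(r'^1\.\.100\t([+-])chr(\d+):(\d+)\.\.(\d+).*$', k)
            if not x:
                continue  # skip lines that do not match
            out_f.write(' '.join(x[0]) + '\n')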
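To try the pattern without touching any files, here is a minimal sketch that runs it on a single in-memory line. The sample line and the helper name `parse_line` are assumptions made for illustration; the real input may have a different first column or trailing fields.

```python
import re

# Same idea as the answer's pattern, with the literal dots escaped.
PATTERN = re.compile(r'^1\.\.100\t([+-])chr(\d+):(\d+)\.\.(\d+)')

def parse_line(line):
    """Return (strand, chromosome, start, end), or None if the line does not match."""
    parts = line.split('\t', 1)      # drop the first tab-separated column
    if len(parts) < 2:               # no tab at all: nothing to parse
        return None
    m = PATTERN.match(parts[1])
    return m.groups() if m else None

# Hypothetical sample line, shaped to fit the pattern above.
sample = "read_1\t1..100\t+chr6:140302505..140302604\textra\n"
print(parse_line(sample))  # ('+', '6', '140302505', '140302604')
```

Checking the number of parts before unpacking avoids a `ValueError` on lines that contain no tab.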
You can use .strip() to remove any whitespace around an item before writing it out. This makes the code a bit clearer and solves any indentation issues in the output.
For example:
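    # str.strip() works on strings, not lists, so apply it to each piece
    b = [s.strip() for s in a.split('chr')]    # no whitespace on either side now
    c = [s.strip() for s in b[1].split(':')]   # no whitespace
    d = [s.strip() for s in c[1].split('..')]
    e = b[0] + '\t' + c[0] + '\t' + d[0] + '\t' + d[1] + '\t' + '\n'
    rfh.write(e)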
This removes any existing whitespace, so that the only whitespace left is your \t's.
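As a self-contained sketch of the same idea, the function below strips every piece after each split and joins the fields with tabs. The function name `to_tsv` and the sample string with stray spaces are assumptions for illustration only. Note that `.strip()` is a string method, so it has to be applied to each element of the list that `.split()` returns.

```python
def to_tsv(a):
    """Turn a string like '+chr6:140302505..140302604' into a tab-separated line."""
    b = [s.strip() for s in a.split('chr')]     # [strand, rest]
    c = [s.strip() for s in b[1].split(':')]    # [chromosome, range]
    d = [s.strip() for s in c[1].split('..')]   # [start, end]
    return '\t'.join([b[0], c[0], d[0], d[1]]) + '\n'

# Hypothetical input with stray whitespace around every field.
messy = " + chr6 : 140302505 .. 140302604 \n"
print(repr(to_tsv(messy)))  # '+\t6\t140302505\t140302604\n'
```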
Why not use a regex split?

    import re

    with open(infile) as inf:  # infile is the path to your input file
        for annot_info in inf:
            # the two dots are escaped so they match a literal '..'
            split_array = re.split(r'(\W+)(chr\w+):(\d+)\.\.(\d+)', annot_info)
            # do your SQL processing here
            # write out to a file if you wish to
For an input like +chr6:140302505..140302604, this would give you ['', '+', 'chr6', '140302505', '140302604', '']. You can use the same approach with your current MySQL methods.
PS: The regex pattern I've used gives you empty strings at the beginning and end. Modify the regex, or change your SQL insert to exclude the first and last elements of the array.
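Two ways to drop those empty strings can be sketched as follows: slice off the first and last elements of the split result, or use `re.search` and read the capture groups directly. The sample string is an assumption reconstructed from the output shown above, and the variable names are illustrative.

```python
import re

pattern = r'(\W+)(chr\w+):(\d+)\.\.(\d+)'
annot_info = '+chr6:140302505..140302604'   # assumed sample value

# Option 1: keep re.split and discard the first and last elements.
split_array = re.split(pattern, annot_info)
fields = split_array[1:-1]

# Option 2: use re.search and take only the captured groups.
match = re.search(pattern, annot_info)
groups = list(match.groups()) if match else None

print(fields)  # ['+', 'chr6', '140302505', '140302604']
print(groups)  # ['+', 'chr6', '140302505', '140302604']
```

If a line read from a file ends in a newline, the last element from `re.split` is that newline rather than an empty string, so the second option is the more predictable of the two.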