I am trying to split one big file into individual entries. Each entry ends with the string “//”, so I tried this:
#!/usr/bin/python
import sys,os
uniprotFile=open("UNIPROT-data.txt") #read original alignment file
uniprotFileContent=uniprotFile.read()
uniprotFileList=uniprotFileContent.split("//")
for items in uniprotFileList:
    seqInfoFile=open('%s.dat'%items[5:14],'w')
    seqInfoFile.write(str(items))
But I realised that “//” also occurs in another string (http://www.uniprot.org/terms), so it splits there as well and I don't get the result I want. I tried using a regex but was not able to figure it out.
Use a regex that only splits on // if it's not preceded by a colon (:):
import re
myre = re.compile("(?<!:)//")
uniprotFileList = myre.split(uniprotFileContent)
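A minimal sketch of how the lookbehind behaves, using a made-up two-entry sample string instead of the real UNIPROT-data.txt. The sample text and the `split_entries` helper name are assumptions for illustration only:

```python
import re

# Split on "//" only when it is not immediately preceded by ":".
ENTRY_SEPARATOR = re.compile(r"(?<!:)//")

def split_entries(text):
    """Return the non-empty entries of text, with surrounding whitespace removed."""
    return [part.strip() for part in ENTRY_SEPARATOR.split(text) if part.strip()]

# Hypothetical sample data; the real file layout may differ.
sample = (
    "ID   ENTRY_ONE\n"
    "CC   See http://www.uniprot.org/terms\n"
    "//\n"
    "ID   ENTRY_TWO\n"
    "//\n"
)

entries = split_entries(sample)
print(len(entries))
```

The URL stays inside the first entry because its // follows a colon. Stripping each part also drops the empty piece left after the final separator.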
I am using the code with modified split pattern and it works fine for me:
#!/usr/bin/python
import sys,os
uniprotFile = open("UNIPROT-data.txt")
uniprotFileContent = uniprotFile.read()
uniprotFileList = uniprotFileContent.split("//\n")
for items in uniprotFileList:
    seqInfoFile = open('%s.dat' % items[5:17], 'w')
    seqInfoFile.write(str(items))
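If the file is too large to read in one go, the same idea can probably be applied line by line: treat a line that consists only of // as the end of an entry. This is a sketch, assuming each terminator sits on its own line. `iter_entries` is a hypothetical helper, and the sample text stands in for the real file:

```python
import io

def iter_entries(lines):
    """Yield each entry as a string; a line containing only '//' ends an entry."""
    current = []
    for line in lines:
        if line.rstrip("\r\n") == "//":
            if current:
                yield "".join(current)
            current = []
        else:
            current.append(line)
    # Keep any trailing text that is not followed by a terminator.
    if current:
        yield "".join(current)

# Hypothetical sample standing in for open("UNIPROT-data.txt").
sample = io.StringIO(
    "ID   ENTRY_ONE\n"
    "CC   See http://www.uniprot.org/terms\n"
    "//\n"
    "ID   ENTRY_TWO\n"
    "//\n"
)
entries = list(iter_entries(sample))
```

Because only whole lines are compared, a // inside a URL never ends an entry.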
You're confusing \ (backslash) and / (slash). You don't need to escape a slash, just use "/". For a backslash, you do need to escape it, so use "\\".
Secondly, if you split with a backslash it will not split on a slash or vice-versa.
Split using a regular expression that doesn't permit the "http:" part before your // marker. For example: "([^:])\/\/"
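One thing to be aware of with this pattern: in `re.split`, a capturing group is returned as part of the result, and the character it matched is removed from the preceding piece. A small comparison on a made-up string (the sample text is an assumption) shows the difference from a lookbehind:

```python
import re

# Hypothetical sample text containing a URL and two "//" terminators.
text = "a http://x\n//\nb\n//\n"

# Capturing group: the character before "//" is consumed and
# returned as its own list item.
with_group = re.split(r"([^:])//", text)

# Lookbehind: nothing before "//" is consumed.
with_lookbehind = re.split(r"(?<!:)//", text)

print(with_group)
print(with_lookbehind)
```

Both avoid splitting inside the URL, but the capturing version interleaves the captured newline characters with the entries, so the list needs extra cleanup.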
You appear to be splitting on the wrong characters. If the delimiter in your file is actually a backslash, you should split on "\\", not "//". Open a prompt and inspect the strings you're using. You'll see something like:
>>> "\\"
'\\'
>>> "\"
SyntaxError
>>> r"\"
SyntaxError
>>> "//"
'//'
So, to split on a backslash, use "\\". Note that r"\" is not valid, because a raw string literal cannot end in a single backslash. In regex operations, I recommend writing the pattern as r"\\" for clarity.