Matching the G in 'Reference: G. ' using a regular expression
I tried using this, but an error still occurs:
refresidue = re.compiler(r'(s/Reference: \ //n)')
Any other suggestions? I'm quite new to this, so any help is much appreciated.
In 'Reference: G. ', the reference can be A, C, G or T.
I'm sorry about the confusion. What I would like is for the output to print only the character (A, C, G or T) instead of 'Reference: '.
This is my code:
refresidue = re.compiler(r'(s/Reference: \ //n)')
a_matchref = refresidue.search(row[2])
if a_matchref is not None:
    a_matchref = a_matchref.group(1)
You're mixing syntax from another regex tool (the s/.../.../ substitution form comes from sed and Perl) with Python, and the regex itself is also quite strange. Also, re.compile() compiles a regex; it doesn't match it against anything.
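To make the distinction concrete, here is a minimal sketch: compiling only builds a pattern object, and matching happens when you call a method such as .search() on it. The simple pattern used here is just for illustration.

```python
import re

# Compiling only builds a pattern object; nothing is matched yet.
pattern = re.compile(r"Reference: (\w)")

# Matching happens when you call a method such as .search() on the pattern.
result = pattern.search("Reference: G. ")
print(result.group(1))  # G
```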
Assuming you want to match a single alphanumeric character after the text Reference:, try the following:
refresidue = re.search(r"Reference:\s*(\w)", your_text_to_be_matched).group(1)
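One caveat with the one-liner: if the text contains no "Reference:", re.search() returns None and calling .group(1) on it raises AttributeError. A minimal guarded sketch (reference_base is a hypothetical helper name, and the sample strings are made up). It also shows that re.search() scans the whole string, whereas re.match() only tries at the start.

```python
import re

def reference_base(text):
    """Return the character after 'Reference:', or None if it is absent (hypothetical helper)."""
    m = re.search(r"Reference:\s*(\w)", text)
    return m.group(1) if m else None

print(reference_base("Reference: G. "))             # G
print(reference_base("chr1 12345 Reference: T. "))  # T (search scans the whole string)
print(reference_base("no reference here"))          # None

# re.match() only tries at position 0, so it misses the same text mid-string.
print(re.match(r"Reference:\s*(\w)", "chr1 12345 Reference: T. "))  # None
```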
Here's how I solved the problem step by step. Even after several years of experience with regular expressions, some particular syntax always escapes my mind. At such times, it's best to start with a short expression that should definitely match what you want.
Let's use the re module.
>>> import re
Now, what is the error?
>>> refresidue = re.compiler(r'(s/Reference: \ //n)')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'compiler'
Ah, so what attributes does the re module have?
>>> dir(re)
['DEBUG', 'DOTALL', 'I', 'IGNORECASE', 'L', 'LOCALE', 'M', 'MULTILINE', 'S',
'Scanner', 'T', 'TEMPLATE', 'U', 'UNICODE', 'VERBOSE', 'X', '_MAXCACHE', '__all__',
'__builtins__', '__doc__', '__file__', '__name__', '__version__', '_alphanum',
'_cache', '_cache_repl', '_compile', '_compile_repl', '_expand', '_pattern_type',
'_pickle', '_subx', 'compile', 'copy_reg', 'error', 'escape', 'findall', 'finditer',
'match', 'purge', 'search', 'split', 'sre_compile', 'sre_parse', 'sub', 'subn', 'sys',
'template']
So it must be re.compile.
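Scanning the dir() output works; a quicker yes/no check for a single name is the built-in hasattr(). A small sketch:

```python
import re

# The module exposes compile(), not compiler().
print(hasattr(re, "compile"))   # True
print(hasattr(re, "compiler"))  # False
```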
>>> refresidue = re.compile(r'(s/Reference: \ //n)')
Ok, compilation complete. Let's use it to match the string.
>>> refresidue.match('Reference: G')
Nothing? Strip down the expression then.
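Before stripping it down, it helps to see why the original expression matched nothing: read literally, it demands the text "s/Reference:", then two spaces (the second one is the escaped "\ "), then "//n". A sketch confirming that reading:

```python
import re

original = re.compile(r'(s/Reference: \ //n)')

# The pattern is all literal characters, so 'Reference: G' cannot match it.
print(original.match('Reference: G'))  # None

# Only the literal text "s/Reference:", two spaces, "//n" matches.
m = original.match('s/Reference:  //n')
print(m.group(1))  # s/Reference:  //n
```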
>>> refresidue = re.compile(r'Reference:')
>>> refresidue.match('Reference: G')
<_sre.SRE_Match object at 0x7fe14701f030>
Of course it should match. How about adding the G?
>>> refresidue = re.compile(r'Reference: G')
>>> refresidue.match('Reference: G')
<_sre.SRE_Match object at 0x7fe14701f098>
Yes. Now I want any uppercase letter, please.
>>> refresidue = re.compile(r'Reference: [A-Z]')
>>> refresidue.match('Reference: G')
<_sre.SRE_Match object at 0x7fe14701f030>
I also want to single out the letter.
>>> refresidue = re.compile(r'Reference: ([A-Z])')
>>> refresidue.match('Reference: G')
<_sre.SRE_Match object at 0x7fe1470b9738>
No problem so far. So how do I get at the parenthesized part?
>>> dir(refresidue.match('Reference: G'))
['__copy__', '__deepcopy__', 'end', 'expand', 'group', 'groupdict', 'groups', 'span', 'start']
group sounds like it.
>>> refresidue.match('Reference: G').group
<built-in method group of _sre.SRE_Match object at 0x7fe1470b9738>
So it's a method. Let's try calling it.
>>> refresidue.match('Reference: G').group(0)
'Reference: G'
How about this?
>>> refresidue.match('Reference: G').group(1)
'G'
There, the G.
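For completeness, the match object offers a few related accessors from that dir() list. A short sketch on the same string (the span values assume the input is exactly 'Reference: G'):

```python
import re

m = re.compile(r'Reference: ([A-Z])').match('Reference: G')

print(m.group(0))  # Reference: G  (the whole match)
print(m.group(1))  # G             (first parenthesized group)
print(m.groups())  # ('G',)        (all groups as a tuple)
print(m.span(1))   # (11, 12)      (where group 1 sits in the string)
```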
I think this is what you're after, but you could add more examples of the kind of data you are matching:
import re
refresidue = re.compile(r'Reference: ([A-Z])')
You use the above like this:
>>> refresidue.match("Reference: G").group(1)
'G'
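Putting this together with the loop from the question, here is a sketch that assumes the text sits in row[2] (as in the question's code) and that the base is always one of A, C, G or T, so the class is narrowed to [ACGT]. The rows data is hypothetical.

```python
import re

# Only the four bases are accepted, per the question.
refresidue = re.compile(r'Reference:\s*([ACGT])')

# Hypothetical rows; in the question the text lives in row[2].
rows = [
    ["chr1", "101", "Reference: G. "],
    ["chr1", "102", "Reference: A. "],
    ["chr1", "103", "no reference column"],
]

bases = []
for row in rows:
    a_matchref = refresidue.search(row[2])
    if a_matchref is not None:
        # group(1) is just the base letter, without "Reference: ".
        bases.append(a_matchref.group(1))

print(bases)  # ['G', 'A']
```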