I'm trying to create a regexp that will search for all 12-digit hex values in a file. For example: string 448699 => hex 343438363939. The regexp that I have now is:
(r'3[0-9]\d{10}')
It matches 3 as the first character, 0-9 as the second, and then any ten digits. The hex above is 12 digits long and, starting with the first character, every other character is a 3. How would I express this with a regexp? I was thinking along the lines of the following, but I'm not sure:
(r'3[0-9]3[0-9]3[0-9]3[0-9]3[0-9]3[0-9]')
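To see the difference concretely, here is a small Python 3 sketch (the session later in this thread uses Python 2) that reproduces the string-to-hex conversion from the question and tries both patterns with re.fullmatch. The candidate string is a made-up example, not data from the question.

```python
import binascii
import re

# Hex-encode the ASCII bytes of the sample string from the question.
hex_value = binascii.hexlify(b"448699").decode("ascii")
print(hex_value)  # 343438363939

# The original pattern: '3', one digit, then any ten digits.
original = re.compile(r"3[0-9]\d{10}")
# The proposed pattern: six repetitions of '3' followed by a digit.
proposed = re.compile(r"3[0-9]3[0-9]3[0-9]3[0-9]3[0-9]3[0-9]")

# Hypothetical input: 12 digits, but not every other character is '3'.
candidate = "312345678901"
print(bool(original.fullmatch(candidate)))  # True
print(bool(proposed.fullmatch(candidate)))  # False
print(bool(proposed.fullmatch(hex_value)))  # True
```

The original pattern accepts any 12 digits that start with 3, while the proposed one enforces the "every other character is 3" rule.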
You're really close. The pattern you want is:
(r'(?:3\d){6}')
Edit: As Mike Pennington pointed out, hex numbers can include the letters A-F. I wasn't actually sure of the purpose of the "every other digit is 3" rule, so I left the rules as you described them.
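A quick Python 3 check that the compact form behaves like the spelled-out pattern from the question; the sample text is invented for illustration. Because (?:...) is a non-capturing group, re.findall returns the whole 12-character match rather than a group.

```python
import re

# The compact pattern: the pair "3 followed by a digit", repeated six times.
compact = re.compile(r"(?:3\d){6}")
# The question's spelled-out version of the same idea.
spelled_out = re.compile(r"3[0-9]3[0-9]3[0-9]3[0-9]3[0-9]3[0-9]")

text = "id=343438363939 other=464f4f424152"
print(compact.findall(text))      # ['343438363939']
print(spelled_out.findall(text))  # ['343438363939']
```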
Confusion reigns supreme ...
If you want to match the result of converting ANY 6-byte sequence to a 12-hex-digit string (what you asked for), you need r'[0-9a-fA-F]{12}'.

If you want to match the result of converting a 6-byte sequence of ASCII decimal digits to a 12-hex-digit string (what your sample code indicates), you need r'(?:3[0-9]){6}'.
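A minimal Python 3 sketch contrasting the two patterns: each input is hex-encoded with bytes.hex() and tested with re.fullmatch. The third input is an arbitrary byte sequence chosen for illustration.

```python
import re

any_six_bytes = re.compile(r"[0-9a-fA-F]{12}")  # hex of ANY 6-byte sequence
ascii_digits = re.compile(r"(?:3[0-9]){6}")     # hex of 6 ASCII decimal digits only

results = []
for raw in (b"448699", b"FOOBAR", bytes([0, 255, 16, 32, 64, 128])):
    hexed = raw.hex()
    row = (hexed, bool(any_six_bytes.fullmatch(hexed)), bool(ascii_digits.fullmatch(hexed)))
    results.append(row)
    print(*row)

# Expected output:
# 343438363939 True True
# 464f4f424152 True False
# 00ff10204080 True False
```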
Nitpick: you should NOT use \d, because when matching Unicode text (the default for str patterns in Python 3) it also picks up non-ASCII decimal digits, which are NOT hex digits (there are over 300 of them).
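A small Python 3 demonstration, using U+0664 ARABIC-INDIC DIGIT FOUR as one example of a non-ASCII decimal digit; the input string is hypothetical. Either [0-9] or the re.ASCII flag keeps the match to ASCII digits.

```python
import re

# '3' alternating with U+0664 ARABIC-INDIC DIGIT FOUR (a Unicode decimal digit).
sneaky = "3\u0664" * 6

# For str patterns, \d matches any Unicode decimal digit, so this succeeds.
print(bool(re.fullmatch(r"(?:3\d){6}", sneaky)))            # True
# re.ASCII restricts \d to [0-9].
print(bool(re.fullmatch(r"(?:3\d){6}", sneaky, re.ASCII)))  # False
# An explicit ASCII character class avoids the issue without flags.
print(bool(re.fullmatch(r"(?:3[0-9]){6}", sneaky)))         # False
```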
The suggestion r'(3[0-9a-fA-F]){6}' matches the hex encoding of any 6-byte sequence drawn from the characters 0123456789:;<=>?, which is unlikely to be what you want.
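Where that character set comes from: a byte whose hex form is "3" followed by any hex digit lies in the range 0x30..0x3F. A short Python 3 check:

```python
import re

# Bytes 0x30..0x3F are exactly the ones whose hex form is "3" + one hex digit.
chars = bytes(range(0x30, 0x40)).decode("ascii")
print(chars)  # 0123456789:;<=>?

# So the hex of six punctuation characters passes the "3 + hex digit" pattern.
punct_hex = b":;<=>?".hex()
print(punct_hex)  # 3a3b3c3d3e3f
print(bool(re.fullmatch(r"(?:3[0-9a-fA-F]){6}", punct_hex)))  # True
```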
Update: a request for clarification.
Please consider the following and let us know which pattern is actually finding what you want it to find, and is not letting "false positives" through the gate.
>>> import re, binascii
>>> originals = ('123456', 'FOOBAR', ':;<=>?')
>>> data = ' '.join(map(binascii.hexlify, originals))
>>> print data
313233343536 464f4f424152 3a3b3c3d3e3f
>>> for pattern in (r'(?:3\d){6}', r'(3[0-9a-fA-F]){6}',
... r'(?:3[0-9a-fA-F]){6}', r'[0-9a-fA-F]{12}'):
... print repr(pattern), re.findall(pattern, data)
...
'(?:3\\d){6}' ['313233343536']
'(3[0-9a-fA-F]){6}' ['36', '3f']
'(?:3[0-9a-fA-F]){6}' ['313233343536', '3a3b3c3d3e3f']
'[0-9a-fA-F]{12}' ['313233343536', '464f4f424152', '3a3b3c3d3e3f']
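The session above is Python 2 (print statement, str passed straight to binascii.hexlify). A Python 3 port, sketched below, needs to encode to bytes before hexlify and decode the result; it should print the same findall results as the transcript.

```python
import binascii
import re

originals = ("123456", "FOOBAR", ":;<=>?")
# In Python 3, hexlify takes bytes and returns bytes, so encode and decode around it.
data = " ".join(binascii.hexlify(s.encode("ascii")).decode("ascii") for s in originals)
print(data)  # 313233343536 464f4f424152 3a3b3c3d3e3f

patterns = (r"(?:3\d){6}", r"(3[0-9a-fA-F]){6}", r"(?:3[0-9a-fA-F]){6}", r"[0-9a-fA-F]{12}")
results = {}
for pattern in patterns:
    results[pattern] = re.findall(pattern, data)
    print(repr(pattern), results[pattern])
```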
You're almost there... since hex can contain a-f, you need to include those:
(r'(3[0-9a-fA-F]){6}')
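One practical caveat with this pattern in Python: because the group is capturing, re.findall returns only what the group captured on its last repetition, not the full 12 characters. A minimal Python 3 sketch, reusing the sample data from the session earlier in the thread, showing two ways to get whole matches:

```python
import re

data = "313233343536 464f4f424152 3a3b3c3d3e3f"
capturing = r"(3[0-9a-fA-F]){6}"

# With one capturing group, findall returns the group's last capture per match.
print(re.findall(capturing, data))  # ['36', '3f']
# group(0) is always the whole match, regardless of capturing groups.
whole = [m.group(0) for m in re.finditer(capturing, data)]
print(whole)  # ['313233343536', '3a3b3c3d3e3f']
# Or make the group non-capturing so findall returns whole matches directly.
print(re.findall(r"(?:3[0-9a-fA-F]){6}", data))  # ['313233343536', '3a3b3c3d3e3f']
```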