
Perl to Python Regex

Developer https://www.devze.com 2023-03-20 13:45 Source: Internet
How could one convert this to Python? The regex is used to match IPv4 addresses, but is there a better way to match them?

if ($line =~ m{\s+id\s+(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}),\s+data\s+(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}),\s+Type Transit\s+(\d{1,2})}) {
    $id = "$1.$2.$3.$4";
    $data = "$5.$6.$7.$8";
}


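import re

# line holds the text to search (the Perl snippet's $line)
match = re.search(r"\s+id\s+(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}),\s+data\s+(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}),\s+Type Transit\s+(\d{1,2})", line)
if match:
    # Re-join the captured octets with dots, like "$1.$2.$3.$4" in Perl
    id_ = ".".join(match.group(1, 2, 3, 4))
    data = ".".join(match.group(5, 6, 7, 8))
else:
    # Match attempt failed
    pass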
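To check the direct translation, you can wrap it in a function and run it on a sample line. This is only a sketch: the function name `perl_style_extract`, the compiled `PATTERN` constant and the sample lines are illustrative assumptions, not part of the original answer.

```python
import re

# Same pattern as the translation above, compiled once for reuse.
PATTERN = re.compile(
    r"\s+id\s+(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}),"
    r"\s+data\s+(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}),"
    r"\s+Type Transit\s+(\d{1,2})"
)


def perl_style_extract(line):
    """Hypothetical helper: return (id, data) like the Perl snippet, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    # match.group(1, 2, 3, 4) returns a tuple of the four captured octet strings.
    id_ = ".".join(match.group(1, 2, 3, 4))
    data = ".".join(match.group(5, 6, 7, 8))
    return id_, data


print(perl_style_extract(" id 10.0.0.1, data 192.168.1.20, Type Transit 5"))
# ('10.0.0.1', '192.168.1.20')
```

Like the Perl original, `\d{1,3}` does no range checking, so an address such as `999.1.1.1` still matches.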


Is regex really the right tool to use for checking an IP address? Probably not.

Just split the string on the dots and check that there are exactly four pieces, each an integer in the range 0-255. That is almost certainly less work for the computer than parsing the string with a regex.
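A minimal sketch of that split-and-check approach. The function name `is_valid_ipv4` is illustrative, not from the original answer.

```python
def is_valid_ipv4(addr):
    """Return True if addr is four dot-separated decimal integers, each 0-255."""
    parts = addr.split(".")
    if len(parts) != 4:
        return False
    # isdecimal() rejects empty pieces, signs and spaces before int() is called.
    return all(p.isdecimal() and int(p) <= 255 for p in parts)


for candidate in ["123.45.67.89", "1234.5.67.89", "1.2.3", "1.2.3.-4"]:
    print(candidate, is_valid_ipv4(candidate))
# 123.45.67.89 True
# 1234.5.67.89 False
# 1.2.3 False
# 1.2.3.-4 False
```

This version accepts leading zeros such as `010.1.1.1`. Add an extra check if that matters for your data.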

Alternatively, look at some of the answers to the question "How to validate IP address in Python?". There are plenty of good ways of validating an IP address that don't involve regex. (Although, having said that, at least one of the answers to that question does give a pretty comprehensive regex for both IPv4 and IPv6 addresses.)
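The linked answers aren't reproduced here, but one non-regex option that ships with Python 3.3+ is the standard-library `ipaddress` module. This is a sketch; the helper name `parse_ipv4` is an illustrative assumption.

```python
import ipaddress


def parse_ipv4(text):
    """Return an IPv4Address, or None if text is not a valid dotted-quad IPv4 address."""
    try:
        return ipaddress.IPv4Address(text)
    except ipaddress.AddressValueError:
        return None


print(parse_ipv4("98.76.54.210"))   # 98.76.54.210
print(parse_ipv4("1234.5.67.89"))   # None
```

`ipaddress.ip_address()` does the same job for either IPv4 or IPv6 input. How leading zeros such as `010.1.1.1` are treated has changed between Python versions, so check the documentation for the version you use.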

Hope that helps.


Here is a non-regex solution that can give more precise diagnostics, if you care about them, and is stricter than your regex about what counts as an IP address. Note that it only accepts a line consisting entirely of this record, rather than searching for it inside a longer line, which may not be what you want.

You want to match strings like this: id XXX.XXX.XXX.XXX, data XXX.XXX.XXX.XXX, Type Transit XX (with variable whitespace in most places).

def extract_ip_addresses(line):
    '''
    Extract the 'id' and 'data' IP addresses from lines of the form::

        ' id X.X.X.X, data X.X.X.X, Type Transit X'

    The number following Type Transit must be a number less than 100 but is not returned.
    Whitespace is flexible, but the line must start with a space.
    '''

    try:
        # Split on commas, then on whitespace: each field is a fixed keyword plus a value.
        (id_, id_addr), (data_, data_addr), (type_, transit_, type_transit) = [s.split() for s in line.split(',')]
        if not line.startswith(' ') or id_ != 'id' or data_ != 'data' or type_ != 'Type' or transit_ != 'Transit':
            raise ValueError()
    except ValueError:
        raise ValueError("String in wrong format")
    if len(type_transit) > 2 or not type_transit.isdigit():
        raise ValueError("Type Transit is not a one- or two-digit number.")
    # Each address must be exactly four dot-separated integers in 0..255.
    octets = id_addr.split('.')
    if len(octets) != 4 or not all(c.isdigit() and 0 <= int(c) < 256 for c in octets):
        raise ValueError("Invalid IP address for 'id'.")
    octets = data_addr.split('.')
    if len(octets) != 4 or not all(c.isdigit() and 0 <= int(c) < 256 for c in octets):
        raise ValueError("Invalid IP address for 'data'.")
    return id_addr, data_addr

Sample usage:

ip, data = extract_ip_addresses('  id   123.45.67.89,    data 98.76.54.210,   Type  Transit  53')
assert ip == '123.45.67.89'
assert data == '98.76.54.210'

try:
    extract_ip_addresses(' id 1234.5.67.89, data 98.76.54.210, Type Transit 12')
except ValueError as e:  # Invalid IP address for 'id'
    print('Failed as expected, %s' % e)

You could also return None instead of raising ValueError, depending on how you want to use it. Callers would then check whether extract_ip_addresses(line) is None instead of wrapping the call in try/except.
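If you'd rather get None back, here is a self-contained sketch of that variant. It repeats the checks of `extract_ip_addresses` above but returns None at each failure point. The names `extract_ip_addresses_or_none` and `_valid_ipv4` are illustrative, not from the original answer.

```python
from typing import Optional, Tuple


def _valid_ipv4(addr: str) -> bool:
    # Hypothetical helper: exactly four decimal pieces, each 0..255.
    parts = addr.split('.')
    return len(parts) == 4 and all(p.isdecimal() and int(p) <= 255 for p in parts)


def extract_ip_addresses_or_none(line: str) -> Optional[Tuple[str, str]]:
    """Return (id, data) for a well-formed line, otherwise None."""
    fields = [s.split() for s in line.split(',')]
    # Expect three comma-separated fields with 2, 2 and 3 whitespace-separated words.
    if [len(f) for f in fields] != [2, 2, 3]:
        return None
    (id_kw, id_addr), (data_kw, data_addr), (type_kw, transit_kw, type_transit) = fields
    if (not line.startswith(' ') or id_kw != 'id' or data_kw != 'data'
            or type_kw != 'Type' or transit_kw != 'Transit'):
        return None
    if len(type_transit) > 2 or not type_transit.isdecimal():
        return None
    if not (_valid_ipv4(id_addr) and _valid_ipv4(data_addr)):
        return None
    return id_addr, data_addr


result = extract_ip_addresses_or_none(' id 1234.5.67.89, data 98.76.54.210, Type Transit 12')
if result is None:
    print('Line rejected')
```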


Here is a solution that improves the regular expression and also adds IP address validation.

import re

# line holds the text to check
match = re.match(r'\s+id\s+((?:\d+\.){3}\d+),\s+data\s+((?:\d+\.){3}\d+),\s+Type Transit\s+(\d{1,2})', line)
if match:
    id_, data = match.group(1, 2)
    # The regex only guarantees four dot-separated digit runs; now check each octet is 0..255.
    _i, _d = id_.split('.'), data.split('.')
    if (len(_i) != 4 or not all(s.isdigit() and 0 <= int(s) < 256 for s in _i)
            or len(_d) != 4 or not all(s.isdigit() and 0 <= int(s) < 256 for s in _d)):
        # Cancel that, hit an invalid IP address
        del id_, data
        match = None
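The same regex-then-validate idea can be packaged as a function that returns None on failure instead of deleting variables. This is a sketch; the names `LINE_RE`, `octets_in_range` and `match_line` are illustrative assumptions.

```python
import re

# Same pattern as above: each address is four dot-separated digit runs.
LINE_RE = re.compile(
    r"\s+id\s+((?:\d+\.){3}\d+),\s+data\s+((?:\d+\.){3}\d+),\s+Type Transit\s+(\d{1,2})"
)


def octets_in_range(addr):
    # The regex already guarantees four digit runs, so only the 0-255 range needs checking.
    return all(int(part) < 256 for part in addr.split("."))


def match_line(line):
    """Return (id, data) if the line matches and both addresses are valid, else None."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    id_, data = match.group(1, 2)
    if not (octets_in_range(id_) and octets_in_range(data)):
        return None
    return id_, data


print(match_line(" id 123.45.67.89, data 98.76.54.210, Type Transit 53"))
# ('123.45.67.89', '98.76.54.210')
```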