Perl to Python Regex

How could one convert this to Python? The regex is used to match IPv4 addresses, but is there a better way to match this?

if ($line =~ m{\s+id\s+(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}),\s+data\s+(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}),\s+Type Transit\s+(\d{1,2})}) {
    $id = "$1.$2.$3.$4";
    $data = "$5.$6.$7.$8";
}

import re

# `subject` is the input line, i.e. the Perl $line.
match = re.search(r"\s+id\s+(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}),\s+data\s+(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}),\s+Type Transit\s+(\d{1,2})", subject)
if match:
    id   = ".".join(match.group(1,2,3,4))
    data = ".".join(match.group(5,6,7,8))
else:
    # Match attempt failed
    pass

Is regex really the right tool to use for checking an IP address? Probably not.

Just split the string on the dots and check that you get four parts, each an integer in the range 0-255. That is almost certainly less effort for the computer than parsing the string with a regex, and it also rejects values such as 999, which \d{1,3} happily accepts.
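A minimal sketch of that split-and-check idea, in case it helps. The function name is_valid_ipv4 is made up for illustration, and it assumes Python 3.7+ (for str.isascii) and that you want plain dotted-decimal notation only:

```python
def is_valid_ipv4(text):
    """Return True if text looks like a dotted-decimal IPv4 address."""
    parts = text.split('.')
    if len(parts) != 4:
        return False
    for part in parts:
        # isdigit() alone also accepts some non-ASCII digit characters,
        # so require ASCII as well; this also rejects '' and '-1'.
        if not (part.isascii() and part.isdigit()):
            return False
        if int(part) > 255:
            return False
    return True


print(is_valid_ipv4('123.45.67.89'))   # a well-formed address
print(is_valid_ipv4('1234.5.67.89'))   # first part is out of range
```

Whether leading zeros such as 010 should count as valid is a policy decision this sketch does not make; it accepts them.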

Alternatively, look at some of the answers to this question: How to validate IP address in Python? There are plenty of good ways of validating an IP address that don't involve regex. (Although, having said that, at least one of the answers to that question does give a pretty comprehensive regex for both IPv4 and IPv6 addresses.)
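One regex-free option, if you are on Python 3.3 or later, is the standard library's ipaddress module. This is only a sketch of how it could be used; the helper name parse_ipv4 is hypothetical and not something from the linked question:

```python
import ipaddress


def parse_ipv4(text):
    """Return an ipaddress.IPv4Address for text, or None if it is not valid."""
    try:
        return ipaddress.IPv4Address(text)
    except ipaddress.AddressValueError:
        # Raised for malformed strings, e.g. a part greater than 255.
        return None


addr = parse_ipv4('123.45.67.89')
print(addr)                          # the parsed address, printed in dotted form
print(parse_ipv4('1234.5.67.89'))    # invalid, so None

# ipaddress.ip_address() accepts both IPv4 and IPv6 text and raises
# ValueError when the text is neither.
print(ipaddress.ip_address('::1').version)
```

The exact handling of edge cases such as leading zeros has changed between Python releases, so check the behaviour on the version you deploy.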

Hope that helps.

Here is a non-regular-expression solution which can provide more accurate diagnostics, if you care about that, and which is stricter about the IP addresses than what you had. It only works on a whole line, though, which may not be what you want.

You want to match strings like this: id XXX.XXX.XXX.XXX, data XXX.XXX.XXX.XXX, Type Transit XX (with variable whitespace in most places).

def extract_ip_addresses(line):
    '''
    Extract the 'id' and 'data' IP addresses from lines of the form::

        ' id X.X.X.X, data X.X.X.X, Type Transit X'

    The number following Type Transit must be a number less than 100 but is not returned.
    Whitespace is flexible.
    '''

    try:
        (id_, id), (data_, data), (type_, transit_, type_transit) = [s.split() for s in line.split(',')]
        if not line.startswith(' ') or id_ != 'id' or data_ != 'data' or type_ != 'Type' or transit_ != 'Transit':
            raise ValueError()
    except ValueError:
        raise ValueError("String in wrong format")
    if len(type_transit) > 2 or not type_transit.isdigit():
        raise ValueError("Type Transit is not a one- or two-digit number.")
    _ = id.split('.')
    if len(_) != 4 or not all(c.isdigit() and 0 <= int(c) < 256 for c in _):
        raise ValueError("Invalid IP address for 'id'.")
    _ = data.split('.')
    if len(_) != 4 or not all(c.isdigit() and 0 <= int(c) < 256 for c in _):
        raise ValueError("Invalid IP address for 'data'.")
    return id, data

Sample usage:

ip, data = extract_ip_addresses('  id   123.45.67.89,    data 98.76.54.210,   Type  Transit  53')
assert ip == '123.45.67.89'
assert data == '98.76.54.210'

try:
    # Note the leading space: without it the line is rejected as "String in wrong format".
    extract_ip_addresses(' id 1234.5.67.89, data 98.76.54.210, Type Transit 12')
except ValueError as e:  # Invalid IP address for 'id'.
    print('Failed as expected, %s' % e)

You could also return None instead of raising a ValueError, depending on how you want to use it. Then you would check whether extract_ip_addresses(line) is None instead of wrapping the call in try/except.
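A rough sketch of what that None-returning variant could look like. The names extract_ip_addresses_or_none and _is_ipv4 are invented for this example, and unlike the function above it does not insist on leading whitespace; it also assumes Python 3.7+ for str.isascii:

```python
def _is_ipv4(text):
    """True if text has four dot-separated ASCII integers in 0-255."""
    octets = text.split('.')
    return len(octets) == 4 and all(
        o.isascii() and o.isdigit() and int(o) < 256 for o in octets)


def extract_ip_addresses_or_none(line):
    """Return (id, data) for a well-formed line, or None otherwise."""
    fields = [part.split() for part in line.split(',')]
    if len(fields) != 3:
        return None
    id_field, data_field, type_field = fields
    if len(id_field) != 2 or id_field[0] != 'id':
        return None
    if len(data_field) != 2 or data_field[0] != 'data':
        return None
    if len(type_field) != 3 or type_field[:2] != ['Type', 'Transit']:
        return None
    # The Type Transit value must be a one- or two-digit number.
    if not (type_field[2].isascii() and type_field[2].isdigit()
            and len(type_field[2]) <= 2):
        return None
    if not (_is_ipv4(id_field[1]) and _is_ipv4(data_field[1])):
        return None
    return id_field[1], data_field[1]


result = extract_ip_addresses_or_none(' id 10.0.0.1, data 10.0.0.2, Type Transit 4')
if result is None:
    print('Line did not match')
else:
    id_addr, data_addr = result
    print(id_addr, data_addr)
```

The trade-off is that the caller no longer learns why a line was rejected, so this suits cases where non-matching lines are simply skipped.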

Here is a solution improving the regular expression and also adding in IP address validation.

import re

match = re.match(r'\s+id\s+((?:\d+\.){3}\d+),\s+data\s+((?:\d+\.){3}\d+),\s+Type Transit\s+(\d{1,2})', line)
if match:
    id, data = match.group(1, 2)
    # Now actually check the IP addresses.
    _i, _d = id.split('.'), data.split('.')
    if (len(_i) != 4 or not all(s.isdigit() and 0 <= int(s) < 256 for s in _i)
    or len(_d) != 4 or not all(s.isdigit() and 0 <= int(s) < 256 for s in _d)):
        # Cancel that, hit an invalid IP address
        del id, data
        match = None
