Python code to accept many different formats of US phone numbers?_问答_开发者

Python code to accept many different formats of US phone numbers?

开发者 https://www.devze.com 2022-12-11 04:25 出处：网络

I\'m开发者_如何学编程 reading in lots of user entered data that represent phone numbers from files.They are all slightly entered in differently:

相关专题：python

I'm开发者_如何学编程 reading in lots of user entered data that represent phone numbers from files. They are all slightly entered in differently:

5555555555
555-555-5555
555-555/5555
1555-555-5555
etc...

How could I easily parse in all of these phone numbers in Python and produce a canonical output like: 555-555-5555?

Dive into Python has a section on parsing phone numbers

http://www.diveintopython.org/regular_expressions/phone_numbers.html

I'm not american, but this works with russian phone numbers... maybe it applies to american ones too?

Discard all non-number characters
Validate amount of the numbers left
Insert several dashes in appropriate places

take only the numbers with a regex. then find out if they appended the 1 (NO area code starts with 1). if it's there, remove it otherwise, format the 10 digits the way you want.

import re
pnumber = re.sub("[^0-9]", "", input_number)
if pnumber[0] == 1:
    pnumber = pnumber[1:] #strip 1st char if 1

#insert the dashes
if len(pnumber) == 10:
    pnumber = "%s-%s-%s" % (pnumber[:3],pnumber[3:6],pnumber[6:])
else:
    #throw error

After a little preparation with string.maketrans, strings' translate method affords very fast and simple operation. I'm giving Python 2 code for plain strings (Python 3, and Unicode strings in Python 2, are a bit different -- ask if that's what you need):

The preparation (do once and for all, e.g. at module load time):

>>> import string
>>> allchars = string.maketrans('', '')
>>> nondigits = allchars.translate(allchars, string.digits)

The execution (turn any suitable string into the property formatted number):

>>> x='1555-555-5555'
>>> y=(x.translate(allchars, nondigits)).lstrip('1')
>>> assert len(y) == 10
>>> '%s-%s-%s' % (y[:3], y[3:6], y[6:])
'555-555-5555

Of course, you'll need to decide what to do when len(y) does not equal 10 (just raise an exception as I'm doing here, or, what else). But, this would be needed for any other form of processing (regex or whatever) just as well. The translate approach is really really fast and simple!-)

def extractNumber(s):
    """take a string phone number and extract it to the legal string"""

    target = ""
    for char in s:
        try:
            target += int(s)
        except ValueError:
            target += '-'

    return target

Decide which formats you want to recognize, then create a regular expression matching each one grouping the different parts of the number (like area code, prefix etc). Finally use a substitution to generate the canonical output you desire.

Example:

to match

xxx-xxx-xxxx   -> \d{3}-\d{3}-\d{4}
(xxx) xxx-xxxx -> \(\d{3}\) \d{3}-\d{4}
1-xxx-xxx-xxx  -> 1-\d{3}-\d{3}-\d{4}

This ignores rules that restrict prefix and area code (the US doesn't allow area codes or prefixes that being with 0 or 1). You could try and be super smart and create one regular expression that matches everything but you will end up with a jumbled mess that is impossible to modify instead you should OR the patterns together to make them easier to modify in the future.

basic idea:

pattern = re.compile(r'\d{3}-\d{3}-\d{4}|\(\d{3}\) \d{3}-\d{4}|1-\d{3}-\d{3}-\d{4}')

with grouping added for canonical output

pattern = re.compile(r'(\d{3})-(\d{3})-(\d{4})|\((\d{3})\) (\d{3})-(\d{4})|1-(\d{3})-(\d{3})-(\d{4})')

then just run that against your inputs and for each phone number input you will have 3 matching groups, one for the area code, one for the prefix, and one for the suffix which you can output however you want. You need a basic understanding of regular expressions, but it shouldn't be too hard.