How to convert varying formats of a string into a list of int? (python)

Developer https://www.devze.com 2023-03-17 23:39 Source: web

Related topics: python regex

m = re.match(r'(\d+)(?:-(\d+))?$', string)
start = m.group(1)
end = m.group(2) or start
return list(range(int(start, 10), int(end, 10) + 1))

Right now this is able to handle strings in the following format and convert them into a list...

'0-6' results in [0,1,2,3,4,5,6]

'7' results in [7]

Is there any way I can change the notation so that it also handles strings in the following format?

'1 2 3 4 5' results in [1,2,3,4,5]

Regular expressions are not all there is to life. In this case, there's really no reason to use them. Try this; it's over twice as fast as, for example, Shawn Chin's to_num_list on the sample data '0-6 2 3-6' (on all the data I tried, it was between about 1.9 and 4.5 times as fast):

def included_numbers(s):
    out = []
    for chunk in s.split():
        if '-' in chunk:
            f, t = chunk.split('-')
            out.extend(range(int(f), int(t)+1))
        else:
            out.append(int(chunk))
    return out
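As a rough illustration of how this behaves, here is the same function with a few sample calls. The inputs are chosen for illustration only; note that, unlike a regex that skips junk, a malformed chunk raises ValueError here.

```python
def included_numbers(s):
    out = []
    for chunk in s.split():
        if '-' in chunk:
            # 'a-b' expands to every integer from a to b inclusive
            f, t = chunk.split('-')
            out.extend(range(int(f), int(t) + 1))
        else:
            out.append(int(chunk))
    return out

single = included_numbers('7')              # [7]
spaced = included_numbers('1 2 3 4 5')      # [1, 2, 3, 4, 5]
mixed = included_numbers('0-6 2 3-6')       # ranges and single values, in input order

try:
    included_numbers('hello')
    rejected = False
except ValueError:
    # int('hello') fails, so bad input is reported rather than skipped
    rejected = True
```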

I would stick to the same notation, then use re.finditer() to get all matches. Example:

import re

def to_num_list(instr):
    out = []
    # Each match is either a single number or a 'start-end' range
    for m in re.finditer(r'(\d+)(?:-(\d+))?', instr):
        if m.group(2) is None:
            out.append(int(m.group(1)))
        else:
            start = int(m.group(1))
            end = int(m.group(2))
            # range() on Python 3; on Python 2 this was xrange()
            out.extend(range(start, end + 1))
    return out

This will give you the ability to handle inputs such as "1 2 3 10-15" as well. Example usage:

>>> to_num_list("0-6")
[0, 1, 2, 3, 4, 5, 6]
>>> to_num_list("10")
[10]
>>> to_num_list("1 3 5")
[1, 3, 5]
>>> to_num_list("1 3 5 7-10 12-13")
[1, 3, 5, 7, 8, 9, 10, 12, 13]

It also skips over erroneous input (which may not necessarily be what you want):

>>> to_num_list("hello world 1 2 3")
[1, 2, 3]
>>> to_num_list("")
[]
>>> to_num_list("1 hello 2 world 3")
[1, 2, 3]
>>> to_num_list("1hello2")
[1, 2]
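If skipping bad input is not what you want, one possible approach is to split on whitespace and require each token to match the pattern in full. The sketch below is only an illustration: the name to_num_list_strict and the choice to raise ValueError are assumptions, not part of the answer above.

```python
import re

# Hypothetical strict variant; the name and error type are assumptions.
_TOKEN = re.compile(r'(\d+)(?:-(\d+))?')

def to_num_list_strict(instr):
    out = []
    for token in instr.split():
        # fullmatch() rejects tokens with anything besides 'N' or 'N-M'
        m = _TOKEN.fullmatch(token)
        if m is None:
            raise ValueError(f"not a number or range: {token!r}")
        start = int(m.group(1))
        end = int(m.group(2)) if m.group(2) is not None else start
        out.extend(range(start, end + 1))
    return out
```

With this variant, "1 3 5 7-10 12-13" expands as before, while "1hello2" raises ValueError instead of yielding [1, 2].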

m = re.match(r'(?:(\d+)(?:-(\d+))|(?:(\d+)(?:\s+|$))+)?$', string)

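Then look at group 3 for the space-separated case. Be aware that Python's re module keeps only the last value captured by a repeated group, so this yields the final number rather than the whole list.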
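A quick check of that pattern, as a sketch with illustrative variable names. It shows what group 3 holds for a space-separated input, and falls back on splitting the whole match to recover every number (the split step is an addition here, not part of the answer above).

```python
import re

pattern = r'(?:(\d+)(?:-(\d+))|(?:(\d+)(?:\s+|$))+)?$'

# Range input: the endpoints land in groups 1 and 2
range_match = re.match(pattern, '0-6')
range_groups = range_match.groups()

# Space-separated input: group 3 repeats, and only its last capture is kept
list_match = re.match(pattern, '1 2 3 4 5')
last_only = list_match.group(3)

# Recover all numbers by splitting the full matched text
numbers = [int(tok) for tok in list_match.group(0).split()]
```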

The two input formats can be matched by a non-greedy regex (designated by the ? after the *):

m = re.match(r'^(\d+)[0-9\-\s]*?(\d+)?$', string)

This always extracts the first and last numbers into m.group(1) and m.group(2) respectively. If there is only a single number, it is matched in m.group(1) and m.group(2) is None.
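A short sketch of what the two groups contain for each input format; first_and_last is a hypothetical helper name used only for illustration. Note that for a list such as '1 3 5' only the endpoints are captured, so the numbers in between would still need separate handling.

```python
import re

def first_and_last(string):
    # Hypothetical helper: returns the two captured groups, or None if no match
    m = re.match(r'^(\d+)[0-9\-\s]*?(\d+)?$', string)
    if m is None:
        return None
    return m.group(1), m.group(2)

range_case = first_and_last('0-6')          # first and last of a range
single_case = first_and_last('7')           # second group is None
list_case = first_and_last('1 2 3 4 5')     # only the endpoints of the list
```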

See greedy vs non-greedy in the Python docs.

If you are OK with using split, you can simplify your regex and let split handle all the space-separated list definitions.

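import re

def answer(string):
    # A single range such as '0-6'
    m = re.match(r'(\d+)-(\d+)$', string)

    if m:
        start, end = int(m.group(1)), int(m.group(2))
        return list(range(start, end + 1))

    # Otherwise treat the input as whitespace-separated integers,
    # e.g. '7' or '1 2 3 4 5'; list() is needed on Python 3
    return list(map(int, string.split()))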
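One detail worth keeping in mind on Python 3: map() returns a lazy iterator rather than a list, so the result has to be passed through list() if a list is expected. A minimal sketch, with illustrative variable names:

```python
numbers = map(int, '1 2 3 4 5'.split())

# On Python 3 this is a map object, not a list
is_list = isinstance(numbers, list)

# Consuming the iterator with list() produces the actual list of ints
as_list = list(numbers)
```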