Python: How to extract required information from a string?

Developer https://www.devze.com 2023-01-12 05:44 Source: the web

I am new to Python. Is there a StringTokenizer in Python? Can I do character-by-character scanning and copying?

I have the following input string:

data = '123:Palo Alto, CA -> 456:Seattle, WA 789'

I need to extract the two (city, state) fields from this string. Here is the code I wrote

name_list = []
i = 0
while i < len(data):
      if data[i] == ':':
          name = ''
          j = 0
          i = i + 1
          while data[i] != '-' and data[i].isnumeric() == False:
             name[j] = data[i]   # This line gives error
             i = i + 1
             j = j + 1
          name_list.append(name)
      i = i + 1

What should I do?


import re

data = '123:Palo Alto, CA -> 456:Seattle, WA 789'
citys = []
for record in data.split("->"):
    # Named groups let groupdict() return {'city': ..., 'state': ...}
    citys.append(
        re.search(r":(?P<city>[\w\s]+),\s*(?P<state>[\w]+)", record)
        .groupdict()
    )

print(citys)

Gives:

[{'city': 'Palo Alto', 'state': 'CA'}, {'city': 'Seattle', 'state': 'WA'}]


My take, assuming the string is always formatted as per your example:

import re

data = '123:Palo Alto, CA -> 456:Seattle, WA 789'

name_list = []
# Strip every digit (with an optional leading space) and every colon
r = re.compile(r"(\s?\d)|:")
name_list += r.sub("", data).split(" ->")
print(name_list)  # Prints ['Palo Alto, CA', 'Seattle, WA']

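As a note on your error: Python strings are immutable, so name[j] = line[i] raises a TypeError no matter what j is. On top of that, the empty string has a length of 0, so index 0 doesn't exist to begin with: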

>>> s = ""
>>> len(s)
0

You can, however, concatenate strings in Python with the + operator, like so:

>>> s += "Some"
>>> s += " Text"
>>> print(s)
Some Text
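
Putting the two notes above together, the loop from the question can probably be made to work by building each name with += instead of assigning to name[j]. This is only a minimal sketch, assuming the input always looks like the example; the function name extract_names is made up for illustration and is not from the question.

```python
def extract_names(data):
    """Scan data character by character and collect the text after each ':'."""
    name_list = []
    i = 0
    while i < len(data):
        if data[i] == ':':
            name = ''
            i += 1
            # Stop at '-', at a digit, or at the end of the string.
            while i < len(data) and data[i] != '-' and not data[i].isdigit():
                name += data[i]  # strings are immutable, so append instead of name[j] = ...
                i += 1
            name_list.append(name.strip())
        else:
            i += 1
    return name_list


print(extract_names('123:Palo Alto, CA -> 456:Seattle, WA 789'))
```

The extra i < len(data) check in the inner loop keeps the scan from running off the end when the last name is not followed by a digit or a '-'.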


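You could always use a regular expression, if you wanted: /\d+:([\w\s]+),\s*(\w+)/. It's not pretty, but it should get the job done, assuming the string to match is the test string you had. Note that the city group has to allow spaces (for "Palo Alto"), and that re.search is needed rather than re.match because the second piece starts with a space.

import re

string_to_match = '123:Palo Alto, CA -> 456:Seattle, WA 789'

for s in string_to_match.split("->"):
    # search, not match: the piece after "->" begins with whitespace
    m = re.search(r"\d+:([\w\s]+),\s*(\w+)", s)
    city = m.group(1)
    state = m.group(2)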

Syntax may be a little off, but the general idea is there.
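
If splitting on "->" first feels like an extra step, re.findall can pull every pair out of the whole string in one pass. The sketch below assumes each record has the form digits, colon, city, comma, state; the names PATTERN and find_city_states are invented for this example.

```python
import re

data = '123:Palo Alto, CA -> 456:Seattle, WA 789'

# Each match is a (city, state) tuple; [^,:]+ allows spaces inside the city name.
PATTERN = re.compile(r"\d+:([^,:]+),\s*(\w+)")


def find_city_states(text):
    """Return a list of (city, state) tuples found anywhere in text."""
    return PATTERN.findall(text)


print(find_city_states(data))
```

With two capture groups in the pattern, findall returns a list of tuples, so there is no need to handle match objects or check for None on each piece.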


Assuming that you always have the string formatted as shown, you could do:

cityState = []
for line in data.split('->'):
    # "123:Palo Alto, CA" -> city "Palo Alto", rest " CA"
    city, rest = line.strip().split(':')[1].split(',')
    cityState.append({'city': city.strip(),
                      'state': rest.split()[0]})
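
On the StringTokenizer part of the question: str.split is probably the closest everyday equivalent in Python. Here is a slightly more defensive sketch of the split-based approach, using str.partition so that pieces that do not fit the expected layout are skipped instead of raising an IndexError. The function name parse_records and the choice to skip bad pieces are assumptions made for this example.

```python
def parse_records(data, sep='->'):
    """Split data on sep and pull (city, state) out of each piece."""
    results = []
    for token in data.split(sep):  # split() plays the role of a tokenizer
        _, colon, rest = token.partition(':')
        city, comma, tail = rest.partition(',')
        fields = tail.split()
        if not (colon and comma and fields):
            continue  # skip pieces that do not match the expected layout
        results.append((city.strip(), fields[0]))
    return results


print(parse_records('123:Palo Alto, CA -> 456:Seattle, WA 789'))
```

partition always returns three parts, with empty strings when the separator is missing, which is what makes the single validity check possible.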


You can use a regex. Here is my ugly one; you can probably do better:

import re

inputStr = '123:Palo Alto, CA -> 456:Seattle, WA 789'
# Groups: city 1, state 1 (with surrounding spaces), city 2, state 2
m = re.search(r'.*:(.*),(.*)->.*:(.*),\s*(\S{2})', inputStr)
print("City1=" + m.group(1))
print("State1=" + m.group(2))
print("City2=" + m.group(3))
print("State2=" + m.group(4))

Produces

City1=Palo Alto
State1= CA 
City2=Seattle
State2=WA