
Ways to parse docs and normalize rows

开发者 https://www.devze.com 2023-02-28 01:20 Source: Internet
I feel as if I sometimes oversimplify or overcomplicate document parsing. Here is what I often do:

data = open('data.txt', 'rb').read()

for line in data.split('\n'):
    if not line.strip():
        continue

and this:

import csv

fieldnames = ['first_name', 'last_name', 'email', 'postcode', 'telephone_no']
reader = csv.DictReader(open('data.csv', 'rb'), fieldnames=fieldnames)

for line in reader:
    if line['email'].strip():
        email = line['email'].strip()
    if line['first_name'].strip():
        first_name = line['first_name'].strip().capitalize()
    if line['last_name'].strip():
        last_name = line['last_name'].strip().capitalize()
    if line['postcode'].strip():
        postcode = line['postcode'].strip().upper().replace(' ', '')
    if line['telephone_no'].strip():
        telephone_no = line['telephone_no'].strip()

and this:

item = " 4 -2,5456+263 @5"
item = ''.join([char for char in item if char.isdigit()])

item = "+34 0394-234553"
item = item.replace('+','').replace(' ','').replace('-','')

Any tips/suggestions on improvements/alternatives? :)


You could build the list of non-empty lines with a one-liner:

lines = filter(None, (line.strip() for line in open('data.txt', 'rb').readlines()))
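
On Python 3, filter returns a lazy iterator rather than a list, and a file opened with 'rb' yields bytes instead of text, so a rough equivalent might look like the sketch below. The helper name non_empty_lines, the text-mode open, and the sample file contents are assumptions for illustration, not part of the original answer.

```python
import os
import tempfile


def non_empty_lines(path, encoding="utf-8"):
    """Return the stripped, non-empty lines of a text file as a list."""
    with open(path, encoding=encoding) as handle:
        # str.strip() removes the trailing newline; blank lines become ''
        # and are dropped by the truthiness check.
        return [stripped for stripped in (line.strip() for line in handle) if stripped]


# Small demonstration with a throwaway file (contents are made up).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("alpha\n\n  beta  \n   \ngamma\n")

demo_lines = non_empty_lines(tmp.name)
os.remove(tmp.name)
print(demo_lines)
```

Using a with block also closes the file promptly, which the one-liner leaves to the garbage collector.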

Fastest way to remove everything except certain characters

Use the __contains__ method of a string constant with filter (in Python 2, filter returns a string when it is given a string). So you could remove non-digit characters this way:

import string
filter(string.digits.__contains__, " 4 -2,5456+263 @5")
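
The snippet above depends on Python 2, where filter on a string gives back a string. On Python 3 the result is an iterator, so it has to be joined explicitly. A minimal sketch of the same idea, assuming Python 3; the helper name keep_chars is hypothetical:

```python
import string


def keep_chars(text, allowed=string.digits):
    """Keep only the characters of `text` that appear in `allowed`."""
    allowed_set = set(allowed)  # set membership is O(1) per character
    return "".join(filter(allowed_set.__contains__, text))


# The two inputs from the question.
digits_only = keep_chars(" 4 -2,5456+263 @5")
phone = keep_chars("+34 0394-234553")
print(digits_only)
print(phone)
```

Note that string.digits covers only the ASCII digits 0-9, whereas str.isdigit also accepts other Unicode digit characters, so the two approaches can differ on non-ASCII input.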


with open('data.txt', 'rb') as myfile:
    for line in myfile:
        # Lines read from a file keep their newline, so strip before testing.
        if not line.strip():
            continue

As you probably want to do something with the line, it can be simplified further:

with open('data.txt', 'rb') as myfile:
    for line in myfile:
        # Lines read from a file keep their newline, so strip before testing.
        if line.strip():
            do_whateveryouwant(line)
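
For the CSV case in the question, the per-field work could be gathered in one place instead of a chain of if statements. The following is only a sketch, assuming Python 3; normalize_row, CLEANERS, and the sample data are hypothetical names and values chosen for illustration, and the cleaning rules simply mirror the ones in the question.

```python
import csv
import io

FIELDNAMES = ["first_name", "last_name", "email", "postcode", "telephone_no"]

# One cleaner per column; each receives an already-stripped string.
CLEANERS = {
    "first_name": str.capitalize,
    "last_name": str.capitalize,
    "email": lambda value: value,
    "postcode": lambda value: value.upper().replace(" ", ""),
    "telephone_no": lambda value: value,
}


def normalize_row(row):
    """Strip every field, drop empty ones, and apply the column's cleaner."""
    cleaned = {}
    for name in FIELDNAMES:
        # Short rows leave missing fields as None, hence the `or ""`.
        value = (row.get(name) or "").strip()
        if value:
            cleaned[name] = CLEANERS[name](value)
    return cleaned


# Made-up sample data standing in for data.csv.
sample = io.StringIO("john,SMITH, j@example.com ,ab1 2cd, 0123 456\n,,,,\n")
rows = [normalize_row(row) for row in csv.DictReader(sample, fieldnames=FIELDNAMES)]
rows = [row for row in rows if row]  # skip rows where every field was empty
print(rows)
```

Returning a fresh dict per row avoids carrying a value over from the previous row when a field is empty, which can happen when the results are kept in loop-level variables.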
