I feel as if I oversimplify or overcomplicate document parsing at times. Here is what I often do:
data = open('data.txt', 'rb').read()
for line in data.split('\n'):
    if not line.strip():
        continue
and this:
import csv
fieldnames = ['first_name', 'last_name', 'email', 'postcode', 'telephone_no']
reader = csv.DictReader(open('data.csv', 'rb'), fieldnames=fieldnames)
for line in reader:
    if line['email'].strip():
        email = line['email'].strip()
    if line['first_name'].strip():
        first_name = line['first_name'].strip().capitalize()
    if line['last_name'].strip():
        last_name = line['last_name'].strip().capitalize()
    if line['postcode'].strip():
        postcode = line['postcode'].strip().upper().replace(' ', '')
    if line['telephone_no'].strip():
        telephone_no = line['telephone_no'].strip()
and this:
item = " 4 -2,5456+263 @5"
item = ''.join([char for char in item if char.isdigit()])
item = "+34 0394-234553"
item = item.replace('+','').replace(' ','').replace('-','')
Any tips/suggestions on improvements/alternatives? :)
You could make the list of non-empty lines a one-liner:
lines = filter(None, (line.strip() for line in open('data.txt', 'rb').readlines()))
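Note that in Python 3 filter returns a lazy iterator rather than a list, so the result needs wrapping in list() if you want to index it or loop over it twice. A minimal sketch of the same idea, assuming Python 3; non_empty_lines is a hypothetical helper name, and io.StringIO only stands in for a real file object here:

```python
import io


def non_empty_lines(lines):
    """Return the stripped, non-blank lines from any iterable of strings."""
    # filter(None, ...) drops falsy values, i.e. the empty strings
    # left behind after stripping whitespace-only lines.
    return list(filter(None, (line.strip() for line in lines)))


# io.StringIO stands in for an open text file in this demo.
sample = io.StringIO("first\n\n   \n  second  \n")
result = non_empty_lines(sample)
print(result)
```

With a real file you would pass the file object itself, ideally opened inside a with block so it gets closed.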
The fastest way to remove everything except certain characters is to pass the __contains__ method of a string constant to filter (in Python 2, filter returns a string when it is given a string). So you could remove non-digit characters this way:
import string
filter(string.digits.__contains__, " 4 -2,5456+263 @5")
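On Python 3 the same call returns an iterator instead of a string, so the pieces have to be joined back together. A rough sketch under that assumption; keep_digits is a hypothetical helper name, and the inputs are the two strings from the question:

```python
import string


def keep_digits(text):
    """Return only the ASCII digits '0'-'9' found in text, in order."""
    # string.digits.__contains__ is called once per character;
    # ''.join turns the filtered characters back into a string.
    return ''.join(filter(string.digits.__contains__, text))


first = keep_digits(" 4 -2,5456+263 @5")
second = keep_digits("+34 0394-234553")
print(first)
print(second)
```

Testing against string.digits rather than calling str.isdigit keeps the check to ASCII digits, since isdigit also accepts characters such as superscript digits.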
with open('data.txt', 'rb') as myfile:
    for line in myfile:
        # each line keeps its trailing newline, so strip before testing
        if not line.strip():
            continue
As you probably want to do something with the line, it can be simplified further:
with open('data.txt', 'rb') as myfile:
    for line in myfile:
        # each line keeps its trailing newline, so strip before testing
        if line.strip():
            do_whateveryouwant(line)
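If the same loop shows up in several scripts, it could be wrapped in a small generator so the callers only ever see usable lines. This is just a sketch, assuming Python 3 and UTF-8 text; iter_non_blank is a hypothetical name, and the temporary file exists only to make the example self-contained. Because a line read from a file keeps its trailing newline, the sketch strips it before testing for emptiness:

```python
import os
import tempfile


def iter_non_blank(path):
    """Yield each non-blank line of the file at path, stripped."""
    with open(path, encoding='utf-8') as handle:
        for line in handle:
            stripped = line.strip()
            if stripped:
                yield stripped


# Demo on a throwaway file so the example runs anywhere.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write("alpha\n\n  beta  \n\t\n")
    demo_path = tmp.name

collected = list(iter_non_blank(demo_path))
os.remove(demo_path)
print(collected)
```

Since it is a generator, the file is read one line at a time and is closed once the loop finishes.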