Basically I am using a Python cron job to read data from the web and place it in a CSV file in the form of:
.....
###1309482902.37
entry1,36,257.21,16.15,16.168
entry2,4,103.97,16.36,16.499
entry3,2,114.83,16.1,16.3
entry4,130.69,15.6737,16.7498
entry5,5.20,14.4,17
$$$
###1309482902.37
entry1,36,257.21,16.15,16.168
entry2,4,103.97,16.36,16.499
entry3,2,114.83,16.1,16.3
entry4,130.69,15.6737,16.7498
entry5,5.20,14.4,17
$$$
.....
My code basically does a regex search and iterates through all the matches between ### and $$$, then goes through each match line by line, splitting each line by commas. As you can see, some entries split into 4 fields and some into 5. That is because I was dumb and didn't realize the web source puts commas in its numbers of four or more digits. For example,
entry1,36,257.21,16.15,16.168
is really supposed to be
entry1,36257.21,16.15,16.168
I already collected a lot of data and do not want to rewrite, so I thought of a cumbersome workaround. Is there a more pythonic way to do this?
===
contents = ifp.read()
# Pull all entries from the market data
for entry in re.finditer("###(.*\n)*?\$\$\$", contents):
    dataSet = contents[entry.start():entry.end()]
    dataSet = dataSet.split('\n')
    timeStamp = dataSet[0][3:]
    print timeStamp
    for i in xrange(1, 8):
        splits = dataSet[i].split(',')
        if(len(splits) == 5):
            remove = splits[1]
            splits[2] = splits[1] + splits[2]
            splits.remove(splits[1])
        print splits
        ## DO SOME USEFUL WORK WITH THE DATA ##
===
I'd use Python's csv module to read in the CSV file, fix the broken rows as I encountered them, then use csv.writer to write the CSV back out. Like so (assuming your original file, with commas in the wrong place, is ugly.csv, and the new, cleaned-up output file will be pretty.csv):
import csv

inputCsv = csv.reader(open("ugly.csv", "rb"))
outputCsv = csv.writer(open("pretty.csv", "wb"))

for row in inputCsv:
    if len(row) >= 5:
        row[1] = row[1] + row[2]  # note that csv entries are strings, so this is string concatenation, not addition
        del row[2]
    outputCsv.writerow(row)
Clean and simple, and since you're using the proper CSV parser and writer, you shouldn't have to worry about introducing any new weird corner cases (if you had used this in your first script when parsing the web results, the commas in your input data would have been escaped).
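For example (a small sketch of my own, not part of the original answer), csv.writer quotes any field that contains a comma, so a value like 36,257.21 kept in a single field would survive a round trip intact:

import csv, sys

writer = csv.writer(sys.stdout)
# A field containing a comma is quoted automatically on output.
writer.writerow(["entry1", "36,257.21", "16.15", "16.168"])
# prints: entry1,"36,257.21",16.15,16.168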
Normally the csv module is used to handle CSV files of all formats. However, here you have this ugly situation with the commas, so an ugly hack is appropriate. I don't see a clean solution to this, so I think it's OK to go with whatever works.
Incidentally, this line seems to be redundant:
remove = splits[1]
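Dropping that line (and, if you like, collapsing the fix into a single slice assignment -- this rewrite is my own sketch, not part of the original answer), the body of the loop could read:

splits = dataSet[i].split(',')
if len(splits) == 5:
    # Re-join the number that the thousands separator split in two.
    splits[1:3] = [splits[1] + splits[2]]
print splits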
Others have suggested that you use csv to parse the file, and that's good advice. But it does not directly address the other issue -- namely, that you're dealing with a file that consists of sections of data. By slurping the file into a single string and then using a regex to parse that big string, you are throwing away a key point of leverage on the file. A different strategy is to write a method that can parse the file, yielding one section at a time.
import sys

def read_next_section(f):
    for line in f:
        line = line.strip()
        if line.startswith('#'):
            # Start of a new section.
            ts = line[3:]
            data = []
        elif line.startswith('$'):
            # End of a section.
            yield ts, data
        else:
            # Probably a good idea to use csv, as others recommend.
            # Also, write a method to deal with the extra-comma problem.
            fields = line.split(',')
            data.append(fields)

with open(sys.argv[1]) as input_file:
    for time_stamp, section in read_next_section(input_file):
        pass  # Do stuff.
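The comment about the extra-comma problem could be handled by a small helper along these lines (a sketch using a hypothetical name, fix_extra_comma; it is not part of the original answer), replacing the plain line.split(',') call above with fix_extra_comma(line.split(',')):

def fix_extra_comma(fields):
    # The web source's thousands separator splits one number into two
    # fields; if the row has one field too many, glue them back together.
    if len(fields) == 5:
        fields[1:3] = [fields[1] + fields[2]]
    return fields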
A more pythonic way to write this block of code
for i in xrange(1,8):
    splits = dataSet[i].split(',')
    if(len(splits) == 5):
        remove = splits[1]
        splits[2] = splits[1] + splits[2]
        splits.remove(splits[1])
    print splits
would be
for row in dataSet:
    name, data = row.split(',', 1)
    print [name] + data.rsplit(',', 2)
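For instance, on one of the broken rows this keeps the split number together in a single field (the expected output is shown as a comment; this illustration is mine, not part of the original answer):

row = "entry1,36,257.21,16.15,16.168"
name, data = row.split(',', 1)
print [name] + data.rsplit(',', 2)
# ['entry1', '36,257.21', '16.15', '16.168']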