I have written a crude Python program that searches one column (index) of a CSV file for certain phrases and writes the matching rows to another file.
import csv
total = 0
ifile = open('data.csv', "rb")
reader = csv.reader(ifile)
ofile = open('newdata_write.csv', "wb")
writer = csv.writer(ofile, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL)
for row in reader:
    # x is the index of the column being searched
    if ("some text") in row[x]:
        total = total + 1
        writer.writerow(row)
    elif ("some more text") in row[x]:
        total = total + 1
        writer.writerow(row)
    elif ("even more text I'm looking for") in row[x]:
        total = total + 1
        writer.writerow(row)
    # < many, many more lines >
print "\nTotal = %d." % total
ifile.close()
ofile.close()  # close the output file too, so everything is flushed to disk
My question is this: Isn't there a better (more elegant/less verbose) Pythonic way to do this? I feel this is a case of not knowing what I don't know. The CSV file I'm searching is not large (3863 lines, 669 KB) so I don't think it is necessary to use SQL to solve this, although I am certainly open to that.
I am a Python newbie, in love with the language and teaching myself through the normal channels (books, tutorials, Project Euler, Stack Overflow).
Any suggestions are greatly appreciated.
You're looking for any() with a generator expression:
matches = "some text", "some more text", "even more text I'm looking for"
for row in reader:
    # substring test on column x, as in the question
    if any(match in row[x] for match in matches):
        total += 1
        writer.writerow(row)
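A minimal, self-contained sketch of the any() approach, written for Python 3. The function name filter_rows, its column parameter and the sample rows are made up for illustration; io.StringIO stands in for data.csv and the output file so it runs without any data.

```python
import csv
import io


def filter_rows(infile, outfile, phrases, column):
    """Copy rows whose `column` cell contains any of `phrases`; return how many."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL)
    total = 0
    for row in reader:
        # any() stops at the first phrase found, so each row is written at most once
        if any(phrase in row[column] for phrase in phrases):
            total += 1
            writer.writerow(row)
    return total


# Hypothetical sample data: the second column (index 1) is searched
sample = io.StringIO("1,has some text here\n2,nothing relevant\n3,even more text I'm looking for\n")
out = io.StringIO()
phrases = ("some text", "some more text", "even more text I'm looking for")
total = filter_rows(sample, out, phrases, column=1)
print("\nTotal = %d." % total)
```

Adding a phrase is then a one-item change to the tuple instead of another elif branch.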
Alternatively, you could write all the matching rows at once:
writer.writerows(row for row in reader if any(match in row[x] for match in matches))
but written that way it doesn't give you a total.
It's not a huge improvement, but you could do something like
keyphraseList = (
    "some text",
    "some more text",
    "even more text I'm looking for")
...
for row in reader:
    for phrase in keyphraseList:
        if phrase in row[x]:
            total = total + 1
            writer.writerow(row)
            break
(not tested)
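A tested Python 3 version of that loop, as a sketch. KEYPHRASES, COLUMN and the sample rows are hypothetical. What it checks is the role of break: a cell containing two of the phrases is still written (and counted) only once.

```python
import csv
import io

# Hypothetical phrase list and column index, for illustration only
KEYPHRASES = ("some text", "some more text", "even more text I'm looking for")
COLUMN = 0


def copy_matching(infile, outfile):
    reader = csv.reader(infile)
    writer = csv.writer(outfile, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL)
    total = 0
    for row in reader:
        for phrase in KEYPHRASES:
            if phrase in row[COLUMN]:
                total += 1
                writer.writerow(row)
                break  # this row is already written; skip the remaining phrases
    return total


# The first cell contains both "some text" and "some more text"
src = io.StringIO("some text and some more text\nno match here\n")
dst = io.StringIO()
total = copy_matching(src, dst)
print(total)  # prints 1
```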
You can make this more Pythonic by using a list comprehension instead of a for loop. For example, if you are looking for the index strings 'aa' or 'bb', you could do
matches = [row for row in reader if 'aa' in row[0] or 'bb' in row[0]]
I'm not sure this version is better, just shorter. Anyway, I hope it helps.
import csv
total = 0
keys = ['a', 'b', 'c']
with open('infile', 'rb') as infile, open('outfile', 'wb') as outfile:
    rows = [x for x in csv.reader(infile) if any([k in x[0] for k in keys])]
    csv.writer(outfile, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL).writerows(rows)
print 'Total: %d' % len(rows)
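A possible Python 3 port of the same idea, as a sketch: in Python 3 the csv module expects text-mode files opened with newline='' instead of 'rb'/'wb', and print is a function. Passing a generator to any() (rather than a list) lets it stop at the first matching key. The function name extract, the column parameter and the temporary demo files are assumptions for illustration.

```python
import csv
import os
import tempfile


def extract(in_path, out_path, keys, column=0):
    # newline='' is what the csv docs recommend for files handed to csv.reader/writer
    with open(in_path, newline='') as infile, open(out_path, 'w', newline='') as outfile:
        # generator inside any(): stops checking keys as soon as one matches
        rows = [row for row in csv.reader(infile) if any(k in row[column] for k in keys)]
        csv.writer(outfile, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL).writerows(rows)
    return len(rows)


# Demo on throwaway files; the names and contents are made up
tmpdir = tempfile.mkdtemp()
in_path = os.path.join(tmpdir, 'infile.csv')
out_path = os.path.join(tmpdir, 'outfile.csv')
with open(in_path, 'w', newline='') as f:
    f.write('dog,1\napple,2\ncat,3\n')

total = extract(in_path, out_path, ['a', 'b', 'c'])
print('Total: %d' % total)
```

Because the matching rows are collected in a list first, the total is simply len(rows); with a file of a few thousand lines, holding them in memory is not a concern.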
Not necessarily better, but I would compare the item to a set and clean up the total a bit. It is at least more succinct.
This
for row in reader:
    if ("some text") in row[x]:
        total = total + 1
        writer.writerow(row)
    elif ("some more text") in row[x]:
        total = total + 1
        writer.writerow(row)
    elif ("even more text I'm looking for") in row[x]:
        total = total + 1
        writer.writerow(row)
becomes
myWords = set(('some text','some more text','even more'))
for row in reader:
    if row[x] in myWords:
        total += 1
        writer.writerow(row)
You could just use a plain list, but a set is faster for membership tests (a hash lookup rather than a scan of every element), and the difference grows with the number of phrases.
In response to the comment by agf:
>>> x = set(('something','something else'))
>>> True if 'some' in x else False
False
>>> True if 'something' in x else False
True
Is this what you're saying would not work?
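A small illustration of the distinction at stake, using a made-up cell and phrase set: a set lookup asks whether the whole cell equals one of the phrases, while the question's `in row[x]` asks whether a phrase occurs anywhere inside the cell.

```python
# Hypothetical cell value and phrase set, for illustration only
phrases = {'some text', 'some more text'}
cell = 'here is some text in a longer cell'

exact = cell in phrases                     # set lookup: is the whole cell one of the phrases?
contains = any(p in cell for p in phrases)  # substring search, like `"some text" in row[x]`
print(exact, contains)  # False True
```

So the set version is a fit when the column holds exactly one of the phrases and nothing else; when the phrases can appear inside longer text, a substring test such as any() is needed.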