I have a large file called fulldataset. I would like to write lines from fulldataset to a new file called newdataset. I only want to write the lines from fulldataset that contain the ID numbers present in the listfile. All the ID numbers start with XY, and they occur in the middle of each line.
Here is an example line from the listfile:
Robert, Brown, "XY-12344343", 1929232, 324934923,
Here is the program I have so far. It runs fine, but doesn't write anything into the new file.
datafile = open('C:\\listfile.txt', 'r')
completedataset = open('C:\\fulldataset.txt', 'r')
smallerdataset = open('C:\\newdataset.txt', 'w')
matchedLines = []
for line in datafile:
    if line.find("XY"):
        matchedLines.append( line )
counter = 1
for line in completedataset:
    print counter
    counter +=1
    for t in matchedLines:
        if t in line:
            fulldataset.write(line)
            del line
            break
datafile.close()
completedataset.close()
fulldataset.close()
EDIT:
Ok here is the new program:
datafile = open('C:\\tryexcel33.txt', 'r')
completedataset = open('C:\\fulldataset.txt', 'r')
smallerdataset = open('C:\\newdataset.txt', 'w')
counter = 1
for line in completedataset:
    print counter
    counter +=1
    if any( id in line for id in datafile ):
        smallerdataset.write( line )
        break
datafile.close()
completedataset.close()
fulldataset.close()
I still don't have anything being written to the new file. I think a problem might be that in the full file the id numbers have a " in front of them but this doesn't exist in the listfile. Any thoughts?
I don't understand your code. Here's the code to do what you've asked:
ids = set( datafile.readlines( ) )
for line in fulldataset:
    if any( id in line for id in ids ):
        smallerdataset.write( line )
EDIT: I did the best I could with incomplete data. The fact that the IDs in the fulldataset are prefixed with XY is irrelevant, since we are searching through the whole string anyway ("foo" in "XY-foo" is still true). If no lines are being written, that's because the lines of datafile are not exactly IDs. Please post a sample from datafile.
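To make the point about substring matching concrete, here is a small sketch. The full-dataset line is made up for illustration; only the list-file line comes from the question. It shows that a bare ID is found inside a longer line, while a whole list-file line (names, commas and trailing newline included) is not.

```python
# Hypothetical full-dataset line, invented for illustration.
full_line = 'A17, "XY-12344343", 2011-03-04, 88\n'
# The example list-file line from the question, with its trailing newline.
list_line = 'Robert, Brown, "XY-12344343", 1929232, 324934923,\n'

# A bare ID is a substring of the full-dataset line ...
bare_id_found = "XY-12344343" in full_line
# ... and so is the part after the prefix, as with "foo" in "XY-foo".
suffix_found = "12344343" in full_line
# The whole list-file line is not a substring of the full-dataset line.
whole_line_found = list_line in full_line
```

Here bare_id_found and suffix_found are True and whole_line_found is False, which is why comparing whole lines of datafile against the full dataset writes nothing.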
You are also reusing the variable line, which will make your code go wrong in mysterious ways.
You also have a break statement, which will cause at most one line to be written. Why?
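The effect of the break can be seen in a small sketch. The function name collect_matches and the sample data are hypothetical; the loop only mirrors the shape of the program in the question, with the break made optional so both behaviours can be compared.

```python
def collect_matches(lines, ids, stop_at_first_match):
    """Return the lines that contain any of the IDs."""
    matched = []
    for line in lines:
        if any(i in line for i in ids):
            matched.append(line)
            if stop_at_first_match:
                break  # leaves the outer loop; later lines are never examined
    return matched

# Made-up sample data.
sample_lines = ['a, "XY-1", x', 'b, "XY-2", y', 'c, "XY-3", z']
sample_ids = ["XY-1", "XY-3"]

with_break = collect_matches(sample_lines, sample_ids, True)
without_break = collect_matches(sample_lines, sample_ids, False)
```

With the break, only the first matching line is collected; without it, both matching lines are.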
EDIT
Many apologies, I just re-read the code -- for some reason I had assumed that datafile was a list. It's actually a file, so my previous code won't work. Please see the fixed code.
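Because datafile is a file object, it can only be iterated once: an expression like any( id in line for id in datafile ) reads it towards the end while testing the first lines and then has nothing left to compare against. One possible way around this, sketched below, is to read the IDs into a set up front. The helper names extract_ids and filter_lines and the pattern XY-\d+ are assumptions for illustration (the pattern is guessed from the single example line in the question), and io.StringIO with made-up contents stands in for the real files.

```python
import io
import re

# Assumed ID format, guessed from the one example line in the question.
ID_PATTERN = re.compile(r"XY-\d+")

def extract_ids(list_lines):
    """Collect every XY-... ID found in the list-file lines into a set."""
    ids = set()
    for entry in list_lines:
        ids.update(ID_PATTERN.findall(entry))
    return ids

def filter_lines(data_lines, ids):
    """Yield the data lines that mention at least one wanted ID."""
    for row in data_lines:
        # Compare whole IDs, so "XY-12" does not match a row holding "XY-123".
        if ids.intersection(ID_PATTERN.findall(row)):
            yield row

# Made-up stand-ins for listfile and fulldataset.
listfile = io.StringIO(u'Robert, Brown, "XY-12344343", 1929232, 324934923,\n')
fulldataset = io.StringIO(u'1, XY-12344343, kept\n2, XY-999, dropped\n')

wanted = extract_ids(listfile)               # the list file is read exactly once
kept = list(filter_lines(fulldataset, wanted))
```

With real files, the opened file objects can be passed in directly and the result written with smallerdataset.writelines( filter_lines( completedataset, wanted ) ), which avoids holding the large file in memory.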