Writing certain lines to a file in python

Developer https://www.devze.com 2023-01-12 12:47 Source: the web
I have a large file called fulldataset. I would like to write lines from fulldataset to a new file called newdataset, but only the lines that contain one of the id numbers present in the listfile. All the id numbers start with XY, and they occur in the middle of each line.

Here is an example line from list file:

Robert, Brown, "XY-12344343", 1929232, 324934923, 

Here is the program I have so far. It runs fine, but doesn't write anything into the new file.

datafile = open('C:\\listfile.txt', 'r')
completedataset = open('C:\\fulldataset.txt', 'r')
smallerdataset = open('C:\\newdataset.txt', 'w')

matchedLines = []

for line in datafile:
    if line.find("XY"):
        matchedLines.append( line )

counter = 1
for line in completedataset:
    print counter
    counter +=1

    for t in matchedLines:
        if t in line:
            fulldataset.write(line)
            del line
            break

datafile.close()
completedataset.close()
fulldataset.close()

EDIT:

Ok here is the new program:

datafile = open('C:\\tryexcel33.txt', 'r')
completedataset = open('C:\\fulldataset.txt', 'r')
smallerdataset = open('C:\\newdataset.txt', 'w')


counter = 1
for line in completedataset:
    print counter
    counter +=1

    if any( id in line for id in datafile ):
        smallerdataset.write( line )
        break

datafile.close()
completedataset.close()
fulldataset.close()

I still don't have anything being written to the new file. I think a problem might be that in the full file the id numbers have a " in front of them but this doesn't exist in the listfile. Any thoughts?


I don't understand your code. Here's the code to do what you've asked:

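# Strip the trailing newline from each ID, and skip blank lines
# (an empty string is a substring of every line, so it would match everything)
ids = set( l.strip( ) for l in datafile if l.strip( ) )
for line in completedataset:
    if any( id in line for id in ids ):
        smallerdataset.write( line )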

EDIT: I did the best I could with incomplete data. The fact that the IDs in the fulldataset are prefixed with XY is irrelevant, since we are searching through the whole string anyway ("foo" in "XY-foo" is still true). If no lines are being written, that's because the lines of datafile are not exactly IDs. Please post a sample from datafile.
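A minimal sketch of the matching step, assuming each line of the list file holds one bare ID. The sample data and the helper names `load_ids` and `matching_lines` are invented for illustration and are not from the original post. It strips the newline that iteration leaves on each ID, which is one common reason a substring test like this silently matches nothing:

```python
import io

def load_ids(list_file):
    """Return the set of non-empty, whitespace-stripped IDs, one per line."""
    return set(raw.strip() for raw in list_file if raw.strip())

def matching_lines(data_file, ids):
    """Yield every line of data_file that contains at least one ID as a substring."""
    for line in data_file:
        if any(id_ in line for id_ in ids):
            yield line

# Hypothetical in-memory stand-ins for listfile.txt and fulldataset.txt
list_file = io.StringIO(u"12344343\n99999999\n")
full_file = io.StringIO(
    u'Robert, Brown, "XY-12344343", 1929232, 324934923,\n'
    u'Alice, Smith, "XY-55555555", 1, 2,\n'
)

ids = load_ids(list_file)
kept = list(matching_lines(full_file, ids))
```

With real files, the two `io.StringIO` objects would be replaced by `open(...)` calls, and each kept line would be written to the output file instead of collected in a list.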

You are also reusing the variable line, which will make your code go wrong in mysterious ways.

You also have a break statement, which will cause at most one line to be written. Why?
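To make the effect of that `break` concrete, here is a small sketch on invented data (the function `write_matches` and its `stop_after_first` flag are hypothetical, used only to compare the two behaviours side by side):

```python
def write_matches(lines, ids, stop_after_first):
    """Collect lines containing any ID; optionally break after the first hit."""
    written = []
    for line in lines:
        if any(id_ in line for id_ in ids):
            written.append(line)
            if stop_after_first:
                break  # leaves the outer loop: no later line is examined
    return written

lines = ["a XY-1 b", "c XY-2 d", "e XY-3 f"]  # invented sample data
ids = {"XY-1", "XY-3"}

with_break = write_matches(lines, ids, stop_after_first=True)
without_break = write_matches(lines, ids, stop_after_first=False)
```

Here `with_break` ends up holding only the first matching line, while `without_break` holds both matching lines.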


EDIT

Many apologies, I just re-read the code -- for some reason I had assumed that datafile was a list. It's actually a file, so my previous code won't work. Please see the fixed code.
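To illustrate why treating the file like a list fails: a file object is an iterator that is consumed by its first full pass, so a second loop over it yields nothing. A small sketch, with `io.StringIO` standing in for the real list file and made-up contents:

```python
import io

id_file = io.StringIO(u"111\n222\n")

first_pass = [raw.strip() for raw in id_file]
second_pass = [raw.strip() for raw in id_file]  # iterator already exhausted

# Reading the IDs once into a set makes them reusable for every data line
id_file.seek(0)
ids = set(raw.strip() for raw in id_file)
```

This is why the IDs need to be read into a set or list before the loop over the full dataset starts, rather than iterating over the open file inside that loop.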
