
Random loss of precision in Python readline()

https://www.devze.com 2023-01-03 01:36 Source: web

We have a process which takes a very large CSV file (1.6GB) and breaks it down into pieces (in this case 3). This runs nightly and normally doesn't give us any problems. When it ran last night, however, the first of the output files had lost precision on the numeric fields in the data. The core of the script is these lines:

while lineCounter <= chunk:
    oOutFile.write(oInFile.readline())
    lineCounter = lineCounter + 1

and the normal output might be something like

StringField1; StringField2; StringField3; StringField4; 1000000; StringField5; 0.000054454

etc.

On this one occasion and in this one output file the numeric fields were all output with 6 zeros at the end i.e.

StringField1; StringField2; StringField3; StringField4; 1000000.000000; StringField5; 0.000000

We are using Python v2.6 (and don't want to upgrade unless we really have to) but we can't afford to lose this data. Does anyone have any idea why this might have happened? If the readline is doing some kind of implicit conversion is there a way to do a binary read, because we really just want this data to pass through untouched?

It is very weird to us that this only affected one of the output files generated by the same script, and that when it was rerun the output was as expected.

thanks

Jack

(readlines method referenced in the thread below)

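def count_lines(filename):  # wrapper name assumed; the original fragment had a bare return
    f = open(filename)
    lines = 0
    buf_size = 1024 * 1024  # read 1 MiB at a time
    read_f = f.read  # loop optimization

    buf = read_f(buf_size)
    while buf:
        lines += buf.count('\n')  # count newlines in this buffer
        buf = read_f(buf_size)

    f.close()
    return lines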


.readline() doesn't do anything with the content of the line, certainly not with numbers, so it's definitely not the culprit.
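On the question of a binary read: opening both files in binary mode ('rb' and 'wb') makes the pass-through explicit, since readline() then returns the raw bytes of each line, with no newline translation or decoding. Here is a minimal sketch of such a split. The function name split_by_lines and its parameters are made up for illustration and are not taken from the original script, and it assumes the last output file should receive whatever lines remain.

```python
def split_by_lines(in_path, out_paths, lines_per_chunk):
    """Copy in_path into the files in out_paths, byte for byte.

    Each output file gets lines_per_chunk lines, except the last one,
    which gets all remaining lines (an assumption made for this sketch).
    """
    with open(in_path, 'rb') as in_file:
        for index, out_path in enumerate(out_paths):
            is_last = (index == len(out_paths) - 1)
            with open(out_path, 'wb') as out_file:
                written = 0
                while is_last or written < lines_per_chunk:
                    line = in_file.readline()
                    if not line:  # empty result means end of file
                        break
                    out_file.write(line)  # raw bytes, untouched
                    written += 1
```

Nothing in this loop parses a field, so a value such as 0.000054454 cannot come out as 0.000000 unless something else rewrites the file.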

Thanks for giving more info, but this still looks very mysterious to me as neither function should be causing such a change. You didn't open the output in Excel, by any chance? Sometimes Excel does weird things and interprets stuff in an unexpected way. Grasping at straws here...
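One way to tell whether the script or something downstream changed the data is to compare checksums straight after the nightly run, before anyone opens the files. The sketch below assumes the pieces are an exact split of the source with nothing added (no repeated header row, for example). The names file_digest and pieces_match_source are hypothetical.

```python
import hashlib


def file_digest(paths, buf_size=1024 * 1024):
    """Return the SHA-256 hex digest of the given files read back to back."""
    digest = hashlib.sha256()
    for path in paths:
        with open(path, 'rb') as f:
            buf = f.read(buf_size)
            while buf:
                digest.update(buf)
                buf = f.read(buf_size)
    return digest.hexdigest()


def pieces_match_source(source_path, piece_paths):
    """True if the pieces, concatenated in order, equal the source file."""
    return file_digest([source_path]) == file_digest(piece_paths)
```

If the check passes right after the run but a file later shows 1000000.000000, the change happened after the script finished, for example when a tool opened the file and saved it again.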

(As an aside, I don't see the big optimization potential in read_f = f.read :))
