I have some massive (4.6 million lines) data files that I'm trying to edit with Fortran. Basically, each file is a series of headers, each followed by a table of numbers. Something like this:
    p he4 blah 99 ggg
    1.0e+01 2.0e+01 2.0e+01
    2.0e+01 5.0e+01 2.0e+01
    .
    .
    3.2e+-1 2.0e+01 1.0e+00
    p he3 blafoo 99 ggg
    1.1e+00 2.3e+01 2.0e+01

My task is to replace certain entries in one file with those from the other. The list is supplied separately.
I have written a code that already works. My strategy is to just read and echo the first file until I find a header that matches the replacement list. Then find the same header in the second file, echo the entries. Finally, switch back to echoing the first file. The only problem with this approach is that it's SOOOOOO slow! I looked into direct access of the files, but they don't have fixed record lengths. Does anyone have a better idea?
Cheers for the help, Rich
Are the headers in the files sorted in any way? If not, creating an index file of the headers in the second file should speed up the lookup. My Fortran is very rusty, but if you can collect the headers of the second file into a sorted index, each with a reference to the position of its full entry, you should be able to speed things up dramatically.
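Here is a rough sketch of that index idea. It is written in Python rather than Fortran only to keep it short, and it keeps the index in memory instead of in a separate file. It assumes that a header is any non-blank line whose first token does not parse as a number; that rule, and the names `is_header`, `build_index` and `read_entry`, are made up for illustration. The second file is scanned once to record the byte offset of every header, after which each lookup is a single seek instead of a rescan.

```python
def is_header(line):
    """Assumed rule: a header's first token does not parse as a number."""
    tokens = line.split()
    if not tokens:
        return False
    try:
        float(tokens[0])
    except ValueError:
        return True
    return False


def build_index(path):
    """Scan the file once; map each header to the byte offset of its line."""
    index = {}
    with open(path, "rb") as f:
        while True:
            pos = f.tell()          # offset of the line about to be read
            line = f.readline()
            if not line:            # end of file
                break
            if is_header(line):
                # Normalize whitespace; keep the first occurrence of a header.
                index.setdefault(b" ".join(line.split()), pos)
    return index


def read_entry(f, offset):
    """Return the header line at `offset` plus the table rows after it."""
    f.seek(offset)
    lines = [f.readline()]
    while True:
        line = f.readline()
        if not line or is_header(line):
            break
        lines.append(line)
    return lines
```

With the index built, fetching a replacement is `read_entry(f, index[header])` on a file opened in binary mode. As far as I know, Fortran 2003 stream access (`access='stream'`, with `inquire(..., pos=...)` and `read(..., pos=...)`) gives the same kind of positioning without fixed record lengths, but treat that as a pointer to check rather than tested advice.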
I am assuming that you are reading file 1, and writing the results to file 3. File 2 contains the replacements.
Preprocess file 2 by loading each header and using a hash algorithm to build an array that holds an integer hash of each header value, together with a pointer/subscript to the values that replace it.
    while there are lines left in file 1
        read an original line from file 1
        hash the original line to get the hash value
        if the hash value is in the hash array
            write the replacement to file 3
            skip the original table rows that follow in file 1
        else
            write the original line to file 3
That ought to do the trick.
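For what it's worth, here is a minimal sketch of the same loop in Python, where a `dict` plays the role of the hash array. It assumes that a header is any non-blank line whose first token is not a number, and that the replacement entries for the listed headers fit in memory. The names `is_header`, `load_replacements` and `merge`, and the `wanted` list of headers to replace, are all hypothetical.

```python
def is_header(line):
    """Assumed rule: a header's first token does not parse as a number."""
    tokens = line.split()
    if not tokens:
        return False
    try:
        float(tokens[0])
    except ValueError:
        return True
    return False


def load_replacements(path, wanted):
    """Read file 2 once; keep header + table rows for each wanted header."""
    table = {}
    current = None                      # rows of the entry being collected
    with open(path) as f:
        for line in f:
            if is_header(line):
                key = " ".join(line.split())
                if key in wanted:
                    current = []        # a repeated header overwrites the earlier one
                    table[key] = current
                else:
                    current = None
            if current is not None:
                current.append(line)
    return table


def merge(path1, path2, path3, wanted):
    """Copy file 1 to file 3, swapping in file 2's entry for wanted headers."""
    keys = {" ".join(h.split()) for h in wanted}
    table = load_replacements(path2, keys)
    skipping = False                    # True while inside a replaced entry
    with open(path1) as src, open(path3, "w") as out:
        for line in src:
            if is_header(line):
                entry = table.get(" ".join(line.split()))
                skipping = entry is not None
                if skipping:
                    out.writelines(entry)   # replacement header and rows
            if not skipping:
                out.write(line)
```

Each file is read exactly once, so the cost no longer grows with the number of replacements. A header that is on the list but missing from file 2 is simply left as it was in file 1.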