Recently I've had to increase the volume of an data extraction script and I've found the output to be missing some data that is present in the source tables. Can anyone suggest some ideas for what might be causing this issue?
Background
The extraction procedure is an sql script which I execute from an sqlplus client (running in Windows Server 2008) and which spools the query results to output files.
The data that is extracted includes customer records and their transaction records. Each customer may have several hundred transaction records each month. Before the increased load, we were extracting about 100K customers with the last 3 months of transactions. This has now increased to 1 million. I expect that the output file would be about 30GB.
The transactions table is partitioned into monthly partitions. The extraction script includes a separate query spooling each partition. It seemed that the first of the 3 month's transactions was output, and the 2nd and 3rd month of transactions was missing (though the process log showed that it still spent the time querying all partitions).
Some interesting progress
This sql script is invoked within a perl script. Originally the sql script was invoked in one go. I decided to split up the sql and to invoke each query in turns and the output seems to have tripled in size (to about 30GB). Why might this have happened? (This may help with the real problem below.)
The current issue
There is a second database with 5x more customers and 2 types 开发者_如何学Cof transaction tables, each partitioned into daily partitions. So to extract 3 months of transactions, the script queries on 90 partitions on 2 separate tables. Again, the 1st month of data seems to have output successfully, but the 2nd and 3rd months are missing. The above solution did not work for this database.
Something else that is unusual is that the 2 files output from the 2 transaction tables are both almost exactly 4GB in size (4,294,967,362 bytes). Could it be that Oracle or something else is limiting the extraction size?
What version of windows and what is the filesystem on your disk? From memory the FAT32 filesystem has a filesize limit. Think you need NTFS for over 2G (or maybe thats 4G). I suspect that you are hitting that limit, either in windows or sqlplus itself.
You could try narrowing it down by trying
sqlplus > file.txt
as opposed to using the spool
command. If that works it looks like a sqlplus issue. If it doesn't then more likely a filesystem issue.
精彩评论