I have a table with large number of rows(~200 million) and I want to process these values in c#, after reading them from memory. Processing requires grouping entries by column values in a way that can't be done inside the sql server itself. Problem is that reading the whole data at once gives me a OutOfMemory exception, and takes a lot of time to execute even partially.
So I want to break my query into shorter pieces. One method is to obviously do an independent select and then use the where in clause. Another method that I have been suggested is to use sql cursors. I want to chose one of these methods(or another one if possible), especially with regards to the following points:
- What would be the performance impact of the schemes on the server? Which would perform faster?
- Can I safely parallelize the sql cursor queries? Would I get a performance benefit if I parallelize the first 开发者_Python百科scheme(one with where in clause)?
- How many objects can I specify in where in clause? Is it only limited by the size of the query string?
Any other suggestions are also welcome.
Edit1: I have been given different solutions, but I would still like to know the answers to my original questions(out of curiousity).
If you have to do the grouping logic in code, you can try to write the logic as a Managed Stored Procedure in sql server which can be used in the groping query.
Check out
- How to: Create and Run a SQL Server Stored Procedure by using Common Language Run-time Integration
- How to: Create and Run a SQL Server User-Defined Function by using Common Language Run-time Integration
This will allow you to group on the server before returning the dataset to your client.
[Edit - regarding your comments on using Dictionaries]
You can check out my project on Codeplex which has a disk persisting Dictionary<T,V>
. This would prevent the out of memory exception. Would be interesting to see how it performs for your scenario. (If you are on a 32bit system, read the note on the intro page).
If you are using sql 2005 or higher you should check out sql based paging.
http://blogs.x2line.com/al/archive/2005/11/18/1323.aspx
It should work for what you are trying to do and is a better option than the two you listed.
精彩评论