We are developing an analytical tool for stock modeling in .NET.
The primary objective of the tool is to run a model over 5 years and project future In, Out and Stock figures for various products.
The primary workflow of the code is:
1. Fetch the data from the database.
2. For each date, process the data (run the production and stock model).
3. After all the dates have been traversed, update all the data in the database in one go.
So there are essentially only two database calls: initially we load all the data into DataSets, then we process everything in RAM and make no database calls in between.
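Roughly, the structure is like this (the method names are just placeholders to illustrate the flow, not the actual code):

    // Sketch of the overall flow; LoadAllData, RunProductionAndStockModel and
    // BulkUpdate are placeholder names, not the real methods.
    void RunModel(DateTime startDate, DateTime endDate)
    {
        DataSet data = LoadAllData();                  // 1. one read from the database

        for (DateTime date = startDate; date <= endDate; date = date.AddDays(1))
        {
            RunProductionAndStockModel(data, date);    // 2. all processing in RAM
        }

        BulkUpdate(data);                              // 3. one write back to the database
    }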
The problem we faced is that it was taking almost an hour to run the model for 1 year. Our benchmark is to run the model for 5 years in 5 minutes.
We have been working on this problem for almost a month now. Right now we can run the model for 1 year in 10 minutes. These are the things we have found out so far (a rough sketch follows the list):
- When the DataSet tables carry all five years of data, fetching from them is slow, so we split the work into monthly loops and now run the model one month at a time. This gave us the biggest improvement in speed.
- We tried to reduce the for loops inside the model that runs daily. This did not give us much improvement.
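For example, the monthly split is roughly of this shape (the table and column names here are made up for illustration; the real schema is in the attached files):

    // Illustrative only - "Stock", "StockDate" and RunModelForMonth are placeholders.
    DataTable all = data.Tables["Stock"];

    for (DateTime month = modelStart; month < modelEnd; month = month.AddMonths(1))
    {
        string filter = string.Format(
            "StockDate >= #{0:MM/dd/yyyy}# AND StockDate < #{1:MM/dd/yyyy}#",
            month, month.AddMonths(1));

        DataRow[] monthRows = all.Select(filter);      // only this month's rows
        RunModelForMonth(monthRows, month);            // the daily loop runs inside
    }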
You can download a RAR file from the following link: http://dl.dropbox.com/u/4546390/iPlanner.rar
It contains three files:
- iPlanner Tables.xls: gives an idea of the database design.
- iPlanner Logic.xls: describes the tables and the logic of the production model, the shipment model and the handling of actual values. I think the most important part is the production model; it gives a brief idea of what the model does daily.
- Common.cs: contains the Call Production Model function, from where everything starts. You can check that out too.
The model was previously written in Excel, where it used to take 2 minutes for 5 years. The reason for moving to .NET is to have more sharing features and a more software-like look.
I am looking for ways in which this can be improved.
Let me know if more information is required on this.
Thanks in Advance
If the calculations done for each date are independent, this sounds like a good application of map/reduce. How much have you explored the idea of parallelizing this calculation? Sixty Hadoop processors, one for each month in the five-year window, could make short work of it.
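Even without Hadoop, a simple in-process version of the same idea in .NET could use Parallel.ForEach, assuming the months really are independent (the names below are hypothetical, just to show the shape):

    using System;
    using System.Linq;
    using System.Threading.Tasks;

    static class ParallelRunSketch
    {
        // Hypothetical: runs one independent month of the model.
        static void RunModelForMonth(DateTime monthStart)
        {
            // ... production/stock calculations for this month ...
        }

        static void RunAllMonths(DateTime start, int monthCount)
        {
            var months = Enumerable.Range(0, monthCount)
                                   .Select(i => start.AddMonths(i));

            // Each month is handed to a worker thread; this is only valid
            // if no month depends on the results of another.
            Parallel.ForEach(months, RunModelForMonth);
        }
    }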
First: profile ;p
The next thing I'd try is taking DataTable out of the system, in favor of strongly typed classes that exactly match your data. Although data-load speed isn't the problem, I'd use something like dapper-dot-net to make loading the data as efficient as possible.
With a DataTable, every member access is indirect and has to go via an internal lookup, possibly involving boxing en route. Cut all of that out by using static binding to the actual data properties (which are almost always inlined to the fields). Unfortunately, the impact of this is hard to estimate, as measuring it is non-trivial.
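For example (the class, table and column names here are invented, just to show the shape):

    using System;
    using System.Collections.Generic;
    using System.Data.SqlClient;
    using System.Linq;
    using Dapper;   // dapper-dot-net

    // Strongly typed row: member access is a plain property read,
    // with no per-cell lookup or boxing.
    public class StockRow
    {
        public int ProductId { get; set; }
        public DateTime StockDate { get; set; }
        public decimal InQty { get; set; }
        public decimal OutQty { get; set; }
        public decimal StockQty { get; set; }
    }

    public static class StockLoader
    {
        public static List<StockRow> LoadStock(string connectionString)
        {
            using (var connection = new SqlConnection(connectionString))
            {
                connection.Open();
                // Dapper materializes the rows straight into StockRow instances.
                return connection.Query<StockRow>(
                    "select ProductId, StockDate, InQty, OutQty, StockQty from Stock")
                    .ToList();
            }
        }
    }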