
How to optimize the runtime of a DataTable-to-collection transformation?

Developer https://www.devze.com 2023-02-26 18:57 Source: Internet

I have a database to which I have to connect through ODBC.

The data fetch takes approx. 2 minutes, and the resulting DataTable has 350,000 records.

I am trying to transform the DataTable into this object graph. The result set has no primary key; the primary key is specified through the view from which I fetch the data.

using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public class PriceCurve
{
    public PriceCurve(DataTable dt)
    {
        this.Id = int.Parse(dt.AsEnumerable().First()["ID"].ToString());
        this.Prices = new List<Price>();
        GetPrices(dt);
    }

    public int Id { get; private set; }
    public IList<Price> Prices { get; set; }

    private void GetPrices(DataTable dt)
    {
        foreach (DataColumn column in dt.Columns)
        {
            switch (this.GetPriceProviderType(column)) // parses ColumnName to Enum
            {
                case Price.PriceProvider.A:
                    {
                        this.Prices.Add(new Price(Price.PriceProvider.A, dt.AsEnumerable()));
                    }

                    break;
                case Price.PriceProvider.B:
                    {
                        this.Prices.Add(new Price(Price.PriceProvider.B, dt.AsEnumerable()));
                    }

                    break;
            }
        }
    }

    public class Price
    {
        public enum PriceProvider
        {
            A, B
        }

        public Price(PriceProvider type, IEnumerable<DataRow> dt)
        {
            this.Type = type;
            this.TradingDates = new List<TradingDate>();
            this.GetTradingDates(type, dt);
        }

        public IList<TradingDate> TradingDates { get; set; }
        public PriceProvider Type { get; set; }

        private void GetTradingDates(PriceProvider type, IEnumerable<DataRow> dt)
        {
            var data = dt.Select(column => column["TRADING_DATE"]).Distinct();

            foreach (var date in data)
            {
                this.TradingDates.Add(new TradingDate(date.ToString(), type, dt));
            }
        }

        public class TradingDate
        {
            public TradingDate(string id, PriceProvider type, IEnumerable<DataRow> dt)
            {
                this.Id = id;
                this.DeliveryPeriodValues = new Dictionary<int, double?>();
                this.GetDeliveryPeriodValues(type, dt);
            }

            public string Id { get; set; }
            public IDictionary<int, double?> DeliveryPeriodValues { get; set; }

            private void GetDeliveryPeriodValues(PriceProvider type, IEnumerable<DataRow> dt)
            {

                foreach (var row in dt.Where(column => column["TRADING_DATE"].ToString() == this.Id))
                {
                    try
                    {
                        this.DeliveryPeriodValues.Add(
                            int.Parse(row["DELIVERY_PERIOD"].ToString()),
                            double.Parse(row[Enum.GetName(typeof(Price.PriceProvider), type)].ToString()));
                    }
                    catch (FormatException) // value is missing or not numeric: store null
                    {
                        this.DeliveryPeriodValues.Add(
                            int.Parse(row["DELIVERY_PERIOD"].ToString()),
                            null);
                    }
                }    
            }
        }
    }
}

I create one object, which contains a list with two objects. Each of these two objects contains a list with 1000 objects. Each of these 1000 objects contains a dictionary with 350 pairs.

It either crashes Visual Studio 2010 during debug, fails because of OutOfMemory or takes minutes (unacceptable) to execute.

What is the best approach to this problem? I am new to C# and do not know how to optimize the looping through this huge data set or my object graph. Any help is appreciated.


It either crashes Visual Studio 2010 during debug, fails because of OutOfMemory or takes minutes (unacceptable) to execute.

You made me laugh. Really.

  • 350,000 nodes is challenging on a 32-bit machine with .NET. Add some overhead and you are dead. Use objects, not a DataTable, which is VERY memory-hungry.

  • "Takes minutes" is pretty much down to your decisions / programming. Use a list of objects, not a DataTable. Use a profiler. Don't make beginner mistakes like:

var data = dt.Select(column => column["TRADING_DATE"]).Distinct();

No need for that; deal with duplicates later in the code. Distinct is expensive. Profile it.
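One way to "deal with duplicates later" is to bucket the rows by trading date in a single pass, so that a date seen again simply reuses its existing bucket. The following is a minimal sketch, not the asker's code: the class and method names (`TradingDateGrouping`, `GroupByTradingDate`) are hypothetical, and only the `TRADING_DATE` column name comes from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical helper, named for illustration only.
public static class TradingDateGrouping
{
    // Walks the table once and collects the rows of each trading date.
    // No Distinct() and no per-date rescan of the table is needed.
    public static Dictionary<string, List<DataRow>> GroupByTradingDate(DataTable table)
    {
        var buckets = new Dictionary<string, List<DataRow>>();

        foreach (DataRow row in table.Rows)
        {
            // Same string key the question uses (the value's ToString form).
            string key = Convert.ToString(row["TRADING_DATE"]);

            List<DataRow> bucket;
            if (!buckets.TryGetValue(key, out bucket))
            {
                // First time this date is seen: create its bucket.
                bucket = new List<DataRow>();
                buckets.Add(key, bucket);
            }

            bucket.Add(row);
        }

        return buckets;
    }
}
```

Each `TradingDate` could then be built from its own bucket instead of filtering all 350,000 rows again for every date.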

foreach (var row in dt.Where(column => column["TRADING_DATE"].ToString() == this.Id))

That is 350,000 row lookups by name to get the index of the column, each compared via a lot of ToString calls.
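The name lookups and string conversions can be avoided by resolving each column's ordinal once and comparing typed values. The sketch below assumes that `TRADING_DATE` is a non-null `DateTime` column, `DELIVERY_PERIOD` an `int` column, and the provider column a nullable `double` column; the question does not state the column types, so adjust the casts to the real schema. `DeliveryPeriodReader` and `ReadValues` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical helper, named for illustration only.
public static class DeliveryPeriodReader
{
    public static Dictionary<int, double?> ReadValues(
        DataTable table, DateTime tradingDate, string providerColumn)
    {
        // Resolve the column names to ordinals once, outside the loop.
        int dateOrdinal = table.Columns["TRADING_DATE"].Ordinal;
        int periodOrdinal = table.Columns["DELIVERY_PERIOD"].Ordinal;
        int valueOrdinal = table.Columns[providerColumn].Ordinal;

        var values = new Dictionary<int, double?>();

        foreach (DataRow row in table.Rows)
        {
            // Typed comparison instead of ToString() == ... (assumes a DateTime column).
            if ((DateTime)row[dateOrdinal] != tradingDate)
            {
                continue;
            }

            int period = (int)row[periodOrdinal];

            // A null check replaces the try/catch around double.Parse.
            // The indexer overwrites on a repeated period, so the last row wins.
            values[period] = row.IsNull(valueOrdinal)
                ? (double?)null
                : (double)row[valueOrdinal];
        }

        return values;
    }
}
```

This still scans the whole table for each trading date; it only removes the per-row cost of name lookups, string conversions and exceptions.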

Get a profiler and find out exactly where you spend your time. Please get rid of the table and use objects: DataTable is a memory hog and SLOW compared to a list of objects. And yes, it will take minutes. Main reasons:

  • Your programming. No shame in that. Just learn. Go with objects / structs NOW.
  • ODBC. It takes time just to load the data, especially as you don't process while loading (DataReader) but wait for everything to have loaded, and ODBC is NOT fast. For 350,000 rows on a good network, direct SQL Server takes maybe 30 seconds; on the same machine, less.
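Processing while loading could look roughly like the sketch below, which reads rows from a data reader straight into a list of small typed objects instead of filling a DataTable first. The `PricePoint` and `PriceLoader` types are hypothetical. The column names `A` and `B` are assumed to match the provider enum names, and the column types (`DateTime`, `int`, nullable `double`) are assumptions as well. The method takes the `IDataReader` interface so it is not tied to a specific provider.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical plain object holding one row of the result set.
public sealed class PricePoint
{
    public DateTime TradingDate;
    public int DeliveryPeriod;
    public double? A;
    public double? B;
}

// Hypothetical loader, named for illustration only.
public static class PriceLoader
{
    public static List<PricePoint> ReadAll(IDataReader reader)
    {
        // Look up the column ordinals once, before the read loop.
        int dateOrdinal = reader.GetOrdinal("TRADING_DATE");
        int periodOrdinal = reader.GetOrdinal("DELIVERY_PERIOD");
        int aOrdinal = reader.GetOrdinal("A");
        int bOrdinal = reader.GetOrdinal("B");

        var points = new List<PricePoint>();

        // Each row is converted as soon as it arrives from the reader.
        while (reader.Read())
        {
            points.Add(new PricePoint
            {
                TradingDate = reader.GetDateTime(dateOrdinal),
                DeliveryPeriod = reader.GetInt32(periodOrdinal),
                A = reader.IsDBNull(aOrdinal) ? (double?)null : reader.GetDouble(aOrdinal),
                B = reader.IsDBNull(bOrdinal) ? (double?)null : reader.GetDouble(bOrdinal)
            });
        }

        return points;
    }
}
```

With ODBC, the reader would come from `OdbcCommand.ExecuteReader()` inside a `using` block; for a quick local check, `DataTable.CreateDataReader()` yields a reader over an in-memory table.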