开发者

ADO.NET data table vs. data reader

开发者 https://www.devze.com 2022-12-22 08:01 出处:网络
The DataReader is more efficient than a DataTable if you only need to show data but not manipulate it. However, to get a DataReader from the data acc开发者_运维百科ess layer should I leave the connect

The DataReader is more efficient than a DataTable if you only need to show data but not manipulate it. However, to get a DataReader from the data acc开发者_运维百科ess layer should I leave the connection object open? I think this is also a very big efficiency problem. So is there another approach to this in order to take full advantage of the DataReader?


Yes, the data reader is definitely the most efficient - but you do not want to keep a connection open for a long period of time!

  • use the DataReader to read your data into an entity object; open the connection, read the data, close the connection
  • do whatever you need to do with your business object
  • store the changes back, e.g. by using an ad-hoc SQL query, a stored procedure, or whatever else you want; again: open the connection, write back the changes, close the connection

This is probably the most efficient you can get - it's a bit of work, some boring code, and all, but it's about as fast as it can be.

If you're more interested in developer productivity than raw speed, why not use some kind of an ORM to do all this boring, annoying mapping back and forth? Saves you a lot of coding and messy stuff to maintain!


I never let a DataReader out into the wild (out of the DAL). Only a matter of time before you are leaving connections open somewhere. Also, I’m almost never dealing with so much data on a single call where passing around a DataTable or DataSet presents a problem.

We are using an object oriented language and the DAL can really take advantage of this. There should only be one line of code in your project that gets the connection string. Only one object that actually touches the database (calls ExecuteNonQuery, DA.Fill(), etc.)

This also lets you get pretty involved with logging exceptions, etc. because you are only doing it once. So in the one DAL base class that I use for all of my DAL object in all of my project I have the logic that if the DAL throws an exception then it is logged to a table in my database. This logging fails-over to a text file if the database logging fails.

So the code I see a lot looks like:
-   Start a try block
-   Make a SQLCommand
-   Get connection string.
-   Make Connection object
-   Open the connection
-   Get the data
-   Bind the data
-   Close the connection
-   Log error if exception

Since I encapsulate all of this, my code to get the data now looks like:

GridView1.DataSource = cProgram.DB.getMyData();

(or more likely a BAL object sits between the 2). The DB is a normal object (not static), but it only instantiated once per application.


Let your data layer return objects, not datatables or data readers. I would suggest you populate your objects using a data reader.


What I usually do is open the reader with CommandBehavior.CloseConnection. Then I run through the reader and read the data into my own object model or list or whatever in memory with the data and then close the reader. It does much of the same stuff as a data table, but I just hate dealing with bloated and loosely-typed data stuctures.


Let us have a silly benchmark to check how much faster the DataReader (.Net version 4) is. I fetched one record from the database (SQL Server 2000) and read all its fields. I repeated this process 1000 times. DataReader took 17.3327585 seconds and DataTable took 18.37320156 and so DataReader is ~1.04 seconds faster than DataTable for 1000 reads.

So, one would get a performance gain of 0.00104 seconds if DataReader is preferred over DataTable.

Look at Is DataSet slower than DataReader due to…? as well


When I researched this before, I believe I discovered that the performance difference between DataReader and DataTable was trivial, except for perhaps very large amounts of data. Since then, I have typically used DataTable as it's more full-featured, can be worked with disconnected, etc.


If you want to completely abstract the connections and ceremony of ADO.NET, the DataReader is a small challenge. I really don't like my data tool having an open connection on the loose, hoping the DataReader is disposed (assuming you used option CommandBehavior.CloseConnection). Also, when using a lot of DataReaders, it's difficult to pool connections, since you can't do anything with the connection until the previous DataReader is closed. They can't be passed around easily. Your Data-tool is not a true abstraction.

DataTables on the other hand are extremely flexible and can make for some very efficient, clear code. Linq-To-DataTable is great. Luckily the DataTable is actually pretty efficient. For non-huge result-sets it is almost as fast as the datareader. (it depends of course on exactly what you're doing.) More and more I am getting DataTables from my data-tool rather than Readers. It really makes life simple. I can continue to use the same open connection. There's no "state" in data-tool.

The code for getting a DataReader is very simple. So when I really do need a DataReader (not often), I just let my DAL hand me my connection and I get my DataReader myself.


Straight from the documentation:

When you need to retrieve multiple rows of data so that you can display or process the data in some other way, you have two basic choices. You can use a DataSet object or a DataReader object.

The DataReader approach is generally quicker because it avoids the overhead that is associated with creating a DataSet object. The overhead that is associated with a DataSet object includes creating DataSet subobjects such as DataTables, DataRows, and DataColumns. However, the DataReader provides less flexibility, and is less suited to situations where you have to cache data and pass the data to components in an application that has multiple tiers.

Note: The DataAdapter used to fill the DataSet uses a DataReader internally.

Use a DataReader when the following conditions are true:

• You need forward-only, read-only access to data (the fire hose scenario), and you want to access the data as quickly as possible, and you do not need to cache it.

• You have a data container such as a business component that you can put the data in.

Use a DataSet when the following conditions are true:

• You have to cache or pass the data between layers.

• You require an in-memory relational view of the data for XML or non-XML manipulation.

• You want to update some or all the retrieved rows, and you want to use the batch update facilities of the SqlDataAdapter class.

• You have to bind data to a control type that the DataReader cannot be bound to. Many Windows Forms controls capable of data binding require a data source that implements the IList interface. The DataSet implements IList, but the DataReader implements IEnumerable. IEnumerable supports data binding to most Web Form controls but not to certain Windows Forms controls. Check the data source requirements for the particular control type that you want to bind.

• You have to access multiple sets of data at the same time, and you do not want to hold open server resources.

Though talking about DataSet basically most of it holds for DataTable too. From an efficiency standpoint, here is rare benchmarking from msdn itself. The bottom line is DataReader is marginally faster, and if it matters..

See this related question too which suggests some cool ORMs and benchmarking.


This is the same as I posted here.

I did some benchmarking myself with various approaches:

public DataTable Read1(string query)
{
    using (var cmd = conn.CreateCommand())
    {
        cmd.CommandText = query;
        cmd.Connection.Open();
        var table = new DataTable();
        using (var r = cmd.ExecuteReader())
            table.Load(r);
        return table;
    }
}

public DataTable Read2<S>(string query) where S : IDbDataAdapter, IDisposable, new()
{
    using (var da = new S())
    {
        using (da.SelectCommand = conn.CreateCommand())
        {
            da.SelectCommand.CommandText = query;
            da.SelectCommand.Connection.Open();
            DataSet ds = new DataSet();
            da.Fill(ds);
            return ds.Tables[0];
        }
    }
}

public IEnumerable<S> Read3<S>(string query, Func<IDataRecord, S> selector)
{
    using (var cmd = conn.CreateCommand())
    {
        cmd.CommandText = query;
        cmd.Connection.Open();
        using (var r = cmd.ExecuteReader())
            while (r.Read())
                yield return selector(r);
    }
}

public S[] Read4<S>(string query, Func<IDataRecord, S> selector)
{
    using (var cmd = conn.CreateCommand())
    {
        cmd.CommandText = query;
        cmd.Connection.Open();
        using (var r = cmd.ExecuteReader())
            return ((DbDataReader)r).Cast<IDataRecord>().Select(selector).ToArray();
    }
}

public List<S> Read5<S>(string query, Func<IDataRecord, S> selector)
{
    using (var cmd = conn.CreateCommand())
    {
        cmd.CommandText = query;
        cmd.Connection.Open(); 
        using (var r = cmd.ExecuteReader())
        {
            var items = new List<S>();
            while (r.Read())
                items.Add(selector(r));
            return items;
        }
    }
}

1 and 2 returns DataTable while the rest strongly typed result set, so its exactly not apples to apples, but I while time them accordingly.

Just the essentials:

Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < 100; i++)
{
    Read1(query); // ~8900 - 9200ms

    Read1(query).Rows.Cast<DataRow>().Select(selector).ToArray(); // ~9000 - 9400ms

    Read2<MySqlDataAdapter>(query); // ~1750 - 2000ms

    Read2<MySqlDataAdapter>(query).Rows.Cast<DataRow>().Select(selector).ToArray(); // ~1850 - 2000ms

    Read3(query, selector).ToArray(); // ~1550 - 1750ms

    Read4(query, selector); // ~1550 - 1700ms

    Read5(query, selector); // ~1550 - 1650ms
}

sw.Stop();
MessageBox.Show(sw.Elapsed.TotalMilliseconds.ToString());

The query returned about 1200 rows and 5 fields (run for 100 times). Apart from Read1 all performed well. Of all I prefer Read3 which returns data lazily, as enumerated. This is great for memory if you only need to enumerate it. To have a copy of the collection in memory, you're better off with Read4 or Read5 as you please.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号