Fastest method for SQL Server inserts, updates, selects

I use SPs and this isn't an SP vs code-behind "Build your SQL command" question. I'm looking for a high-throughput method for a backend app that handles many small transactions. I use SqlDataReader for most of the returns since forward-only access works in most cases for me.

I've seen it done many ways, and used most of them myself.

  1. Methods that accept the stored procedure parameters as method parameters themselves and build the command using cmd.Parameters.Add, with or without specifying the DB value type and/or length (sketched below)

  2. Assembling your SP params and their values into an array or hashtable, then passing to a more abstract method that parses the collection and then runs cmd.Parameters.Add

  3. Classes that represent tables, initializing the class as needed, setting the public properties that represent the table fields, and calling methods like Save, Load, etc.

I'm sure there are others I've seen but can't think of at the moment as well. I'm open to all suggestions.
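For concreteness, here's roughly what I mean by the first two options (a minimal sketch; the procedure, parameter, and ConnectionString names are just for illustration):

public static int SaveOrder(int customerId, decimal total)
{
    //Option 1: the method signature mirrors the stored procedure's parameters
    using (var cn = new SqlConnection(ConnectionString))
    using (var cmd = new SqlCommand("dbo.SaveOrder", cn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.Add("@CustomerID", SqlDbType.Int).Value = customerId;
        cmd.Parameters.Add("@Total", SqlDbType.Decimal).Value = total;
        cn.Open();
        return cmd.ExecuteNonQuery();
    }
}

public static int RunProcedure(string procName, IDictionary<string, object> args)
{
    //Option 2: the parameters travel as a collection and a generic helper unpacks them
    using (var cn = new SqlConnection(ConnectionString))
    using (var cmd = new SqlCommand(procName, cn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        foreach (var pair in args)
            cmd.Parameters.AddWithValue(pair.Key, pair.Value);
        cn.Open();
        return cmd.ExecuteNonQuery();
    }
}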


This answer focuses mainly on 'select' vs update/create/delete operations. I think it's rarer to update more than one or a few records at a time, so I also think 'select' is where the bottlenecks tend to occur. That said, you need to know your application: profile it. The best place to spend your optimization time is almost always at the database level, in the queries themselves, rather than in the client code. The client code is all just plumbing: it's not the main work of your app. However, since plumbing tends to be re-used across many different apps, I do sympathize with the desire to get it as close to optimal as possible, and therefore I have plenty to say about how to build that code.

I have a generic method for select queries/procedures in my data layer that looks something like this:

private static IEnumerable<IDataRecord> Retrieve(string sql, Action<SqlParameterCollection> addParameters)
{
    //ConnectionString is a private static property in the data layer
    // You can implement it to read from a config file or elsewhere
    using (var cn = new SqlConnection(ConnectionString))
    using (var cmd = new SqlCommand(sql, cn))
    {
        addParameters(cmd.Parameters);

        cn.Open();
        using (var rdr = cmd.ExecuteReader())
        {
            while (rdr.Read())
                yield return rdr;
            rdr.Close();
        }
    }
}

And that lets me write public data layer methods that use anonymous methods to add the parameters. The code shown works with .NET 2.0+, but can be written even shorter in .NET 3.5, as shown after the next snippet:

public IEnumerable<IDataRecord> GetFooChildrenByParentID(int ParentID)
{
    //I could easily use a stored procedure name instead of a full sql query
    return Retrieve(
        @"SELECT c.* 
         FROM [ParentTable] p 
         INNER JOIN [ChildTable] c ON c.ParentID = f.ID 
         WHERE f.ID= @ParentID", delegate(SqlParameterCollection p)
       {
          p.Add("@ParentID", SqlDbType.Int).Value = ParentID;
       }
     );
}
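With C# 3.0 / .NET 3.5 the anonymous delegate collapses into a lambda; the behavior is identical, there's just less syntax:

public IEnumerable<IDataRecord> GetFooChildrenByParentID(int ParentID)
{
    return Retrieve(
        @"SELECT c.* 
         FROM [ParentTable] p 
         INNER JOIN [ChildTable] c ON c.ParentID = p.ID 
         WHERE p.ID = @ParentID",
        p => p.Add("@ParentID", SqlDbType.Int).Value = ParentID);
}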

I'm gonna stop right here so I can point you again at the code just above, which uses an anonymous method (or its lambda equivalent) for parameter creation.

This is very clean code, in that it puts the query definition and parameter creation in the same place while still allowing you to abstract away the boilerplate database connection/calling code to somewhere more re-usable. I don't think this technique is covered by any of the bullet points in your question, and it happens to be pretty darn fast as well. I think this about covers the thrust of your question.


I want to continue, though, to explain how this all fits together. The rest is fairly straightforward, but it's also easy to dump the results into a List or similar and get things wrong, ultimately hurting performance. So moving on, the business layer uses a factory to translate query results into objects (C# 3.0 or later):

public class Foo
{
    //various normal properties and methods go here

    public static Foo FooFactory(IDataRecord record)
    {
        return new Foo
        {
            //cast or use the typed getters (e.g. record.GetInt32(0))
            // to match the property types
            Property1 = record[0],
            Property2 = record[1]
            //...
        };
    }
}

Rather than having these live in their class, you could also group them all together into a static class specifically intended to hold the factory methods.
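If you group them, it might look something like this (a sketch; the class and method names are made up):

//a hypothetical central home for the factory methods,
// instead of scattering them across the entity classes
public static class RecordFactories
{
    public static Foo CreateFoo(IDataRecord record)
    {
        return new Foo { Property1 = record[0], Property2 = record[1] };
    }

    //CreateBar, CreateCustomer, etc. would live alongside it
}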

I need to make one change to the original Retrieve method. That method "yields" the same reader object over and over, and that doesn't always work well. What we want to do differently is force a copy of the data represented by the current record, so that when the reader mutates for the next record we're still working with clean data. I waited until after showing the factory method so we can use it in the final code. The new Retrieve method looks like this:

private static IEnumerable<T> Retrieve<T>(Func<IDataRecord, T> factory,
                  string sql, Action<SqlParameterCollection> addParameters)
{
    //ConnectionString is a private static property in the data layer
    // You can implement it to read from a config file or elsewhere
    using (var cn = new SqlConnection(ConnectionString))
    using (var cmd = new SqlCommand(sql, cn))
    {
        addParameters(cmd.Parameters);

        cn.Open();
        using (var rdr = cmd.ExecuteReader())
        {
            while (rdr.Read())
                yield return factory(rdr);
            rdr.Close();
        }
    }
}

And now we would call that new Retrieve() method like this:

public IEnumerable<Foo> GetFooChildrenByParentID(int ParentID)
{
    //I could easily use a stored procedure name instead of a full sql query
    return Retrieve(Foo.FooFactory,
        @"SELECT c.* 
         FROM [ParentTable] p 
         INNER JOIN [ChildTable] c ON c.ParentID = f.ID 
         WHERE f.ID= @ParentID", delegate(SqlParameterCollection p)
       {
          p.Add("@ParentID", SqlDbType.Int).Value = ParentID;
       }
     );
}

Obviously this last method can be expanded to include any additional business logic needed. It also turns out this code is exceptionally fast, because it takes advantage of the lazy evaluation of IEnumerable. The downside is that it tends to create a lot of short-lived objects, and that can hurt the transactional performance you asked about. To get around this I sometimes break strict n-tier separation and pass the IDataRecord objects directly to the presentation tier, avoiding unnecessary object creation for records that are simply bound to a grid control right away.

Update/Create code is similar, with the difference that you're usually only changing one record at a time rather than many.
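In the same style, a write-side helper might look like this (a sketch following the Retrieve pattern above; Execute is my name for it, not something from the code shown earlier):

private static int Execute(string sql, Action<SqlParameterCollection> addParameters)
{
    using (var cn = new SqlConnection(ConnectionString))
    using (var cmd = new SqlCommand(sql, cn))
    {
        addParameters(cmd.Parameters);
        cn.Open();
        //returns the number of rows affected
        return cmd.ExecuteNonQuery();
    }
}

A public SaveFoo or UpdateFoo method would then wrap Execute the same way GetFooChildrenByParentID wraps Retrieve.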

Or, I could save you reading this long post and just tell you to use Entity Framework ;)


Personally I am a big fan of code generation. I roll my own homebrew XML, and at build time I run it through an XSLT to generate my .CS files. I describe the process in the post "Using XSLT to generate Performance Counters code"; although that post discusses generating performance counter code, I use the same process to generate my DAL.

So I would create an XML like:

<query name="LoadCustomerByName" returns="Customer">
  <parameter name="name" type="String"/>
  <text>SELECT ... FROM Customers WHERE name=@name</text>
</query>

and then the XSLT would transform this into something like:

public Customer LoadCustomerByName(
  SqlConnection conn,
  SqlTransaction trn,
  String name)
{
  using (SqlCommand cmd = new SqlCommand(@"SELECT ... FROM ...", conn, trn))
  {
    cmd.Parameters.AddWithValue("@name", name);
    using (SqlDataReader rdr = cmd.ExecuteReader())
    {
      rdr.Read();
      Customer c = new Customer();
      // Load c from rdr
      return c;
    }
  }
  }
}

Now, I left out a lot of the details of what the XSLT transformation actually does, but the really important thing is that this method gives me absolute control over how I create my DAL, and it is flexible in every aspect, since the generated .CS code is entirely driven by my XSLTs. I can change the XSLT, and that regenerates every single method in my DAL. It makes for easy exploration of various solutions, and it lets me add instrumentation to the code (like counters to measure each individual query's performance and frequency of use) and more.

This is basically what the various VS designers do for you, but if you take the extra step of controlling the code generation process, you get a lot more flexibility and control over the result.


Fastest for execution time or fastest for programming time? The only thing you can do to increase throughput on #1 is to use multiple threads and connections to do the inserts. You can do this with SqlCommand.BeginExecuteNonQuery.
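For illustration, the asynchronous call pattern looks roughly like this (a sketch; the table and method names are made up, and on older .NET Framework versions the connection string also needs Asynchronous Processing=true):

public static void QueueInsert(string message)
{
    using (var cn = new SqlConnection(ConnectionString))
    using (var cmd = new SqlCommand("INSERT INTO [Log] (Message) VALUES (@msg)", cn))
    {
        cmd.Parameters.Add("@msg", SqlDbType.NVarChar, 256).Value = message;
        cn.Open();

        //kick off the insert without blocking this thread...
        IAsyncResult result = cmd.BeginExecuteNonQuery();

        //...do other useful work here, then harvest the outcome
        int rowsAffected = cmd.EndExecuteNonQuery(result);
    }
}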


This one has been around a while, but if you're open to using a micro-ORM, just use dapper-dot-net.

It has very clean syntax, is extremely fast, and is easy to drop into any project. It is used in production at StackOverflow, which likely served this very page with it. It supports all the usual SQL methods as extension methods and supports async on everything.
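For instance, the earlier Foo query collapses to a few lines with Dapper's Query<T> extension method (a sketch; Dapper maps the returned columns to Foo's properties by name):

using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;
using Dapper;

public IEnumerable<Foo> GetFooChildrenByParentID(int parentId)
{
    using (var cn = new SqlConnection(ConnectionString))
    {
        cn.Open();
        return cn.Query<Foo>(
            @"SELECT c.* 
              FROM [ParentTable] p 
              INNER JOIN [ChildTable] c ON c.ParentID = p.ID 
              WHERE p.ID = @ParentID",
            new { ParentID = parentId }).ToList();
    }
}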

Some performance comparisons:

  • Fetch performance of various .NET ORM data access frameworks
  • Speed Comparison: Dapper vs EF
  • Codefluent entities performance comparison


This is not about insert or update, but I did some benchmarking of read speeds with various approaches. Of them all, the DataTable routes seemed the slowest of the bunch; Joel's approach is basically the fastest you can get.


The one thing I don't like is that yield return cannot reside inside a try...catch block (only try...finally), so centralized exception handling/logging can't be accommodated.

I have used a similar approach, but pass an IEnumerable as a parameter; then I don't have to use yield return.
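One way to get the centralized handling back (a sketch of my own variation, not the commenter's exact code): materialize the results inside the try block instead of yielding, at the cost of the lazy evaluation the original answer relies on.

private static List<T> RetrieveList<T>(Func<IDataRecord, T> factory,
    string sql, Action<SqlParameterCollection> addParameters)
{
    try
    {
        var results = new List<T>();
        using (var cn = new SqlConnection(ConnectionString))
        using (var cmd = new SqlCommand(sql, cn))
        {
            addParameters(cmd.Parameters);
            cn.Open();
            using (var rdr = cmd.ExecuteReader())
            {
                while (rdr.Read())
                    results.Add(factory(rdr));
            }
        }
        return results;
    }
    catch (SqlException ex)
    {
        //centralized logging/handling goes here; Log is a hypothetical helper
        Log(ex);
        throw;
    }
}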
