Improving the performance of System.String to std::wstring conversions?

I'm evaluating ADO.NET for a C++ application that currently uses plain old ADO. Given that we're redoing the whole database interaction, we'd like to determine whether moving to the more modern and actively developed ADO.NET would be beneficial.

After some measurements it appears that for certain test queries that retrieve a lot of rows with few columns that all contain strings, ADO.NET is actually about 20% slower for us than using plain ADO. Our profiler suggests that the conversion of System.String results into the std::wstring used by the application is one of the bottlenecks. I can't switch any of the upper layers of the application to using System.String, so we are stuck with this particular conversion.

A rough outline of the code looks like this:

System::Data::SqlClient::SqlCommand^ sqlCmd =
  gcnew System::Data::SqlClient::SqlCommand(cmd, m_DBConnection.get());
System::Data::SqlClient::SqlDataReader^ reader = sqlCmd->ExecuteReader();
if (reader->HasRows)
{
    using namespace msclr::interop;
    while (reader->Read())
    {
      std::vector<std::wstring> results;
      for (int i=0; i < reader->FieldCount; ++i)
      {
        std::wstring col_data;
        TypeCode type = Type::GetTypeCode(reader->GetFieldType(i));
        switch (type)
        {
           // ... omit lots of different types
        case TypeCode::String:
          {
            System::String^ tmp = reader->GetString(i);
            col_data = marshal_as<std::wstring>(tmp);
          }
          break;
          // ... more type conversion code removed
        }
        results.push_back(col_data);
      }
      // NOTE: Callback into native result processing code
      ResultsCallback(results);
    }
}

I've spent a lot of time reading up on the various ways of getting a std::wstring out of a System.String and measured most of them. They all seem to perform about the same: the differences are fractions of a percent of CPU usage. In the end I simply settled on marshal_as<std::wstring>, as it's the most readable and appears to perform as well as the other solutions (i.e., using PtrToStringChars or the method described in MSDN here).
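One thing that may be worth measuring besides the conversion call itself is the allocation around it: the outline builds a new vector and a new std::wstring for every cell, then copies the string again in push_back. Below is a minimal sketch in plain C++ of reusing the row buffer across rows. ColumnView and fill_row are hypothetical names, not part of any library; in the C++/CLI code the pointer and length would come from PtrToStringChars (held in a pin_ptr) and String::Length. It also assumes ResultsCallback does not keep a reference to the vector after it returns.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one column value: a pointer to the characters
// plus a length. In C++/CLI these would come from PtrToStringChars(str)
// (held in a pin_ptr) and str->Length.
struct ColumnView {
    const wchar_t* chars;
    std::size_t length;
};

// Copies one row into `row`, reusing the vector and the string buffers
// already in it instead of building fresh ones for every row.
void fill_row(std::vector<std::wstring>& row,
              const std::vector<ColumnView>& columns)
{
    // Existing elements (and their allocated capacity) are kept.
    row.resize(columns.size());
    for (std::size_t i = 0; i < columns.size(); ++i) {
        // assign() overwrites in place; it only allocates when the new
        // value does not fit into the string's current capacity.
        row[i].assign(columns[i].chars, columns[i].length);
    }
}
```

The idea would be to declare the vector once, outside the while loop, and call something like fill_row for each row before handing it to the callback. Whether that is measurable next to the database round trips is something only the profiler can say.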

Using the DataReader works very well from a conceptual point of view as most of the processing we do on the data is row oriented anyway.

The only other slightly unexpected bottleneck I noticed is the retrieval of the TypeCode for the results columns; I'm already planning to move that outside the main results processing loop and only retrieve the type codes once per query result.
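To illustrate the hoisting described above, here is a rough model in plain C++ rather than C++/CLI. FakeReader and ColumnType are hypothetical stand-ins for SqlDataReader and System::TypeCode, invented only for this sketch; the counter is there to show that the type lookup happens once per column and not once per cell.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins: ColumnType plays the role of System::TypeCode and
// FakeReader the role of SqlDataReader. Neither is a real API.
enum class ColumnType { Integer, String };

struct FakeReader {
    std::vector<ColumnType> schema;
    std::vector<std::vector<std::wstring>> rows;
    std::size_t type_lookups = 0;   // counts the "expensive" calls

    std::size_t field_count() const { return schema.size(); }
    ColumnType field_type(std::size_t i) { ++type_lookups; return schema[i]; }
};

// Converts every row, looking up each column's type once per query
// rather than once per cell.
std::vector<std::vector<std::wstring>> read_all(FakeReader& reader)
{
    // One lookup per column, done before the row loop starts.
    std::vector<ColumnType> types(reader.field_count());
    for (std::size_t i = 0; i < types.size(); ++i)
        types[i] = reader.field_type(i);

    std::vector<std::vector<std::wstring>> out;
    out.reserve(reader.rows.size());
    for (const auto& source_row : reader.rows) {
        std::vector<std::wstring> row;
        row.reserve(types.size());
        for (std::size_t i = 0; i < types.size(); ++i) {
            switch (types[i]) {          // cached value, no reader call here
            case ColumnType::String:
                row.push_back(source_row[i]);
                break;
            case ColumnType::Integer:
                // Placeholder for the conversions omitted in the outline.
                row.push_back(source_row[i]);
                break;
            }
        }
        out.push_back(std::move(row));
    }
    return out;
}
```

In the real code the cached values would presumably live in a managed array of TypeCode filled right after ExecuteReader returns.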

After this lengthy introduction, can anybody recommend a less costly way to convert the string data from a System.String to a std::wstring, or am I already looking at the optimum performance here? I'm obviously looking more for slightly out-of-the-ordinary approaches, given that I've already tried all the ordinary ones...

EDIT: Looks like I fell into a trap of my own making here. Yes, the code above is about 20% slower than the equivalent plain ADO code in Debug mode. However, after switching to Release mode, the bottleneck is still measurable, but the ADO.NET code above is suddenly almost 50% faster than the older ADO code. So while I'm still a little concerned about the cost of the string conversion, it's not as big in Release mode as it first appeared.

I don't see any way to optimize that, since the implementation of marshal_as<std::wstring> essentially just grabs the string's internal character buffer and copies it into a std::wstring. You can't get much more efficient than that.

The only solution I can see is splitting up your rows and having N threads process them in parallel. The only issue is that you would need to reserve enough space in your vector to prevent a resize from taking place during processing, but that looks easy enough.
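A minimal sketch of that idea in standard C++ follows, assuming the raw row data has already been fetched into native memory (the std::u16string input is a stand-in for that; convert_parallel is a hypothetical name). The output vector is sized before any thread starts, so no resize can happen during processing, and each thread writes only to its own range of indices.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Converts `input` with up to `thread_count` worker threads. The output
// vector is sized up front, so no thread ever triggers a reallocation and
// each thread writes only to its own slice of indices.
std::vector<std::wstring> convert_parallel(const std::vector<std::u16string>& input,
                                           unsigned thread_count)
{
    std::vector<std::wstring> output(input.size());
    if (thread_count == 0) thread_count = 1;

    // Number of elements per thread, rounded up.
    const std::size_t chunk = (input.size() + thread_count - 1) / thread_count;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < thread_count; ++t) {
        const std::size_t begin = std::min(input.size(), t * chunk);
        const std::size_t end = std::min(input.size(), begin + chunk);
        if (begin == end) break;   // nothing left to hand out
        workers.emplace_back([&input, &output, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                output[i].assign(input[i].begin(), input[i].end());
        });
    }
    for (auto& w : workers) w.join();
    return output;
}
```

Two caveats apply. The data reader itself is not documented as thread-safe, so the rows would still have to be read on one thread before the work is split, which may cancel out much of the gain. Also, as far as I know the <thread> header cannot be used in code compiled with /clr, so a function like this would have to live in a native translation unit.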

If you're using Visual Studio 2010, I think the C++0x threading library would be sufficient for this task, though I'm not sure how much (if any) is implemented in Visual Studio so far.