开发者

Multi-threaded performance std::string

开发者 https://www.devze.com 2023-02-07 15:13 出处:网络
We are running some code on a project that uses OpenMP and I\'ve run into something strange. I\'ve included parts of some play code that demonstrates what I see.

We are running some code on a project that uses OpenMP and I've run into something strange. I've included parts of some play code that demonstrates what I see.

The tests compare calling a function with a const char* argument with a std::string argument in a multi-threaded loop. The functions essentially do nothing and so have no overhead.

What I do see is a major difference in the time it takes to complete the loops. For the const char* version doing 100,000,000 iterations the code takes 0.075 seconds to complete compared with 5.08 seconds for the std::string version. These tests were done on Ubuntu-10.04-x64 with gcc-4.4.

My question is basically whether this is solely due the dynamic allocation of std::string and why in this case that can't be optimized away since it is const and can't change?

Code below and many thanks for your responses.

Compiled with: g++ -Wall -Wextra -O3 -fopenmp string_args.cpp -o string_args

#include <iostream>
#include <map>
#include <string>
#include <stdint.h>

// For wall time
#ifdef _WIN32
#include <time.h>
#else
#include <sys/time.h>
#endif

namespace
{
  const int64_t g_max_iter = 100000000;
  std::map<const char*, int> g_charIndex = std::map<const char*,int>();
  std::map<std::string, int> g_strIndex = std::map<std::string,int>();

  class Timer
  {
  public:
    Timer()
    {
    #ifdef _WIN32
      m_start = clock();
    #else /* linux & mac */
      gettimeofday(&m_start,0);
    #endif
    }

    float elapsed()
    {
    #ifdef _WIN32
      clock_t now = clock();
      const float retval = float(now - m_start)/CLOCKS_PER_SEC;
      m_start = now;
    #else /* linux & mac */
      timeval now;
      gettimeofday(&now,0);
      const float retval = float(now.tv_sec - m_start.tv_sec) + float((now.tv_usec - m_start.tv_usec)/1E6);
      m_start = now;
    #endif
      return retval;
    }

  private:
    // The type of this variable is different depending on the platform
#ifdef _WIN32
    clock_t
#else
    timeval
#endif
    m_start;   ///< The starting time (implementation dependent format)
  };

}

bool contains_char(const char * id)
{
  if( g_charIndex.empty() ) return false;
  return (g_charIndex.find(id) != g_charIndex.end());
}

bool contains_str(const std::string & name)
{
  if( g_strIndex.empty() ) return false;
  return (g_strIndex.find(name) != g_strIndex.end());
}

void do_serial_char()
{
  int found(0);
  Timer clock;
  for( int64_t i = 0; i < g_max_iter; ++i )
  {
    if( contains_char("pos") )
    {
     ++found;
    }
  }
  std::cout << "Loop time: " << clock.elapsed() << "\n";
  ++found;
}

void do_parallel_char()
{
  int found(0);
  Timer clock;
#pragma omp parallel for
  for( int64_t i = 0; i < g_max_iter; ++i )
  {
    if( contains_char("pos") )
    {
     ++found;
    }
  开发者_开发百科}
  std::cout << "Loop time: " << clock.elapsed() << "\n";
  ++found;
}

void do_serial_str()
{
  int found(0);
  Timer clock;
  for( int64_t i = 0; i < g_max_iter; ++i )
  {
    if( contains_str("pos") )
    {
     ++found;
    }
  }
  std::cout << "Loop time: " << clock.elapsed() << "\n";
  ++found;
}

void do_parallel_str()
{
  int found(0);
  Timer clock;
#pragma omp parallel for
  for( int64_t i = 0; i < g_max_iter ; ++i )
  {
    if( contains_str("pos") )
    {
     ++found;
    }
  }
  std::cout << "Loop time: " << clock.elapsed() << "\n";
  ++found;
}

int main()
{
  std::cout << "Starting single-threaded loop using std::string\n";
  do_serial_str();
  std::cout << "\nStarting multi-threaded loop using std::string\n";
  do_parallel_str();

  std::cout << "\nStarting single-threaded loop using char *\n";
  do_serial_char();
  std::cout << "\nStarting multi-threaded loop using const char*\n";
  do_parallel_char();
  }


My question is basically whether this is solely due the dynamic allocation of std::string and why in this case that can't be optimized away since it is const and can't change?

Yes, it is due to the allocation and copying for std::string on every iteration.

A sufficiently smart compiler could potentially optimize this, but it is unlikely to happen with current optimizers. Instead, you can hoist the string yourself:

void do_parallel_str()
{
  int found(0);
  Timer clock;
  std::string const str = "pos";  // you can even make it static, if desired
#pragma omp parallel for
  for( int64_t i = 0; i < g_max_iter; ++i )
  {
    if( contains_str(str) )
    {
      ++found;
    }
  }
  //clock.stop();  // Or use something to that affect, so you don't include
  // any of the below expression (such as outputing "Loop time: ") in the timing.
  std::cout << "Loop time: " << clock.elapsed() << "\n";
  ++found;
}


Does changing:

if( contains_str("pos") )

to:

static const std::string str = "pos";
if( str )

Change things much? My current best guess is that the implicit constructor call for std::string every loop would introduce a fair bit of overhead and optimising it away whilst possible is still a sufficiently hard problem I suspect.


std::string (in your case temporary) requires dynamic allocation, which is a very slow operation, compared to everything else in your loop. There are also old implementations of standard library that did COW, which also slow in multi-threaded environment. Having said that, there is no reason why compiler cannot optimize temporary string creation and optimize away the whole contains_str function call, unless you have some side effects there. Since you didn't provide implementation for that function, it's impossible to say if it could be completely optimized away.

0

精彩评论

暂无评论...
验证码 换一张
取 消