The efficient way to compare strings in C++

Developer (devze.com) https://www.devze.com 2023-01-24 15:22 Source: Internet

Is it efficient to compare a string with another string or string literal like this?

string a;
string b;
if (a == "test")

or

if (a == b)

My coworker asked me to use memcmp instead.

Any comments about this?

Thanks.


Yes, use a == b, and don't listen to your coworker.

You should always prefer readable code and the standard library over C functions, unless you have a specific bottleneck in your program that you need to optimize and you have proven that it truly is a bottleneck.


Obviously you should use a == b and rely on its implementation.

For the record, std::char_traits<char>::compare() in a popular implementation relies on memcmp(), so calling it directly would only be more painful and error-prone.
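Here is roughly what a == b amounts to in common implementations such as libstdc++: a length check, then std::char_traits<char>::compare, which in turn typically calls memcmp. This is only a sketch and not any library's actual source. equals_spelled_out is a hypothetical name.

```cpp
#include <string>

// Hypothetical helper: spells out roughly what a typical std::string
// equality check does. An illustration, not the library's real code.
bool equals_spelled_out(const std::string& a, const std::string& b) {
    // Cheap O(1) check first: std::string stores its length, so strings of
    // different length are rejected without reading any characters.
    if (a.size() != b.size()) {
        return false;
    }
    // Same length: compare the characters. In libstdc++ this call is
    // typically forwarded to memcmp (or the compiler's builtin for it).
    return std::char_traits<char>::compare(a.data(), b.data(), a.size()) == 0;
}
```

Because it works from the stored length, this also handles strings with embedded '\0' characters, which strcmp cannot.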


If you really need to know, write a test application and measure the timings.

That being said, you can rely on the provided implementation being quite efficient. It usually is.
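The following is a minimal sketch of such a test application, assuming C++11 or later. time_ns and compare_approaches are hypothetical helpers. The volatile sink is a crude way to stop the compiler from discarding the comparisons entirely.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical micro-benchmark helper: runs `fn` `iterations` times and
// returns the total elapsed wall-clock time in nanoseconds.
template <typename Fn>
long long time_ns(Fn fn, std::size_t iterations) {
    using clock = std::chrono::steady_clock;  // monotonic, never goes backwards
    const auto start = clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        fn();
    }
    const auto stop = clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}

struct Timings {
    long long operator_eq_ns;
    long long memcmp_ns;
};

// Times operator== against a hand-written size check + memcmp.
Timings compare_approaches(const std::string& a, const std::string& b,
                           std::size_t iterations) {
    volatile bool sink = false;  // each result is stored, so the loop body is kept
    Timings t{};
    t.operator_eq_ns = time_ns([&] { sink = (a == b); }, iterations);
    t.memcmp_ns = time_ns([&] {
        // memcmp needs an explicit length and a separate size check
        sink = a.size() == b.size() &&
               std::memcmp(a.data(), b.data(), a.size()) == 0;
    }, iterations);
    return t;
}
```

Treat the numbers as rough. An optimizer may still hoist the loop-invariant comparison out of the loop, so for serious measurements a dedicated library such as Google Benchmark (with its DoNotOptimize helper) is the safer choice. Build with optimizations enabled, because unoptimized timings say little.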


I think your coworker is a bit hung up on possible optimizations.

  • memcmp isn't intended for comparing strings (that would be strcmp)
  • to compare only up to the length of the shorter string, you would need strlen on both strings
  • memcmp returns <0, 0, or >0 (not necessarily -1, 0, or 1), which is a nuisance to always remember
  • strcmp and strlen cause undefined behaviour on bad C-style strings (not terminated by '\0', or null pointers)
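This short sketch shows two of those pitfalls, using hypothetical helper names. First, comparing only up to the shorter length reports a mere prefix as equal, so an equality test also has to compare the lengths. Second, memcmp only promises a negative, zero, or positive result, so code that expects exactly -1 or 1 has to normalise the value first.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Hypothetical helper showing the buggy pattern: comparing only up to the
// shorter length treats "test" and "testing" as equal.
bool prefix_only_equal(const char* a, const char* b) {
    const std::size_t n = std::min(std::strlen(a), std::strlen(b));
    return std::memcmp(a, b, n) == 0;  // true for a prefix: wrong for equality
}

// Hypothetical helper: maps memcmp's "negative / zero / positive" result
// onto exactly -1, 0, or 1.
int memcmp_sign(const void* a, const void* b, std::size_t n) {
    const int r = std::memcmp(a, b, n);
    return (r > 0) - (r < 0);
}
```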


It's less efficient. std::string::operator== can start with one very quick check for equal length. If the string lengths differ (quite common), it can return false without looking at a single character.

In C, memcmp must be told how many bytes to compare, so you need to call strlen twice, and each call reads every character of its string.
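A correct C-style equality test built on memcmp therefore looks something like this sketch (c_string_equal is a hypothetical name). Both strlen calls scan their entire string before memcmp reads a single byte. std::string skips that work because it already stores its size.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: C-style string equality built on memcmp.
// Each strlen walks its string up to the terminating '\0'.
bool c_string_equal(const char* a, const char* b) {
    const std::size_t len_a = std::strlen(a);
    const std::size_t len_b = std::strlen(b);
    if (len_a != len_b) {
        return false;  // different lengths can never be equal
    }
    return std::memcmp(a, b, len_a) == 0;
}
```

In plain C code, strcmp(a, b) == 0 does the same job in a single pass. That is why strcmp, not memcmp, is the usual idiom for NUL-terminated strings.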


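STL best practice is to prefer the operations a string or container type provides for a task over generic or C-style functions. In this case that's the operator== overloaded for basic_string (strictly a non-member function, but part of the string's interface).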

Your coworker needs to think a bit more in C++ and move away from the CRT. Sometimes I think this is simply fear of the unknown; if you can educate them about the C++ options, you may have an easier time.


Only If Speed is Very Important

Use fixed-size strings (32 to 64 bytes works very well), initialized to all zeros and then filled with the string data. (Here, "string" means a raw C character array or your own custom string class, not std::string.)

Use memcpy to fill these strings and memcmp to compare them, always over the full fixed buffer size.
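Here is a minimal sketch of that fixed-buffer scheme. The 32-byte capacity and the silent truncation of longer inputs are assumptions of this example. FixedString32, make_fixed, and fixed_equal are hypothetical names.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical fixed-capacity string: a 32-byte buffer that is zeroed
// first and then filled, so unused bytes are always 0.
struct FixedString32 {
    static constexpr std::size_t kCapacity = 32;
    char buf[kCapacity];
};

FixedString32 make_fixed(const char* src) {
    FixedString32 s{};  // value-initialization zeroes every byte of buf
    std::size_t len = std::strlen(src);
    if (len > FixedString32::kCapacity - 1) {
        len = FixedString32::kCapacity - 1;  // truncate, keeping a trailing '\0'
    }
    std::memcpy(s.buf, src, len);
    return s;
}

// Compares the whole buffer every time: no search for '\0', and the
// length is a compile-time constant.
bool fixed_equal(const FixedString32& a, const FixedString32& b) {
    return std::memcmp(a.buf, b.buf, FixedString32::kCapacity) == 0;
}
```

The buffers are zero-filled, so the bytes after the text always match. Comparing the whole buffer therefore gives the right answer even when the strings have different lengths. Because the size is a constant, the compiler may expand the memcmp into a few wide loads.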

You can go even faster than memcmp if you make sure your string buffers are 16-byte aligned so you can use SSE2, and you only need to test for equality, not greater-than or less-than. Even without SSE2, you can test for equality by subtracting in word-sized chunks.
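The word-sized approach without SSE2 can be sketched like this. The function loads eight bytes at a time into a std::uint64_t, checks whether the difference is zero, and falls back to single bytes for any tail. equal_by_words is a hypothetical name. The memcpy loads keep the sketch correct even when the buffers are not aligned.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Equality-only comparison of two n-byte buffers, 8 bytes at a time.
bool equal_by_words(const char* a, const char* b, std::size_t n) {
    std::size_t i = 0;
    for (; i + sizeof(std::uint64_t) <= n; i += sizeof(std::uint64_t)) {
        std::uint64_t wa, wb;
        std::memcpy(&wa, a + i, sizeof wa);  // safe unaligned load
        std::memcpy(&wb, b + i, sizeof wb);
        if (wa - wb != 0) {  // unsigned difference is zero only if the words match
            return false;
        }
    }
    for (; i < n; ++i) {  // leftover bytes when n is not a multiple of 8
        if (a[i] != b[i]) {
            return false;
        }
    }
    return true;
}
```

(wa ^ wb) != 0 works just as well as the subtraction. Either way, the result only tells you whether the buffers are equal, not which one sorts first.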

The reason that these techniques speed things up is that they remove the byte-by-byte comparison test from the equation. Looking for the terminating '\0' or the byte that is different is expensive because test-and-branch is hard to predict and pipeline.


Maybe or maybe not

If your C++ implementation uses a highly optimized memcmp (as GCC has) and its C++ string comparison does the equivalent of the trivial while(*p++ == *q++) ..., then yes, memcmp would be faster on large strings, because it compares multiple characters at a time using aligned 32-bit loads.

On shorter strings these optimizations wouldn't show up in the timings, but on larger strings (around 10K characters or so) the speedup should be clearly visible.
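For comparison, a length-bounded version of the trivial byte-at-a-time loop mentioned above might look like this sketch (naive_compare is a hypothetical name). It returns a memcmp-style result and compares bytes as unsigned char, as memcmp does. It does one load, one compare, and one branch per character. Whether a particular compiler vectorizes or replaces such a loop varies, so measure rather than assume.

```cpp
#include <cstddef>

// Byte-at-a-time comparison: returns <0, 0, or >0 like memcmp.
int naive_compare(const char* p, const char* q, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        // compare as unsigned char, matching memcmp's ordering
        const unsigned char cp = static_cast<unsigned char>(p[i]);
        const unsigned char cq = static_cast<unsigned char>(q[i]);
        if (cp != cq) {
            return cp < cq ? -1 : 1;
        }
    }
    return 0;
}
```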

The Answer: it depends ;-) Check your C++ strings implementation.

Regards

rbo
