开发者

Is searching a HashTable slower when the key are strings and the strings contain spaces

开发者 https://www.devze.com 2022-12-11 10:32 出处:网络
Today I was discussing with another developer about a limitation in a third party library, where we couldn\'t use spaces in a string. The reasoning was that 开发者_JAVA技巧the strings were used as key

Today I was discussing with another developer about a limitation in a third party library, where we couldn't use spaces in a string. The reasoning was that 开发者_JAVA技巧the strings were used as keys in a .NET Hashtable, and that searching a .NET HashTable was significantly slower when the keys contained spaces.

Now since I'm too lazy to write a test, but I still want to understand why this would be so, I ask my question here:

Is it slower to search a Hashtable when the string that is used contains a space?

I would not expect this, since before the search is performed, the hash is obtained using String.GetHashCode() and then that hash is used to locate the entry in the table.

Thanks!


Straight from the Rotor source, the core of the String.GetHashcode method:

                int     c;
                char *s = src;
                while ((c = s[0]) != 0) {
                    hash1 = ((hash1 << 5) + hash1) ^ c;
                    c = s[1];
                    if (c == 0)
                        break;
                    hash2 = ((hash2 << 5) + hash2) ^ c;
                    s += 2;
                }

What I can make up of this: spaces do not get any special treatment.

Conclusion:

  • The third party does not use HashTable or wraps something around to string to make spaces slower.
  • Or they are trying to obfuscate their implementation by telling stories.


It shouldn't be slower. It uses GetHashCode() internally so set of characters in string doesn't matter.

This said, performance is dependent only on GetHashCode implementation for String. You may get different results for different framework versions (from MSDN):

The behavior of GetHashCode is dependent on its implementation, which might change from one version of the common language runtime to another. A reason why this might happen is to improve the performance of GetHashCode.


White space increases the length of the string slowing the hash function but I expect this to be really insignificant. On the other side, leaving the white spaces in the string can lead to a better hash with less collisions. So I don't think there's any problem using a string with spaces in a HashTable.

0

精彩评论

暂无评论...
验证码 换一张
取 消