C# performance of static string[] contains() (slooooow) vs. == operator_问答_开发者

Just a quick query: I had a piece of code which compared a string against a long list of values, e.g.

if(str == "string1" || str == "string2" || str == "string3" || str == "string4".
     DoSomething();

And the interest of code clarity and maintainability I changed it to

public static string[] strValues = { "String1", "String2", "String3", "String4"};
...
if(strValues.Contains(str)
    DoSomethi开发者_StackOverflowng();

Only to find the code execution time went from 2.5secs to 6.8secs (executed ca. 200,000 times).

I certainly understand a slight performance trade off, but 300%?

Anyway I could define the static strings differently to enhance performance?

Cheers.

Fyi..

Using:

private static HashSet<string> strHashSet = new HashSet<string>() 
{ "0string", "1string", "2string", "3string", "4string", "5string", 
  "6string", "7string", "8string", "9string", "Astring", "Bstring" };

private static List<string> strList = strHashSet.ToList();
private static string[] strArray = strList.ToArray();
private static Dictionary<int, string> strHashDict = strHashSet.ToDictionary(h => h.GetHashCode());
private static Dictionary<string, string> strDict = strHashSet.ToDictionary(h => h);

// Only one test uses this method.
private static bool ExistsInList(string str)
{
  return strHashDict.ContainsKey(str.GetHashCode());
}

Checking for the first and last strings in the list then checking for a string not in the list: "xstring" Executing 500,000 iterations, all times in milliseconds.

1.A Test: result = (str == "0string" || str == "1string" ...
[storage var]          [first]:[ last ]:[ none ]:[average]
strArray                 3.78 :  45.90 :  57.77 :  35.82

2.A Test: ExistsInList(string);
[storage var]          [first]:[ last ]:[ none ]:[average]
none                    36.14 :  28.97 :  24.02 :  29.71

3.A Test: .ContainsKey(string.GetHashCode());
[storage var]          [first]:[ last ]:[ none ]:[average]
strHashDict             34.86 :  28.41 :  21.46 :  28.24

4.A Test: .ContainsKey(string);
[storage var]          [first]:[ last ]:[ none ]:[average]
strDict                 38.99 :  32.34 :  22.75 :  31.36

5.A Test: .Contains(string);
[storage var]          [first]:[ last ]:[ none ]:[average]
strHashSet              39.54 :  34.78 :  24.17 :  32.83
strList                 23.36 : 122.07 : 127.38 :  90.94
strArray               350.34 : 426.29 : 426.05 : 400.90

6.A Test: .Any(p => p == string);
[storage var]          [first]:[ last ]:[ none ]:[average]
strHashSet              75.70 : 331.38 : 339.40 : 248.82
strList                 72.51 : 305.00 : 319.29 : 232.26
strArray                38.49 : 213.63 : 227.13 : 159.75

Interesting (if not unexpected) results when we change the strings in the list:

private static HashSet<string> strHashSet = new HashSet<string>() 
{ "string00", "string01", "string02", "string03", "string04", "string05", 
  "string06", "string07", "string08", "string09", "string10", "string11" };

With "string99" as the none check.

1.B Test: result = (str == "string00" || str == "string01" ...
[storage var]          [first]:[ last ]:[ none ]:[average]
strArray                85.45 :  87.06 :  91.82 :  88.11

2.B Test: ExistsInList(string);
[storage var]          [first]:[ last ]:[ none ]:[average]
none                    30.12 :  27.97 :  21.36 :  26.48

3.B Test: .ContainsKey(string.GetHashCode());
[storage var]          [first]:[ last ]:[ none ]:[average]
strHashDict             32.51 :  28.00 :  20.83 :  27.11

4.B Test: .ContainsKey(string);
[storage var]          [first]:[ last ]:[ none ]:[average]
strDict                 36.45 :  32.13 :  22.39 :  30.32

5.B Test: .Contains(string);
[storage var]          [first]:[ last ]:[ none ]:[average]
strHashSet              37.29 :  34.33 :  23.56 :  31.73
strList                 23.34 : 147.75 : 153.04 : 108.04
strArray               349.62 : 460.19 : 459.99 : 423.26

6.B Test: .Any(p => p == string);
[storage var]          [first]:[ last ]:[ none ]:[average]
strHashSet              76.26 : 355.09 : 361.31 : 264.22
strList                 70.20 : 332.33 : 341.79 : 248.11
strArray                37.23 : 234.70 : 251.81 : 174.58

For cases A and B looks like tests 2 and 3 have the advantage.

However, HashSet.Contains(string) is very efficient, not effected by list contents and has a clear syntax...might be the best choice.

Yes, it is true, I have no life.

Are you actually running this 200,000 times in production code? You may want to consider hash checks as a faster negative-check if so.

If the 200,000 times was just to illustrate the difference, then I wouldn't worry about it. It's only a 0.02 millisecond increase in time.

Contains is more general-purpose than testing static strings, so there is a small amount of overhead. Unless this code is a bottleneck as Mark mentioned, it's not worth optimizing. There's a famous quote in CS: "premature optimization is the root of all evil." The quote isn't quite accurate, but it's a good reminder of the ultimate optimization guideline: measure first.

Both the methods you have tried have O(n) performance so they will get slower as you add more strings. If you are using .NET 3.5 or newer then you could try using a HashSet<string> instead and initializing it once at the start of the application. You can then get approximately O(1) lookups.

For .NET v2.0 you can emulate a HashSet using a Dictionary<string, object> and using ContainsKey and not using the value.

Here's an alternative you may arguably find readable and maintainable, that you may wish to test for speed. If you do test it for speed, please post your result!

        switch (str)
        {
            case "String1":
            case "String2":
            case "String3":
            case "String4":
                DoSomething();
                break;
        }

Although using a HashSet<string> as suggested could be a better option, the reason why strValues.Contains(str) is slower, is because it's a generic extension method. There is no such thing as a Contains method on arrays.

The way it works for arrays is basically

if (strValues is ICollection<string>) // true
{
    return ((ICollection<string>) strValues).Contains(str);
}

which adds a typecheck, typecast and virtual call. Then it will iterate the array (causing bounds checks). Only then will it come to do string comparisons. So it's doing a lot more work.

Note that in C# 3 (which you must be using if you're using extension methods), you can simply initialize the HashSet<string> like this:

public static HashSet<string> strValues = new HashSet<string> { 
                                   "String1", "String2", "String3", "String4" };

This keeps your program just as readable as it is now using arrays.

You may find that Contains() works better for a longer list. It may, for example, sort the list and do a binary search (just a thought experiment, for an example.)