开发者

Trade off point between Dictionary(hashtable) and List find

开发者 https://www.devze.com 2023-01-02 01:49 出处:网络
Short question: How many elements can be in a list to make a linear search like List O(n) faster then Dictionary which is O(1)?

Short question: How many elements can be in a list to make a linear search like List O(n) faster then Dictionary which is O(1)?

I seem to remember in college(and high school) when comparing searching techniques ( binary vs linear) there was always a point where linear was faster. that was O(n) vs O(log)

I can't draw the graph here. but there must be some over head for the constant performance. So the question is if I have 10 items does List.Find mak开发者_运维技巧e more sense then

if (Dictionary.Contains(x))
    value = Directiory[x]

Or a Hashtable where value = Hashtable[x] will not fail but will require boxing


I asked myself that same question, and ran a benchmark to find out. I found the speed of a dictionary already outperforms a list in lookup at around 3-4 elements.

That seemed far to few to me based on the overhead of a dictionary, so I checked around to see if anyone else came upon the same results, and that seems to be the results others found as well. http://dotnetperls.com/dictionary-time

That doesn't mean, throw dozens of dictionaries into your code where they don't make sense - there's a memory overhead and construction time to deal with as well. But if you have a decently sized set of keyed data, take advantage of that structure.


Test both methods with your data and make a decision.

With only 10 items in your collection, it's likely the constant cost of hashing will outweigh the cost of looking at each item in your list (worst case). We can't say for sure, though. Test it and find out.

Keep in mind inserting into a Dictionary is also more expensive than inserting into a List. Depending on where and how you're using your collection, an additional insert cost may be undesirable.


So after running some test of my own it looks like Hash-Dictionary always wins. That is a Dictionary object with a HashCode, int32, as a key

10 items    500,000 itterations             
Test Name                                           First   Last    Not Found   Average 
FindInList                                          104.26  255.29  254.63      204.73
FindInArray                                         51.28   192.23  182.91      142.14
FindInHashDict                                      56.3    54.38   51.16       53.95
FindInDict                                          105.75  101.38  52.02       86.38

100 items   500,000 itterations             
Test Name                                           First   Last    Not Found   Average 
FindInList                                          102.83  1873.45 1820.85     1265.71
FindInArray                                         56.21   1313.61 1310.65     893.49
FindInHashDict                                      91.01   53.31   60.46       68.26
FindInDict                                          119.01  101.65  100.11      106.92

Here is my code that performs the find operation. My objects are hierarchical task that would be searched by a unique name. I know it is a long post but in case someone wanted to dispute the findings they can see the code.

    private SearchResult FindInDict()
    {
        SearchResult result = new SearchResult();
        result.SeachType = "FindInDict";
        result.itterations = 1;

        Stopwatch timer = new Stopwatch();

        timer.Start();
        if (dictStrBoundryTask.ContainsKey(NameOfFirst))
        {
            TaskBase t = dictStrBoundryTask[NameOfFirst];
        }

        timer.Stop();

        result.firstItem = timer.Elapsed.TotalMilliseconds;

        timer.Reset();

        timer.Start();
        if (dictStrBoundryTask.ContainsKey(NameOfLast))
        {
            TaskBase t = dictStrBoundryTask[NameOfLast];
        }

        timer.Stop();

        result.lastItem = timer.Elapsed.TotalMilliseconds;

        timer.Reset();

        timer.Start();
        if (dictStrBoundryTask.ContainsKey(NameOfNotFound))
        {
            TaskBase t = dictStrBoundryTask[NameOfNotFound];
        }

        timer.Stop();

        result.notFoundItem = timer.Elapsed.TotalMilliseconds;

        return result;

    }

    private SearchResult FindInHashDict()
    {
        SearchResult result = new SearchResult();
        result.SeachType = "FindInHashDict";
        result.itterations = 1;

        Stopwatch timer = new Stopwatch();

        timer.Start();
        if (dictIntBoundryTask.ContainsKey(NameOfFirst.GetHashCode()))
        {
            TaskBase t = dictIntBoundryTask[NameOfFirst.GetHashCode()];
        }

        timer.Stop();

        result.firstItem = timer.Elapsed.TotalMilliseconds;

        timer.Reset();

        timer.Start();
        if (dictIntBoundryTask.ContainsKey(NameOfLast.GetHashCode()))
        {
            TaskBase t = dictIntBoundryTask[NameOfLast.GetHashCode()];
        }

        timer.Stop();

        result.lastItem = timer.Elapsed.TotalMilliseconds;

        timer.Reset();

        timer.Start();
        if (dictIntBoundryTask.ContainsKey(NameOfNotFound.GetHashCode()))
        {
            TaskBase t = dictIntBoundryTask[NameOfNotFound.GetHashCode()];
        }

        timer.Stop();

        result.notFoundItem = timer.Elapsed.TotalMilliseconds;

        return result;
    }

    private SearchResult FindInArray()
    {
        SearchResult result = new SearchResult();
        result.SeachType = "FindInArray";
        result.itterations = 1;

        Stopwatch timer = new Stopwatch();

        timer.Start();
        foreach (TaskBase t in arrayBoundaryTask)
        {
            if (t.Name == NameOfFirst)
            {
                break;
            }
        }

        timer.Stop();

        result.firstItem = timer.Elapsed.TotalMilliseconds;

        timer.Reset();

        timer.Start();
        foreach (TaskBase t in arrayBoundaryTask)
        {
            if (t.Name == NameOfLast)
            {
                break;
            }
        }
        timer.Stop();

        result.lastItem = timer.Elapsed.TotalMilliseconds;

        timer.Reset();

        timer.Start();
        foreach (TaskBase t in arrayBoundaryTask)
        {
            if (t.Name == NameOfNotFound)
            {
                break;
            }
        }

        timer.Stop();

        result.notFoundItem = timer.Elapsed.TotalMilliseconds;

        return result;
    }

    private SearchResult FindInList()
    {
        SearchResult result = new SearchResult();
        result.SeachType = "FindInList";
        result.itterations = 1;

        Stopwatch timer = new Stopwatch();

        timer.Start();
        TaskBase t = listBoundaryTask.Find(x => x.Name == NameOfFirst);

        if (t!=null)
        {

        }

        timer.Stop();

        result.firstItem = timer.Elapsed.TotalMilliseconds;

        timer.Reset();

        timer.Start();
        t = listBoundaryTask.Find(x => x.Name == NameOfLast);

        if (t != null)
        {

        }
        timer.Stop();

        result.lastItem = timer.Elapsed.TotalMilliseconds;

        timer.Reset();

        timer.Start();
        t = listBoundaryTask.Find(x => x.Name == NameOfNotFound);

        if (t != null)
        {

        }

        timer.Stop();

        result.notFoundItem = timer.Elapsed.TotalMilliseconds;

        return result;
    }


I don't think that there is a hard and fast rule you could apply to come up with a bulletproof metric. You could always imagine a degenerate Hashtable case where you end up with O(n) searches due to collisions, however with realistic data the odds of this are minuscule.

In general case, if fast searching is important for your application then a Hashtable-type construct is always your best choice.

(Hashtables have their own set of drawbacks - they're not a magic bullet. Wikipedia has a good explanation of cases when you might not want to choose a Hashtable: link.)


Dictionary search is always faster then linear search.

0

精彩评论

暂无评论...
验证码 换一张
取 消