Algorithm for finding the number which appears the most in a row - C++_问答_开发者

I need a help in making an algorithm for s开发者_Go百科olving one problem: There is a row with numbers which appear different times in the row, and i need to find the number that appears the most and how many times it's in the row, ex:

1-1-5-1-3-7-2-1-8-9-1-2

That would be 1 and it appears 5 times.

The algorithm should be fast (that's my problem). Any ideas ?

What you're looking for is called the mode. You can sort the array, then look for the longest repeating sequence.

You could keep hash table and store a count of every element in that structure, like this

h[1] = 5
h[5] = 1
...

You can't get it any faster than in linear time, as you need to at least look at each number once.

If you know that the numbers are in a certain range, you can use an additional array to sum up the occurrences of each number, otherwise you'd need a hashtable, which is slightly slower.

Both of these need additional space though and you need to loop through the counts again in the end to get the result.

Unless you really have a huge amount of numbers and absolutely require O(n) runtime, you could simply sort your array of numbers. Then you can walk once through the numbers and simply keep the count of the current number and the number with the maximum of occurences in two variables. So you save yourself a lot of space, tradeing it off with a little bit of time.

There is an algorithm that solves your problem in linear time (linear in the number of items in the input). The idea is to use a hash table to associate to each value in the input a count indicating the number of times that value has been seen. You will have to profile against your expected input and see if this meets your needs.

Please note that this uses O(n) extra space. If this is not acceptable, you might want to consider sorting the input as others have proposed. That solution will be O(n log n) in time and O(1) in space.

Here's an implementation in C++ using std::tr1::unordered_map:

#include <iostream>
#include <unordered_map>

using namespace std;
using namespace std::tr1;

typedef std::tr1::unordered_map<int, int> map;

int main() {
    map m;

    int a[12] = {1, 1, 5, 1, 3, 7, 2, 1, 8, 9, 1, 2};
    for(int i = 0; i < 12; i++) {
        int key = a[i];
        map::iterator it = m.find(key);
        if(it == m.end()) {
            m.insert(map::value_type(key, 1));
        }
        else {
            it->second++;
        }
    }
    int count = 0;
    int value;
    for(map::iterator it = m.begin(); it != m.end(); it++) {
        if(it->second > count) {
            count = it->second;
            value = it->first;
        }
    }

    cout << "Value: " << value << endl;
    cout << "Count: " << count << endl;
}

The algorithm works using the input integers as keys in a hashtable to a count of the number of times each integer appears. Thus the key (pun intended) to the algorithm is building this hash table:

int key = a[i];
map::iterator it = m.find(key);
if(it == m.end()) {
    m.insert(map::value_type(key, 1));
}
else {
    it->second++;
}

So here we are looking at the ith element in our input list. Then what we do is we look to see if we've already seen it. If we haven't, we add a new value to our hash table containing this new integer, and an initial count of one indicating this is our first time seeing it. Otherwise, we increment the counter associated to this value.

Once we have built this table, it's simply a matter of running through the values to find one that appears the most:

int count = 0;
int value;
for(map::iterator it = m.begin(); it != m.end(); it++) {
    if(it->second > count) {
        count = it->second;
        value = it->first;
    }
}

Currently there is no logic to handle the case of two distinct values appearing the same number of times and that number of times being the largest amongst all the values. You can handle that yourself depending on your needs.

Here is a simple one, that is O(n log n):

Sort the vector @ O(n log n)
Create vars: int MOST, VAL, CURRENT
for ELEMENT in LIST:
    CURRENT += 1
    if CURRENT >= MOST:
        MOST = CURRENT
        VAL = ELEMENT
return (VAL, MOST)

There are few methods:

Universal method is "sort it and find longest subsequence" which is O(nlog n). The fastest sort algorithm is quicksort ( average, the worst is O( n^2 ) ). Also you can use heapsort but it is quite slower in average case but asymptotic complexity is O( n log n ) also in the worst case.

If you have some information about numbers then you can use some tricks. If numbers are from the limited range then you can use part of algorithm for counting sort. It is O( n ).

If this isn't your case, there are some other sort algorithms which can do it in linear time but no one is universal.

The best time complexity you can get here is O(n). You have to look through all elements, because the last element may be the one which determines the mode.

The solution depends on whether time or space is more important.

If space is more important, then you can sort the list then find the longest sequence of consecutive elements.

If time is more important, you can iterate through the list, keeping a count of the number of occurences of each element (e.g. hashing element -> count). While doing this, keep track of the element with max count, switching if necessary.

If you also happen know that the mode is the majority element (i.e. there are more than n/2 elements in the array with this value), then you can get O(n) speed and O(1) space efficiency.

Generic C++ solution:

#include <algorithm>
#include <iterator>
#include <map>
#include <utility>

template<class T, class U>
struct less_second
{
    bool operator()(const std::pair<T, U>& x, const std::pair<T, U>& y)
    {
        return x.second < y.second;
    }
};

template<class Iterator>
std::pair<typename std::iterator_traits<Iterator>::value_type, int>
most_frequent(Iterator begin, Iterator end)
{
    typedef typename std::iterator_traits<Iterator>::value_type vt;
    std::map<vt, int> frequency;
    for (; begin != end; ++begin) ++frequency[*begin];
    return *std::max_element(frequency.begin(), frequency.end(),
                             less_second<vt, int>());
}

#include <iostream>

int main()
{
    int array[] = {1, 1, 5, 1, 3, 7, 2, 1, 8, 9, 1, 2};
    std::pair<int, int> result = most_frequent(array, array + 12);
    std::cout << result.first << " appears " << result.second << " times.\n";
}

Haskell solution:

import qualified Data.Map as Map
import Data.List (maximumBy)
import Data.Function (on)

count = foldl step Map.empty where
    step frequency x = Map.alter next x frequency
    next  Nothing    = Just 1
    next (Just n)    = Just (n+1)

most_frequent = maximumBy (compare `on` snd) . Map.toList . count

example = most_frequent [1, 1, 5, 1, 3, 7, 2, 1, 8, 9, 1, 2]

Shorter Haskell solution, with help from stack overflow:

import qualified Data.Map as Map
import Data.List (maximumBy)
import Data.Function (on)

most_frequent = maximumBy (compare `on` snd) . Map.toList .
                Map.fromListWith (+) . flip zip (repeat 1)

example = most_frequent [1, 1, 5, 1, 3, 7, 2, 1, 8, 9, 1, 2]

The solution below gives you the count of each number. It is a better approach than using map in terms of time and space. If you need to get the number that appeared most number of times, then this is not better than previous ones.

EDIT: This approach is useful for unsigned numbers only and the numbers starting from 1.

    std::string row = "1,1,5,1,3,7,2,1,8,9,1,2";
    const unsigned size = row.size();
    int* arr = new int[size];
    memset(arr, 0, size*sizeof(int));
    for (int i = 0; i < size; i++)
    {
        if (row[i] != ',')
        {
            int val = row[i] - '0';
            arr[val - 1]++;
        }
    }

    for (int i = 0; i < size; i++)
        std::cout << i + 1 << "-->" << arr[i] << std::endl;

Since this is homework I think it's OK to supply a solution in a different language.

In Smalltalk something like the following would be a good starting point:

SequenceableCollection>>mode
  | aBag maxCount mode |

  aBag := Bag new
            addAll: self;
            yourself.
  aBag valuesAndCountsDo: [ :val :count |
    (maxCount isNil or: [ count > maxCount ])
      ifTrue: [ mode := val.
                maxCount := count ]].

  ^mode

As time is going by, the language evolves.

We have now many more language constructs that make life simpler

namespace aliases
CTAD (Class Template Argument Deduction)
more modern containers like std::unordered_map
range based for loops
the std::ranges library
projections
using statment
structured bindings
more modern algorithms

We could now come up with the following code:

#include <iostream>
#include <vector>
#include <unordered_map>
#include <algorithm>

namespace rng = std::ranges;

int main() {
    // Demo data
    std::vector data{ 2, 456, 34, 3456, 2, 435, 2, 456, 2 };

    // Count values
    using Counter = std::unordered_map<decltype (data)::value_type, std::size_t> ;

    Counter counter{}; for (const auto& d : data) counter[d]++;

    // Get max
    const auto& [value, count] = *rng::max_element(counter, {}, &Counter::value_type::second);

    // Show output
    std::cout << '\n' << value << " found " << count << " times\n";
}