开发者

Find if there is an element repeating itself n/k times

开发者 https://www.devze.com 2023-01-02 14:30 出处:网络
You have an array size n and a constant k (whatever) You can assume the the array is of int type (although it could be of any type)

You have an array size n and a constant k (whatever)

You can assume the the array is of int type (although it could be of any type)

Describe an algorithm that finds if there is an element(s) that repeats itself at least n/k times... if there is return one. Do so in linear time (O(开发者_开发百科n))

The catch: do this algorithm (or even pseudo-code) using constant memory and running over the array only twice


I'm not 100% sure, but it sounds like you want to solve the Britney Spears problem—finding an item that makes up a certain fraction of a sample using constant memory.

Here is a statement of the problem in English, with a sketch of the solution:

… from a 2002 article by Erik D. Demaine of MIT and Alejandro López-Ortiz and J. Ian Munro of the University of Waterloo in Canada. Demaine and his colleagues have extended the algorithm to cover a more-general problem: Given a stream of length n, identify a set of size m that includes all the elements occurring with a frequency greater than n /( m +1). (In the case of m =1, this reduces to the majority problem.) The extended algorithm requires m registers for the candidate elements as well as m counters. The basic scheme of operation is analogous to that of the majority algorithm. When a stream ele­ment matches one of the candidates, the corresponding counter is incremented; when there is no match to any candidate, all of the counters are decremented; if a counter is at 0, the associated candidate is replaced by a new element from the stream.


  1. Create a temporary array of size (k-1) to store elements and their counts (The output elements are going to be among these k-1 elements).

  2. Traverse through the input array and update temp[] (add/remove an element or increase/decrease count) for every traversed element. The array temp[] stores potential (k-1) candidates at every step. This step takes O(nk) time.

  3. Iterate through final (k-1) potential candidates (stored in temp[]). or every element, check if it actually has count more than n/k. This step takes O(nk) time.

The main step is step 2, how to maintain (k-1) potential candidates at every point? The steps used in step 2 are like famous game: Tetris. We treat each number as a piece in Tetris, which falls down in our temporary array temp[]. Our task is to try to keep the same number stacked on the same column (count in temporary array is incremented).

Consider k = 4, n = 9 
Given array: 3 1 2 2 2 1 4 3 3 

i = 0
         3 _ _
temp[] has one element, 3 with count 1

i = 1
         3 1 _
temp[] has two elements, 3 and 1 with 
counts 1 and 1 respectively

i = 2
         3 1 2
temp[] has three elements, 3, 1 and 2 with
counts as 1, 1 and 1 respectively.

i = 3
         - - 2 
         3 1 2
temp[] has three elements, 3, 1 and 2 with
counts as 1, 1 and 2 respectively.

i = 4
         - - 2 
         - - 2 
         3 1 2
temp[] has three elements, 3, 1 and 2 with
counts as 1, 1 and 3 respectively.

i = 5
         - - 2 
         - 1 2 
         3 1 2
temp[] has three elements, 3, 1 and 2 with
counts as 1, 2 and 3 respectively. 
Now the question arises, what to do when temp[] is full and we see a new element – we remove the bottom row from stacks of elements, i.e., we decrease count of every element by 1 in temp[]. We ignore the current element.

i = 6
         - - 2 
         - 1 2 
temp[] has two elements, 1 and 2 with
counts as 1 and 2 respectively.

i = 7
           - 2 
         3 1 2 
temp[] has three elements, 3, 1 and 2 with
counts as 1, 1 and 2 respectively.

i = 8          
         3 - 2
         3 1 2 
temp[] has three elements, 3, 1 and 2 with
counts as 2, 1 and 2 respectively.

Finally, we have at most k-1 numbers in temp[]. The elements in temp are {3, 1, 2}. Note that the counts in temp[] are useless now, the counts were needed only in step 2. Now we need to check whether the actual counts of elements in temp[] are more than n/k (9/4) or not. The elements 3 and 2 have counts more than 9/4. So we print 3 and 2.

Note that the algorithm doesn’t miss any output element. There can be two possibilities, many occurrences are together or spread across the array. If occurrences are together, then count will be high and won’t become 0. If occurrences are spread, then the element would come again in temp[]. Following is C++ implementation of above algorithm.

// A C++ program to print elements with count more than n/k
#include<iostream>
using namespace std;

// A structure to store an element and its current count
struct eleCount
{
    int e;  // Element
    int c;  // Count
};

// Prints elements with more than n/k occurrences in arr[] of
// size n. If there are no such elements, then it prints nothing.
void moreThanNdK(int arr[], int n, int k)
{
    // k must be greater than 1 to get some output
    if (k < 2)
       return;

    /* Step 1: Create a temporary array (contains element
       and count) of size k-1. Initialize count of all
       elements as 0 */
    struct eleCount temp[k-1];
    for (int i=0; i<k-1; i++)
        temp[i].c = 0;

    /* Step 2: Process all elements of input array */
    for (int i = 0; i < n; i++)
    {
        int j;

        /* If arr[i] is already present in
           the element count array, then increment its count */
        for (j=0; j<k-1; j++)
        {
            if (temp[j].e == arr[i])
            {
                 temp[j].c += 1;
                 break;
            }
        }

        /* If arr[i] is not present in temp[] */
        if (j == k-1)
        {
            int l;

            /* If there is position available in temp[], then place 
              arr[i] in the first available position and set count as 1*/
            for (l=0; l<k-1; l++)
            {
                if (temp[l].c == 0)
                {
                    temp[l].e = arr[i];
                    temp[l].c = 1;
                    break;
                }
            }

            /* If all the position in the temp[] are filled, then 
               decrease count of every element by 1 */
            if (l == k-1)
                for (l=0; l<k; l++)
                    temp[l].c -= 1;
        }
    }

    /*Step 3: Check actual counts of potential candidates in temp[]*/
    for (int i=0; i<k-1; i++)
    {
        // Calculate actual count of elements 
        int ac = 0;  // actual count
        for (int j=0; j<n; j++)
            if (arr[j] == temp[i].e)
                ac++;

        // If actual count is more than n/k, then print it
        if (ac > n/k)
           cout << "Number:" << temp[i].e
                << " Count:" << ac << endl;
    }
}

/* Driver program to test above function */
int main()
{
    cout << "First Test\n";
    int arr1[] = {4, 5, 6, 7, 8, 4, 4};
    int size = sizeof(arr1)/sizeof(arr1[0]);
    int k = 3;
    moreThanNdK(arr1, size, k);

    cout << "\nSecond Test\n";
    int arr2[] = {4, 2, 2, 7};
    size = sizeof(arr2)/sizeof(arr2[0]);
    k = 3;
    moreThanNdK(arr2, size, k);

    cout << "\nThird Test\n";
    int arr3[] = {2, 7, 2};
    size = sizeof(arr3)/sizeof(arr3[0]);
    k = 2;
    moreThanNdK(arr3, size, k);

    cout << "\nFourth Test\n";
    int arr4[] = {2, 3, 3, 2};
    size = sizeof(arr4)/sizeof(arr4[0]);
    k = 3;
    moreThanNdK(arr4, size, k);

    return 0;
}


There are two common (theoretical) approaches to this problem in O(n)

I) The first idea is the simplest

Step 1) While there are more than k distinct elements, Select k distinct elements and erase them all.

Step 2) Test all k distinct remaining elements for it's frequency

Proof of correctness: Note that While step will be executed at most n/k - 1 times. Suppose there is an element that repeats itself at least n/k times. In the worst case it could be chosen in all n/k-1 iterations and it will still be in the final array after it, after being tested it will be found.

Implementation: Step 1 can be implemented keeping an associative array (maps a key to a value) of size k-1 (constant), you sweep from left to right on the array, if you find an element that is already on the map, increase it's counter to 1, if the element is not on the map and the map is not full yet (less than k-1 elements), add this new element with initial counting 1, if the map is full, remove 1 from the counter of each element, if any element reaches 0, remove it from the map. In the end, the elements on this map will be the equivalent as the remaining elements you need to test. If, in the last iteration your map becomes empty you need to test all the elements before erasing to cover the case that the frequency is exactly n/k.

Complexity: Considering the worst approach to this map, O(n * k) = O(n), as k is contant.

Step 2 can be implemented by counting the frequency of all (maximum) k-1 elements Complexity: O(k*n) = O(n)

Overall complexity: O(n) + O(n) = O(n). (there's a small detail that was different from the implementation, a difference of 1 element, this happens because we want to also cover the case of frequency exactly n/k repetitions in the pseudocode, if not, we could allow one more iteration be possible with there are exactly k different elements, not necessarily more than k)

II) The second algorithm uses the selection algorithm in linear time http://en.wikipedia.org/wiki/Selection_algorithm and the partition algorithm which also runs in linear time. Using them, you break your array in k-1 buckets, with the invariant that any element in the ith bucket is smaller or equal than any element in the jth bucket for j > i in O(n). But note that the elements are not sorted inside each bucket.

Now, you use the fact that each bucket has n/(k-1) elements, and you're looking for an element that repeats itself at least (n/k), and (n/k) > n/(2*(k-1)). This suffices to use the majority theorem, which states that if an element is the majority (more frequent than number of elements divided by 2), then it's also the median of the array. You can get the median again using the selection algorithm.

So, you just test all medians and all pivots for the partitions, which you need to test them because they may split equal values in two different buckets, there are k-1 + k values, complexity O((2*k-1)*n)) = O(n).


A simple O(n) algorithm would be to keep a hashed map from the number found to the number of instances found. Using a hashed map is important for maintaining O(n). The, a final pass over the map will reveal the answers. This pass is also O(n) since the worst case scenario is that every element appeared only once and hence the map is the same size as the original array.


I don't know if you are restricted on which additional data structures you can use.

What about creating a hashmap with 'elements' <--> count mapping. Insertion is O(log N). Lookup is O(1). For each element, lookup on hashtable, insert if does not exist with count 1. If exists, check if count < (n/k). It will stay O(n).

EDIT:

I forgot the constant memory restriction. It's preallocating hash map entries with N elements permitted?


This is my implementation of Jerky algorithm described above:

#include <map>
#include <vector>
#include <iostream>
#include <algorithm>

std::vector<int> repeatingElements(const std::vector<int>& a, int k)
{
    if (a.empty())
        return std::vector<int>();

    std::map<int, int> candidateMap; //value, count;

    for (int i = 0; i < a.size(); i++)
    {
        if (candidateMap.find(a[i]) != candidateMap.end())
        {
            candidateMap[a[i]]++;
        }
        else
        {
            if (candidateMap.size() < k-1)
            {
                candidateMap[a[i]] = 1;
            }
            else
            {
                for (std::map<int, int>::iterator iter = candidateMap.begin();
                     iter != candidateMap.end();)
                {
                    (iter->second)--;

                    if (iter->second == 0)
                    {
                        iter = candidateMap.erase(iter);
                    }
                    else
                    {
                        iter++;
                    }
                }   
            }
        }
    }

    std::vector<int> ret;

    for (std::map<int, int>::iterator iter = candidateMap.begin();
         iter != candidateMap.end(); iter++)
    {
        int candidate = iter->first;

        if (std::count(a.begin(), a.end(), candidate) > (a.size() / k))
        {
            ret.push_back(candidate);
        }
    }

    return ret;
}

int main()
{
    std::vector<int> a = { 1, 1, 4, 2, 2, 3, 3 };   
    int k = 4;

    std::vector<int> repeating_elements = repeatingElements(a, k);

    for (int elem : repeating_elements)
    {
        std::cout << "Repeating more than n/" << k << " : " << elem << std::endl;
    }

    return 0;
}

And the output is:

Repeating more than n/4 : 1

Repeating more than n/4 : 2

Repeating more than n/4 : 3

0

精彩评论

暂无评论...
验证码 换一张
取 消