Find the element repeated more than n/2 times

There is an array of size N in which one element is repeated more than N/2 times. The other elements may also repeat, but only one element occurs more than N/2 times. Find that element.

I could think of a few approaches:

Naive: keep a count of each number in a hash map.
Simplest: sort the array; the number at position n/2 + 1 (counting from 1) is the required number.
Keep a count of only consecutive duplicate values. Check separately for the pattern where the values alternate.
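
For reference, the hash-map approach from the list above might look like the following minimal sketch. The function name `majority_by_counting` is made up for illustration, and the sketch assumes the elements are hashable. It takes O(N) time but also O(N) extra space.

```python
from collections import Counter


def majority_by_counting(items):
    """Return the element occurring more than len(items)/2 times, or None."""
    if not items:
        return None
    # most_common(1) yields the single most frequent (value, count) pair.
    value, count = Counter(items).most_common(1)[0]
    return value if count > len(items) // 2 else None
```

Calling `majority_by_counting([1, 2, 3, 4, 5, 5, 5, 5, 5])` returns 5, while an input with no majority element returns `None`.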


I can't think of a better solution, but there has to be one.

There is a beautiful algorithm for solving this that works in two passes (total time O(N)) using only constant external space (O(1)). I have an implementation of this algorithm, along with comments including a correctness proof, available here

The intuition behind the algorithm is actually quite beautiful. Suppose that you were to have a roomful of people each holding one element of the array. Whenever two people holding different array elements find each other, the two of them sit down. Eventually, at the very end, if anyone is left standing, there's a chance that they're in the majority, and you can just check that element. As long as one element occurs more than N/2 times, you can guarantee that this approach will always find the majority element.

To actually implement the algorithm, you make a linear scan over the array and keep track of your current guess as to what the majority element is, along with a counter for that guess. Initially, this guess is undefined and the counter is zero. As you walk across the array, if the current element matches your guess, you increment the counter. If the current element doesn't match your guess, you decrement the counter. If the counter ever hits zero, you reset your guess to the next element you encounter and set the counter to one. A second pass then counts how often the final guess actually occurs, to confirm that it really is a majority. You can think about this implementation as a concrete realization of the above "standing around in a room" algorithm. Whenever two people meet with different elements, they cancel out (dropping the counter). Whenever two people have the same element, they don't interact with each other.
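
A minimal Python sketch of this two-pass scheme is shown here. It is not the linked implementation; the function name `majority_element` is invented for this example, and the input is assumed to be a list so that it can be scanned twice.

```python
def majority_element(items):
    """Return the element occurring more than len(items)/2 times, or None."""
    candidate = None
    count = 0
    # Pass 1: pair off differing elements; a majority element survives.
    for x in items:
        if count == 0:
            candidate, count = x, 1
        elif x == candidate:
            count += 1
        else:
            count -= 1
    # Pass 2: the survivor is only a candidate, so verify it.
    if items and items.count(candidate) > len(items) // 2:
        return candidate
    return None
```

The second pass matters when a majority is not guaranteed: for `[2, 2, 1, 1]` the first pass still ends with a candidate, but the verification rejects it and the function returns `None`.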

For a full correctness proof, citation to the original paper (by Boyer and Moore of the more famous Boyer-Moore string matching algorithm), and an implementation in C++, check out the above link.

This is the majority element problem. When a majority element is guaranteed to exist, there is a single-pass, constant-space algorithm for this problem. Here is a brief version of the algorithm coded in Python:


    import random

    items = [1, 2, 3, 4, 5, 5, 5, 5, 5 ]
    # shuffle the items
    random.shuffle(items)

    print("shuffled items: ", items)

    # Start with the first item as the candidate.
    majority_elem = items[0]
    count = 1
    for i in range(1, len(items)):
        if items[i] == majority_elem:
            count += 1
        else:
            count -= 1
            if count == 0:
                # The candidate has been cancelled out; adopt the current item.
                majority_elem = items[i]
                count = 1

    print("majority element: %d" % majority_elem)

We use a variable majority_elem to keep track of the candidate majority element, and a counter (count).

Initially, we set the first element of the array as the majority element.
We then walk through the array:
if the current element == majority element: increment count
else: decrement count; if count becomes zero, set count = 1 and set majority_elem = current element.

There is a variation of this problem: instead of an array, there could be a very large sequence whose length we do not know beforehand. In this case, sorting or partitioning is not helpful.
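
As a rough illustration of the streaming case, the counting scheme can consume any iterable one item at a time without knowing its length. The name `majority_candidate` is hypothetical. The sketch returns only a candidate: it is the majority element if one exists, but a stream that cannot be replayed cannot be checked with a second pass.

```python
def majority_candidate(stream):
    """Return the majority candidate of an iterable, using O(1) memory."""
    candidate, count = None, 0
    for x in stream:
        if count == 0:
            # No surviving candidate: adopt the current item.
            candidate, count = x, 1
        elif x == candidate:
            count += 1
        else:
            count -= 1
    return candidate


def readings():
    """Stand-in for a long stream whose length is not known in advance."""
    for value in [5, 1, 5, 2, 5]:
        yield value
```

Here `majority_candidate(readings())` evaluates to 5, and the generator is consumed exactly once.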

References:

Donald E. Knuth, The Art of Computer Programming, Volume 4, Fascicle 0: Introduction to Combinatorial Algorithms and Boolean Functions

If an element is repeated more than N/2 times then it must be the median. There are many algorithms that allow you to find this efficiently.

Are you familiar with quicksort? It has a function called 'partition' that, given a value (the pivot), divides the array so that all values greater than the pivot are on one side and all values less than the pivot are on the other. Note that this is not a sort, simply a separation. Because the majority element occurs more than N/2 times, it will be in the larger of the two sections (unless it is the pivot itself). You can recursively apply this technique to find the element in O(n) expected time.

Wikipedia: quicksort, or partition-based general selection algorithm

In your second approach, you are essentially selecting the median element. Take a look at algorithms for finding the median of a list of numbers. In particular, a selection algorithm would work fine for this and compute it in O(n).

Hoare's selection algorithm works very similarly to quicksort, except that instead of recursing down both partitions, it only recurses down one partition (the partition that contains the kth element).
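
A simplified Python sketch of this idea follows. It is not Hoare's in-place version: for readability it builds new lists at each step (so it uses O(n) extra space), picks the pivot at random, and loops instead of recursing. The function name `quickselect` and the 0-based `k` are choices made for this example.

```python
import random


def quickselect(items, k):
    """Return the k-th smallest element of items (0-based, 0 <= k < len)."""
    pool = list(items)
    while True:
        pivot = random.choice(pool)
        smaller = [x for x in pool if x < pivot]
        equal = [x for x in pool if x == pivot]
        larger = [x for x in pool if x > pivot]
        if k < len(smaller):
            # The answer lies among the smaller values only.
            pool = smaller
        elif k < len(smaller) + len(equal):
            return pivot
        else:
            # Skip past the smaller and equal values, continue in the rest.
            k -= len(smaller) + len(equal)
            pool = larger
```

For the majority problem, `quickselect(a, len(a) // 2)` selects the median, which is the majority element whenever one exists.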

In C++, the standard library provides a selection algorithm in the form of std::nth_element, which guarantees O(n) average complexity. You can use this to find the median.

    #include <algorithm>
    int a[8] = {5, 1, 1, 1, 2, 1, 3, 1};
    std::nth_element(a, a + 4, a + 8);  // returns void; places the median at a[4]
    int median = a[4];

Note that std::nth_element will also partially sort the array in place.

No need for sorting. You can simply use a median selection algorithm to determine the n/2-th element. Quickselect runs in O(n) expected time. Median of medians runs in O(n) worst-case time.

Sort the array using any sorting algorithm. The element that is repeated more than half of the time will always be the middle element. The complexity will be O(n log n).
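
A minimal sketch of the sorting approach, assuming a non-empty input in which a majority element is known to exist (the function name `majority_by_sorting` is made up here):

```python
def majority_by_sorting(items):
    """Return the middle element of the sorted items.

    This is the majority element only if one exists; the function does not
    check that, and it assumes items is non-empty.
    """
    ordered = sorted(items)  # O(n log n), leaves the input unchanged
    return ordered[len(ordered) // 2]
```

With 0-based indexing, position `len(ordered) // 2` always falls inside the run of majority values, because that run is longer than half the array.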