Finding missing elements in an array_问答_开发者

Given you have an array A[1..n] of size n, it contains elements from the set {1..n}. However, two of the elements are missing, (and perhaps two of the array elements are repeated). Find the missing elements.

Eg if n=5, A may be A[5] = {1,2,1,3,2}; and so the missing elements are {4,5}

The approach I used was:

int flag[n] = {0};开发者_开发技巧  
int i;  
for(i = 0; i < n; i++)  {  
  flag[A[i]-1] = 1;  
 }  

for(i = 0; i < n; i++)  {  
 if(!flag[i]) {  
    printf("missing: %d", (i+1));  
}

the space complexity comes to O(n). I feel this is a very kiddish and inefficient code. So could you please provide a better algo with better space and time complexity.

Theoretically,

It is possible to do in O(1) space (in RAM model, i.e. O(1) words) and O(n) time even with a read-only array.

Warning: Long post with some mathematics. If you are only interested in the code and not the algorithm/proof, skip to the code section. You would need to read some parts of the algorithm section to understand the code, though.

Algorithm

Assume the missing numbers are x and y.

There are two possibilities for the array:

1) One number is repeated three times, and the remaining numbers in the array appear exactly once.

For this case, the bucketed XOR trick will work.

Do a XOR of all elements of the array with 1,2,...,n.

You end up with z = x XOR y.

There is at least one bit of z which is non-zero.

Now differentiating the elements of the array based on that bit (two buckets) do a XOR pass through the array again.

You will end up with x and y.

Once you have the x and y, you can confirm if these are indeed the missing elements.

If it so happens that the confirmation step fails, then we must have the second case:

2) Two elements repeated exactly twice and the rest appearing exactly once.

Let the two repeated elements be a and b (x and y are the missing ones).

Warning: Math ahead.

Let S_k = 1^k + 2^k + .. + n^k

For instance S_1 = n(n+1)/2, S_2 = n(n+1)(2n+1)/6 etc.

Now we compute seven things:

T_1 = Sum of the elements of the array = S_1 + a + b - x - y.
T_2 = Sum of the squares = S_2 + a^2 + b^2 - x^2 - y^2
T_3 = Sum of cubes = S_3 + a^3 + b^3 - x^3 - y^3
T_4 = Sum of fourth powers = S_4 + a^4 + b^4 - x^4 - y^4
...
T_7 = Sum of seventh powers = S_7 + a^7 + b^7 - x^7 - y^7

Note, we can use O(1) words (intsead of one) to deal with the overflow issues. (I estimate 8-10 words will be enough).

Let Ci = T_i - S_i

Now assume that a,b,x,y are the roots of the 4th degree polynomial P(z) = z^4 + pz^3 + qz^2 + rz + s

Now we try to transform the above seven equations into four linear equations in p,q,r,s.

For instance, if we do 4th Eqn + p * 3rd Eqn + q* 2nd equation + r* 1st equation

we get

C4 + p*C3 + q*C2 + r*C1 = 0

Similarly we get

C5 + p*C4 + q*C3 + r*C2 + s*C1 = 0
C6 + p*C5 + q*C4 + r*C3 + s*C2 = 0
C7 + p*C6 + q*C5 + r*C4 + s*C3 = 0

These are four linear equations in p,q,r,s which can be solved by Linear Algebra techniques like Gaussian Elimination.

Note that p,q,r,s will be rationals and so can be computed only with integer arithmetic.

Now suppose we are given a solution p,q,r,s to the above set of equations.

Consider P(z) = z^4 + pz^3 + qz^2 + rz + s.

What the above equations are saying is basically

P(a) + P(b) - P(x) - P(y) = 0
aP(a) + bP(b) - xP(x) -yP(y) = 0
a^2 P(a) + b^2 P(b) - x^2 P(x) - y^2 P(y)  = 0
a^3 P(a) + b^3 P(b) - x^3 P(x) - y^3 P(y) = 0

Now the matrix

   1   1  -1 -1
   a   b   -x   -y
   a^2 b^2 -x^2 -y^2
   a^3 b^3 -x^3 -y^3

has the same determinant as the Vandermonde matrix and thus is invertible, if a,b,x,y are distinct.

Thus we must have that P(a) = P(b) = P(x) = P(y) = 0.

Now check which of 1,2,3,...,n are roots of x^4 + px^3 + qx^2 + rx + s = 0.

Thus this is a linear time constant space algorithm.

Code

I wrote the following C# (.Net 4.0) code and it seems to work for the few samples I tried... (Note: I didn't bother catering to case 1 above).

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

using System.Numerics;

namespace SOManaged
{
    class Program
    {
        static void Main(string[] args)
        {
            ulong[] inp = {1,3,2,1,2};
            ulong[] inp1 = { 1,2,3,4,5,6,7,8,
                             9,10,11,13,14,15,
                             16,17,18,19,20,21,5,14};

            int N = 100000;
            ulong[] inp2 = new ulong[N];
            for (ulong i = 0; i < (ulong)N; i++)
            {
                inp2[i] = i+1;
            }
            inp2[122] = 44;
            inp2[419] = 13;

            FindMissingAndRepeated(inp);
            FindMissingAndRepeated(inp1);
            FindMissingAndRepeated(inp2);
        }

        static void FindMissingAndRepeated(ulong [] nums)
        {
            BigInteger[] C = new BigInteger[8];

            // Compute the C_i
            for (int k = 0; k < 8; k++)
            {
                C[k] = 0;
            }

            BigInteger i = 1;
            BigInteger n = 0;

            for (int j = 0; j < nums.Length; j++)
            {
                n = nums[j];
                i = j + 1;
                for (int k = 1; k < 8; k++)
                {
                    C[k] += i - n;
                    n = n * nums[j];
                    i = i * (j + 1);
                }
            }


            for (int k = 1; k <= 7; k++)
            {
                Console.Write("C[" + k.ToString() + "] = " + 
                               C[k].ToString() +", ");
            }
            Console.WriteLine();

            // Solve for p,q,r,s
            BigInteger[] pqrs = new BigInteger[4];
            BigInteger[] constants = new BigInteger[4];
            BigInteger[,] matrix = new BigInteger[4, 4];

            int start = 4;
            for (int row = 0; row < 4; row++ )
            {
                constants[row] = -C[start];

                int k = start-1;
                for (int col = 0; col < 4; col++)
                {
                    matrix[row, col] = C[k];
                    k--;
                }

                start++;
            }

            Solve(pqrs, matrix, constants, 4);

            for (int k = 0; k < 4; k++)
            {
                Console.Write("pqrs[" + k.ToString() + "] = " 
                               + pqrs[k].ToString() + ", ");
            }
            Console.WriteLine();

            // Find the roots.
            for (int k = 1; k <= nums.Length; k++)
            {
                BigInteger x = new BigInteger(k);
                BigInteger p_k = x * x * x* x + pqrs[0] * x* x * x 
                                 + pqrs[1] * x * x + pqrs[2] * x 
                                 + pqrs[3];

                if (p_k == 0)
                {
                    Console.WriteLine("Found: " + k.ToString());
                }
            }
        }

        // Solve using Cramer's method.
        // matrix * pqrs = constants.
        static void Solve(BigInteger[] pqrs, BigInteger[,] matrix, 
                          BigInteger[] constants, int n)
        {
            BigInteger determinant = Determinant(matrix, n);

            for (int i = 0; i < n; i++)
            {
                BigInteger[,] numerator = Replace(matrix, constants, n, i);
                BigInteger numDet = Determinant(numerator,4);
                pqrs[i] = numDet/ determinant;
            }
        }

        // Replace a column of matrix with constants.
        static BigInteger[,] Replace(BigInteger[,] matrix, 
                           BigInteger[] constants, int n, int col)
        {
            BigInteger[,] newMatrix = new BigInteger[n, n];
            for (int i = 0; i < n; i++)
            {
                for (int j = 0; j < n; j++)
                {
                    if (j != col)
                    {
                        newMatrix[i, j] = matrix[i, j];
                    }
                    else
                    {
                        newMatrix[i, j] = constants[i];
                    }
                }
            }

            return newMatrix;
        }

        // Recursively compute determinant for matrix.
        static BigInteger Determinant(BigInteger[,] matrix, int n)
        {
            BigInteger determinant = new BigInteger(0);
            int multiplier = 1;

            if (n == 1)
            {
                return matrix[0,0];
            }

            for (int i = 0; i < n; i++)
            {
                BigInteger [,] subMatrix = new BigInteger[n-1,n-1];
                int row = 0;
                for (int j=1; j < n; j++)
                {
                    int col = 0;
                    for (int k = 0; k < n ; k++)
                    {
                        if (k == i)
                        {
                            continue;
                        }
                        subMatrix[row,col] = matrix[j,k];
                        col++;
                    }
                    row++;
                }

                BigInteger subDeterminant = Determinant(subMatrix, n - 1);
                determinant += multiplier * subDeterminant * matrix[0,i];
                multiplier = -multiplier;
            }

            return determinant;
        }
    }
}

The output is

C[1] = 6, C[2] = 36, C[3] = 180, C[4] = 864, C[5] = 4116, C[6] = 19656, C[7] = 9
4380,
pqrs[0] = -12, pqrs[1] = 49, pqrs[2] = -78, pqrs[3] = 40,
Found: 1
Found: 2
Found: 4
Found: 5


C[1] = 15, C[2] = 407, C[3] = 9507, C[4] = 215951, C[5] = 4861515, C[6] = 108820
727, C[7] = 2424698067,
pqrs[0] = -53, pqrs[1] = 980, pqrs[2] = -7396, pqrs[3] = 18480,
Found: 5
Found: 12
Found: 14
Found: 22


C[1] = 486, C[2] = 189424, C[3] = 75861486, C[4] = 31342069984, C[5] = 130971109
69326, C[6] = 5492487308851024, C[7] = 2305818940736419566,
pqrs[0] = -600, pqrs[1] = 83183, pqrs[2] = -3255216, pqrs[3] = 29549520,
Found: 13
Found: 44
Found: 123
Found: 420

As @j_random_hacker pointed out, this is quite similar to Finding duplicates in O(n) time and O(1) space, and an adaptation of my answer there works here too. In pseudo-code:

for i := 1 to n
    while A[A[i]] != A[i] 
        swap(A[i], A[A[i]])
    end if
end for

for i := 1 to n
    if A[i] != i then 
        print i
    end if
end for

The first loop permutes the array so that if element x is present at least once, then one of those entries will be at position A[x].

Note that although it has a nested loop, it still runs in O(N) time - a swap only occurs if there is an i such that A[i] != i, and each swap sets at least one element such that A[i] == i, where that wasn't true before. This means that the total number of swaps (and thus the total number of executions of the while loop body) is at most N-1.

Your solution isn't bad. Here's an alternative - Sort the list and iterate through it checking adjacent numbers. When there is a gap, print all the numbers in between. If k is the length of the array and n is the number to count to, we get O(k lg k + n) time, O(1) space

This is bit qwirky As all your numbers are positive (by problem). I ll make the number at position i-1 a negetive if i is present in array.

int i;  
for(i = 0; i < n; i++)  {  
    A[abs(A[i])-1] = -1*abs(A[abs(A[i])-1]);
 }  

for(i = 0; i < n; i++)  {  
 if(A[i]>0) {  
    printf("missing: %d", i+1);  
}

Complexity O(n), no auxiliary array user, but destroys the input array.

Following is one of the approaches to identify all missing numbers when an array is known to contain only lumbers between 1 to n inclusive, without using any additional space. Time complexity is O(n).

Lets take a smallest number k such that it is not supposed to be in array therfore k = n+1 (lets call it an add factor).

first loop through each array and for every a[i] we will update a[a[i] - 1] += k; after this loop every array element contains two sets of information the number which was originally in the array element + k * (number of occurances of ith number in the array).

in the second loop you could find out how many repetations of ith number by doing an integer division of the number at each location by k. And original number at ith location would be a[i] % k;

Lets work through an example

A[5] = {1,2,1,3,2};

here (addfactor) k = 5 (array length) + 1 = 6

After fisrt loop array would look like if original element is m and occurances of ith number is O(i) resulting array element would be m + k * O(i) this element divide(integer) by k you'll get occurances of ith elemnent, and %k you'd get original array.

A = {1 + 6*2, 2 + 6*2, 1 + 6*1, 3+6*0 , 2+6*0 }
A = {13, 14, 7, 3, 2 }

Following is C# code which (I'm sorry, my C is little rusty its been a while.) could be ported to any language just by replacing Printf & scanfs.

    static void Main(string[] args)
    {
        int[] A = { 1, 2, 1, 3, 2 };
        PrintDuplicateAndMissing(A);
        Console.ReadLine();
    }

    static void PrintDuplicateAndMissing(int[] array)
    {
        int addfactor = array.Length + 1;
        for (int i = 0; i < array.Length; i++)
        {
            array[array[i] - 1] += addfactor; // -1 only if array contains from 1 to n. if it is 0 to n (this -1 is not required)
        }
        for (int i = 0; i < array.Length; i++)
        {
            if ( (array[i] / addfactor) == 0 )
                Console.WriteLine(string.Format("{0} is missing", i + 1)); // i + 1 only if array is 1 to n, if 0 to n then +1 is not required
            array[i] %= addfactor; //restore original content of the array
        }
    }

A sample code snippet for finding the missing elements without sorting the array below:

     public static void series(int[] arr) {
     for (int i = 0; i < arr.length; i++) {
        while (arr[i] != i + 1) {
            int jump = arr[arr[i] - 1];
            if (jump == arr[i]) {
                break;
            }
            arr[arr[i] - 1] = arr[i];
            arr[i] = jump;
        }
     }
     System.out.println("Missing number is ");
     for (int i = 0; i < arr.length; i++) {
        if (arr[i] != i + 1) {
            System.out.println(i + 1);
        } else {
            arr[i] = -1;
        }
     }

This code works for a series of numbers from 0 to N.

Cycle each element 0...n-1.

x = abs(A[i]) (with i = 0...n-1);

A[x - 1] can be: 
> 0: we haven't checked the element, change its sign to negative:
    A[x - 1] = -A[x - 1]
< 0: we already found the same number

At the end of the cycle, pass each A[0...n-1]. The index of the positive elements + 1 is the missing numbers.

So if

y = abs(A[i]) > 0: i + 1 is missing.

In C#

var arr = new[] { 1, 2, 1, 2, 4 };

for (int i = 0; i < arr.Length; i++) {
    int x = Math.Abs(arr[i]);
    int y = arr[x - 1];
    if (y > 0) {
        arr[x - 1] = -arr[x - 1];
    }
}

for (int i = 0; i < arr.Length; i++) {
    int x = arr[i];
    if (x > 0) {
        Console.WriteLine("Missing {0}", i + 1);
    } else {
        arr[i] = -arr[i];
    }
}

And the array is as good as new.

As we know we are looking for elements between 1 to N Create a Hash set containing 1 to N.

foreach(int i in input)
{
   if(hashset.contains(i))
   {
      hashset.delete(i);
   }
}

return "remaining elements in Hashset.";

The remaining elements in Hashset are the missing elements.

As you have given an array of n size and find the missing number when it's in a sequence.

#include<stdio.h>
main()
{
print("value of n");
scan("%d",&n);
print("enter the elements");
for(i=0;i<n;i++)
scan("%d",&a[i]);
for(i=0;i<n;i++)
{
d1[i]=a[i+1]-a[i];
temp=d1[i];
d[i]=temp;
}
for(i=0;i<n;i++)
{
if(d[i]==d[i+1]
{
c=d[i];
break;
}
}
for(i=0;i<n;i++)
b[i]=a[0]+i*c;
for(i=0;i<n;i++)
{
miss=0;
for(j=0;j<n;j++)
{
if(b[i]!=a[j])
{
miss++;
}
if(miss==n)
print("missing no. %d",b[i]);
}
}

It would find the missing when its in sequence only.