Longest Consecutive Sequence in an Unsorted Array [duplicate]_问答_开发者

This question already has answers here: Finding contiguous ranges in arrays (9 answers) Closed 9 years ago.

You are given an Array of numbers and they开发者_运维技巧 are unsorted/random order. You are supposed to find the longest sequence of consecutive numbers in the array. Note the sequence need not be in sorted order within the array. Here is an example :

Input :

A[] = {10,21,45,22,7,2,67,19,13,45,12,11,18,16,17,100,201,20,101}

Output is :

{16,17,18,19,20,21,22}

The solution needs to be of O(n) complexity.

I am told that the solution involves using a hash table and I did come across few implementations that used 2 hash tables. One cannot sort and solve this because sorting would take O(nlgn) which is not what is desired.

You could have two tables:

Start table: (start-point, length)
End table: (ending-point, length)

When adding a new item, you would check:

Does value + 1 exist in start table? If so, delete it and create a new entry of (value, length + 1) where length is the "current" length. You'd also update the end table with the same end point but greater length.
Does value - 1 exist in the end table? If so, delete it and create a new entry of (value, length + 1), and this time update the start table (the starting position will be the same, but the length will be increased)

If both conditions hold, then you're effectively stitching two existing sequences together - replace the four existing entries with two new entries, representing the single longer sequence.

If neither condition holds, you just create a new entry of length 1 in both tables.

After all the values have been added, you can just iterate over the start table to find the key with the largest value.

I think this would work, and would be O(n) if we assume O(1) hash lookup/add/delete.

EDIT: C# implementation. It took a little while to get right, but I think it works :)

using System;
using System.Collections.Generic;

class Test
{
    static void Main(string[] args)
    {
        int[] input = {10,21,45,22,7,2,67,19,13,45,12,
                11,18,16,17,100,201,20,101};

        Dictionary<int, int> starts = new Dictionary<int, int>();
        Dictionary<int, int> ends = new Dictionary<int, int>();

        foreach (var value in input)
        {
            int startLength;
            int endLength;
            bool extendsStart = starts.TryGetValue(value + 1,
                                                   out startLength);
            bool extendsEnd = ends.TryGetValue(value - 1,
                                               out endLength);

            // Stitch together two sequences
            if (extendsStart && extendsEnd)
            {
                ends.Remove(value + 1);
                starts.Remove(value - 1);
                int start = value - endLength;
                int newLength = startLength + endLength + 1;
                starts[start] = newLength;
                ends[start + newLength - 1] = newLength;
            }
            // Value just comes before an existing sequence
            else if (extendsStart)
            {
                int newLength = startLength + 1;
                starts[value] = newLength;
                ends[value + newLength - 1] = newLength;
                starts.Remove(value + 1);
            }
            else if (extendsEnd)
            {
                int newLength = endLength + 1;
                starts[value - newLength + 1] = newLength;
                ends[value] = newLength;
                ends.Remove(value - 1);
            }
            else
            {
                starts[value] = 1;
                ends[value] = 1;
            }
        }

        // Just for diagnostics - could actually pick the longest
        // in O(n)
        foreach (var sequence in starts)
        {
            Console.WriteLine("Start: {0}; Length: {1}",
                              sequence.Key, sequence.Value);
        }
    }
}

EDIT: Here's the single-hashset answer implemented in C# too - I agree, it's simpler than the above, but I'm leaving my original answer for posterity:

using System;
using System.Collections.Generic;
using System.Linq;

class Test
{
    static void Main(string[] args)
    {
        int[] input = {10,21,45,22,7,2,67,19,13,45,12,
                11,18,16,17,100,201,20,101};

        HashSet<int> values = new HashSet<int>(input);

        int bestLength = 0;
        int bestStart = 0;
        // Can't use foreach as we're modifying it in-place
        while (values.Count > 0)
        {
            int value = values.First();
            values.Remove(value);
            int start = value;
            while (values.Remove(start - 1))
            {
                start--;
            }
            int end = value;
            while (values.Remove(end + 1))
            {
                end++;
            }

            int length = end - start + 1;
            if (length > bestLength)
            {
                bestLength = length;
                bestStart = start;
            }
        }
        Console.WriteLine("Best sequence starts at {0}; length {1}",
                          bestStart, bestLength);
    }
}

Dump everything to a hash set.

Now go through the hashset. For each element, look up the set for all values neighboring the current value. Keep track of the largest sequence you can find, while removing the elements found from the set. Save the count for comparison.

Repeat this until the hashset is empty.

Assuming lookup, insertion and deletion are O(1) time, this algorithm would be O(N) time.

Pseudo code:

 int start, end, max
 int temp_start, temp_end, count

 hashset numbers

 for element in array:
     numbers.add(element)

 while !numbers.empty():
     number = numbers[0]
     count = 1
     temp_start, temp_end = number 

     while numbers.contains(number - 1):
         temp_start = number - 1; count++
         numbers.remove(number - 1)

     while numbers.contains(number + 1):
         temp_end = number + 1; count++
         numbers.remove(number + 1)

     if max < count:
         max = count
         start = temp_start; end = temp_end

 max_range = range(start, end)

The nested whiles don't look pretty, but each number should be used only once so should be O(N).

Here is a solution in Python that uses just a single hash set and doesn't do any fancy interval merging.

def destruct_directed_run(num_set, start, direction):
  while start in num_set:
    num_set.remove(start)
    start += direction
  return start

def destruct_single_run(num_set):
  arbitrary_member = iter(num_set).next()
  bottom = destruct_directed_run(num_set, arbitrary_member, -1) 
  top = destruct_directed_run(num_set, arbitrary_member + 1, 1)
  return range(bottom + 1, top)

def max_run(data_set):
  nums = set(data_set)
  best_run = []
  while nums:
    cur_run = destruct_single_run(nums)
    if len(cur_run) > len(best_run):
      best_run = cur_run
  return best_run

def test_max_run(data_set, expected):
  actual = max_run(data_set)
  print data_set, actual, expected, 'Pass' if expected == actual else 'Fail'

print test_max_run([10,21,45,22,7,2,67,19,13,45,12,11,18,16,17,100,201,20,101], range(16, 23))
print test_max_run([1,2,3], range(1, 4))
print max_run([1,3,5]), 'any singleton output fine'

Another solution is with hash search which runs in O(n)

int maxCount = 0;
for (i = 0; i<N; i++) 
{ 
    // Search whether a[i] - 1 is present in the list.If it is present, 
    // you don't need to initiate count since it  will be counted when 
    // (a[i] - 1) is traversed.
    if (hash_search(a[i]-1))
        continue;

    // Now keep checking if a[i]++ is present in the list, increment the count
    num = a[i]; 
    while (hash_search(++num)) 
        count++;

    // Now check if this count is greater than the max_count got previously 
    // and update if it is
    if (count > maxCount)
    {
        maxIndex = i;
        count = maxCount;
    }
}

Here is the implementation:

static int[] F(int[] A)
{
    Dictionary<int, int> low = new Dictionary<int, int>();
    Dictionary<int, int> high = new Dictionary<int, int>();

    foreach (int a in A)
    {
        int lowLength, highLength;

        bool lowIn = low.TryGetValue(a + 1, out lowLength);
        bool highIn = high.TryGetValue(a - 1, out highLength);

        if (lowIn)
        {
            if (highIn)
            {
                low.Remove(a + 1);
                high.Remove(a - 1);
                low[a - highLength] = high[a + lowLength] = lowLength + highLength + 1;
            }
            else
            {
                low.Remove(a + 1);
                low[a] = high[a + lowLength] = lowLength + 1;
            }
        }
        else
        {
            if (highIn)
            {
                high.Remove(a - 1);
                high[a] = low[a - highLength] = highLength + 1;
            }
            else
            {
                high[a] = low[a] = 1;
            }
        }
    }

    int maxLow = 0, maxLength = 0;
    foreach (var pair in low)
    {
        if (pair.Value > maxLength)
        {
            maxLength = pair.Value;
            maxLow = pair.Key;
        }
    }

    int[] ret = new int[maxLength];
    for (int i = 0; i < maxLength; i++)
    {
        ret[i] = maxLow + i;
    }

    return ret;
}

class Solution {
public:
    struct Node{
        int lower;
        int higher;
        Node(int l, int h):lower(l),higher(h){

    }
};
int longestConsecutive(vector<int> &num) {
    // Start typing your C/C++ solution below
    // DO NOT write int main() function

    map<int,Node> interval_map;
    map<int,Node>::iterator curr_iter,inc_iter,des_iter;

    //first collect
    int curr = 0;
    int max = -1;
    for(size_t i = 0; i < num.size(); i++){
        curr = num[i];
        curr_iter = interval_map.find(curr);
        if (curr_iter == interval_map.end()){
            interval_map.insert(make_pair(curr,Node(curr,curr)));
        }
    } 
    //the next collect    
    for(curr_iter = interval_map.begin(); curr_iter != interval_map.end(); curr_iter++)
    {
        int lower = curr_iter->second.lower;
        int higher = curr_iter->second.higher;
        int newlower = lower, newhigher = higher;

        des_iter = interval_map.find(lower - 1);
        if (des_iter != interval_map.end())
        {
            curr_iter->second.lower = des_iter->second.lower;
            newlower = des_iter->second.lower;
        }

        inc_iter = interval_map.find(higher + 1);
        if (inc_iter != interval_map.end()){
            curr_iter->second.higher = inc_iter->second.higher;
            newhigher = inc_iter->second.higher;
        }

        if (des_iter != interval_map.end()){
            des_iter->second.higher = newhigher;
        }
        if (inc_iter != interval_map.end()){
            inc_iter->second.lower = newlower;
        }
        if (curr_iter->second.higher - curr_iter->second.lower + 1> max){
             max = curr_iter->second.higher - curr_iter->second.lower + 1;
         }
    }   
    return max;
}
};

This is the solution of Grigor Gevorgyan from a duplicate of this question, but I think simplified:

data = [1,3,5,7,4,6,10,3]

# other_sides[x] == other end of interval starting at x
# unknown values for any point not the end of an interval
other_sides = {}
# set eliminates duplicates, and is assumed to be an O(n) operation
for element in set(data):
    # my intervals left hand side will be the left hand side
    # of an interval ending just before this element
    try:
        left = other_sides[element - 1]
    except KeyError:
        left = element

    # my intervals right hand side will be the right hand side
    # of the interval starting just after me
    try:
        right = other_sides[element + 1]
    except KeyError:
        right = element

    # satisfy the invariants
    other_sides[left] = right
    other_sides[right] = left

# convert the dictionary to start, stop segments
# each segment is recorded twice, so only keep half of them
segments = [(start, stop) for start, stop in other_sides.items() if start <= stop]
# find the longest one
print max(segments, key = lambda segment: segment[1] - segment[0])

Here's python code based by answer by Grigor Gevorgyan for similar question, I think it's very elegant solution of that problem

l = [10,21,45,22,7,2,67,19,13,45,12,11,18,16,17,100,201,20,101]
d = {x:None for x in l}
print d
for (k, v) in d.iteritems():
    if v is not None: continue
    a, b = d.get(k - 1), d.get(k + 1)
    if a is not None and b is not None: d[k], d[a], d[b] = k, b, a
    elif a is not None: d[a], d[k] = k, a
    elif b is not None: d[b], d[k] = k, b
    else: d[k] = k
    print d

m = max(d, key=lambda x: d[x] - x)
print m, d[m]

output:

{2: 2, 67: None, 100: None, 101: None, 7: None, 201: None, 10: None, 11: None, 12: None, 45: None, 13: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: None, 101: None, 7: None, 201: None, 10: None, 11: None, 12: None, 45: None, 13: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 100, 101: None, 7: None, 201: None, 10: None, 11: None, 12: None, 45: None, 13: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: None, 201: None, 10: None, 11: None, 12: None, 45: None, 13: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: None, 10: None, 11: None, 12: None, 45: None, 13: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: None, 11: None, 12: None, 45: None, 13: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 10, 11: None, 12: None, 45: None, 13: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 11, 11: 10, 12: None, 45: None, 13: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 12, 11: 10, 12: 10, 45: None, 13: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 12, 11: 10, 12: 10, 45: 45, 13: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 13, 11: 10, 12: 10, 45: 45, 13: 10, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 13, 11: 10, 12: 10, 45: 45, 13: 10, 16: 16, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 13, 11: 10, 12: 10, 45: 45, 13: 10, 16: 17, 17: 16, 18: None, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 13, 11: 10, 12: 10, 45: 45, 13: 10, 16: 18, 17: 16, 18: 16, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 13, 11: 10, 12: 10, 45: 45, 13: 10, 16: 19, 17: 16, 18: 16, 19: 16, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 13, 11: 10, 12: 10, 45: 45, 13: 10, 16: 20, 17: 16, 18: 16, 19: 16, 20: 16, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 13, 11: 10, 12: 10, 45: 45, 13: 10, 16: 21, 17: 16, 18: 16, 19: 16, 20: 16, 21: 16, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 13, 11: 10, 12: 10, 45: 45, 13: 10, 16: 22, 17: 16, 18: 16, 19: 16, 20: 16, 21: 16, 22: 16}
16 22