Converting sets of integers into ranges_问答_开发者

What's the most idiomatic way to convert a set of integers into a set of ranges?

E.g. given the set {0, 1, 2, 3, 4, 7, 8, 9, 11} I want to get { {0,开发者_运维技巧4}, {7,9}, {11,11} }.

Let's say we are converting from std::set<int> into std::vector<std::pair<int, int>>. I treat Ranges as inclusive on both sides, since it's more convenient in my case, but I can work with open-ended ranges too if necessary.

I've written the following function, but I feel like reinventing the wheel. Please tell maybe there's something in STL or boost for this.

typedef std::pair<int, int> Range;

void setToRanges(const std::set<int>& indices, std::vector<Range>& ranges)
{
    Range r = std::make_pair(-INT_MAX, -INT_MAX);

    BOOST_FOREACH(int i, indices)
    {
           if (i != r.second + 1)
           {
            if (r.second >= 0) ranges.push_back(r);
            r.first = i;                    
           }

           r.second = i;
    }

    ranges.push_back(r);
}

Now one can use interval_set from Boost.ICL (Boost > 1.46)

#include <set>
#include <iostream>
#include <algorithm>

#include <boost/icl/discrete_interval.hpp>
#include <boost/icl/closed_interval.hpp>
#include <boost/icl/interval_set.hpp>

typedef std::set<int> Set;
typedef boost::icl::interval_set<int> IntervalSet;

void setToInterval(const Set& indices, IntervalSet& intervals)
{
    Set::const_iterator pos;
    for(pos = indices.begin(); pos != indices.end(); ++pos)
    {
        intervals.insert(boost::icl::construct<boost::icl::discrete_interval<int> >(*pos, *pos, boost::icl::interval_bounds::closed()));
    }
}

int main()
{
    std::cout << ">>Interval Container Library Rocks! <<\n";
    std::cout << "----------------------------------------------------\n";

    Set indices = {0, 1, 2, 3, 4, 7, 8, 9, 11};
    IntervalSet intervals;

    setToInterval(indices, intervals);

    std::cout << "  intervals joined:    " << intervals  << "\n";

    return 0;
}

Output:

  intervals joined:    {[0,4][7,9][11,11]}

I don't think there's anything in the STL or Boost that does this.

One thing you can do is to make your algorithm a little bit more general:

template<class InputIterator, class OutputIterator>
void setToRanges(InputIterator first, InputIterator last, OutputIterator dest)
{
    typedef std::iterator_traits<InputIterator>::value_type item_type;
    typedef typename std::pair<item_type, item_type> pair_type;
    pair_type r(-std::numeric_limits<item_type>::max(), 
                -std::numeric_limits<item_type>::max());

    for(; first != last; ++first)
    {
        item_type i = *first;
        if (i != r.second + 1)
        {
            if (r.second >= 0) 
                *dest = r;
            r.first = i;                    
        }
        r.second = i;
    }
    *dest = r;
}

Usage:

std::set<int> set;
// insert items

typedef std::pair<int, int> Range;
std::vector<Range> ranges;

setToRanges(set.begin(), set.end(), std::back_inserter(ranges));

You should also consider using the term interval instead of range, because the latter in STL parlance means "any sequence of objects that can be accessed through iterators or pointers" (source).

Finally, you should probably take at look at the Boost Interval Arithmetic Library, which is currently under review for Boost inclusion.

No shrinkwrapped solution I'm afraid, but an alternative algorithm.

Store your items in a bitvector - O(n) if you know the maximum item at the start and preallocate the vector.

Translate that vector into a vector of transition point flags - exclusive-or the bitvector with a bitshifted version of itself. Slightly fiddly at the word boundaries, but still O(n). Logically, you get a new key at the old max + 1 (the transition back to zeros after all your keys are exhausted), so it's a good idea to allow for that in the preallocation of the vector.

Then, iterate through the bitvector finding the set bits. The first set bit indicates the start of a range, the second the end, the third the start of the next range and so on. The following bit-fiddling function (assuming 32 bit int) may be useful...

int Low_Bit_No (unsigned int p)
{
  if (p == 0)  return -1;  //  No bits set

  int           l_Result = 31;
  unsigned int  l_Range  = 0xffffffff;
  unsigned int  l_Mask   = 0x0000ffff;

  if (p & l_Mask)  {  l_Result -= 16;  }  else  {  l_Mask ^= l_Range;  }
  l_Range &= l_Mask;
  l_Mask  &= 0x00ff00ff;
  if (p & l_Mask)  {  l_Result -=  8;  }  else  {  l_Mask ^= l_Range;  }
  l_Range &= l_Mask;
  l_Mask  &= 0x0f0f0f0f;
  if (p & l_Mask)  {  l_Result -=  4;  }  else  {  l_Mask ^= l_Range;  }
  l_Range &= l_Mask;
  l_Mask  &= 0x33333333;
  if (p & l_Mask)  {  l_Result -=  2;  }  else  {  l_Mask ^= l_Range;  }
  l_Mask  &= 0x55555555;
  if (p & l_Mask)  {  l_Result -=  1;  }

  return l_Result;
}

I'd use adjacent_find with a predicate that defines "adjacency" as two elements that are not sequential. This solution doesn't depend on INT_MAX. Still feels kinda clunky.

bool notSequential(int a, int b) { return (a + 1) != b; }

void setToRanges(const std::set<int>& indices, std::vector<Range>& ranges)
{
  std::set<int>::iterator iter = indices.begin();
  std::set<int>::iterator end = indices.end();
  int first;
  while (iter != end)
  {
    first = *iter;
    iter = std::adjacent_find(iter, end, notSequential);
    if (iter != end)
    {
      ranges.push_back(std::make_pair(first, *iter));
      ++iter;
    }
  }
  ranges.push_back(std::make_pair(first, *--iter));
}

That tests against end more than necessary. adjacent_find can never return the last element of a list, so the incremented iterator will never be end and thus can still be dereferenced. It could be rewritten as:

void setToRanges(const std::set<int>& indices, std::vector<Range>& ranges)
{
  std::set<int>::iterator iter = indices.begin();
  std::set<int>::iterator end = indices.end();
  if (iter == end) return; // empty set has no ranges
  int first;
  while (true)
  {
    first = *iter;
    iter = std::adjacent_find(iter, end, notSequential);
    if (iter == end) break;
    ranges.push_back(std::make_pair(first, *iter++));
  }
  ranges.push_back(std::make_pair(first, *--iter));
}