开发者

Implementing Disjoint Sets (Union Find) in C++

开发者 https://www.devze.com 2023-01-31 19:47 出处:网络
I am trying to implement Disjoint Sets for use in Kruskal\'s algorithm, but I am having trouble understanding exactly how it should be done and in particular, how to manage the forest of trees. After

I am trying to implement Disjoint Sets for use in Kruskal's algorithm, but I am having trouble understanding exactly how it should be done and in particular, how to manage the forest of trees. After reading the Wikipedia description of Disjoint Sets and after reading the description in Introduction to Algorithms (Cormen et al) I have come up with the following:

    class DisjointSet
    {
    public:
        class   Node 
        {
        public:
            int         data;
            int         rank;

            Node*       parent;

         开发者_运维技巧   Node() : data(0), 
                     rank(0),
                     parent(this) { } // the constructor does MakeSet
        };

        Node*   find(Node*);
        Node*   merge(Node*, Node*); // Union
    };


    DisjointSet::Node*   DisjointSet::find(DisjointSet::Node* n)
    {
        if (n != n->parent) {
            n->parent = find(n->parent);
        }
        return n->parent;
    }

    DisjointSet::Node* DisjointSet::merge(DisjointSet::Node* x,
                                          DisjointSet::Node* y)
    {
        x = find(x);
        y = find(y);

        if (x->rank > y->rank) {
            y->parent = x;
        } else {
            x->parent = y;
            if (x->rank == y->rank) {
                ++(y->rank);
            }
        }
    }

I am pretty sure this is incomplete though and that I am missing something.

Introduction to Algorithms mentions that there should be a forest of trees, but it does not give any explanation for a practical implementation of this forest. I watched CS 61B Lecture 31: Disjoint Sets ( http://www.youtube.com/watch?v=wSPAjGfDl7Q ) and here the lecturer uses only an array to store both the forest and all its trees and values. There is no explicit 'Node' type of class as I have, mentioned. I have also found many other sources (I cannot post more than one link), which also use this technique. I would be happy to do this, except that this relies on the indices of the array for lookup and since I want to store values of type other than int, I need to use something else (std::map comes to mind).

Another issue that I am unsure about is memory allocation because I am using C++. I am storing trees of pointers and my MakeSet operation will be: new DisjointSet::Node; . Now, these Nodes only have pointers to their parents, so I'm not sure how to find the bottom of a tree. How will I be able to traverse my trees to deallocate them all?

I understand the basic concept of this data structure, but I'm just a bit confused about the implementation. Any advice and suggestions would be most welcome, thank you.


Not a perfect implementation by any means (I did write it after all!), but does this help?

/***
 * millipede: DisjointSetForest.h
 * Copyright Stuart Golodetz, 2009. All rights reserved.
 ***/

#ifndef H_MILLIPEDE_DISJOINTSETFOREST
#define H_MILLIPEDE_DISJOINTSETFOREST

#include <map>

#include <common/exceptions/Exception.h>
#include <common/io/util/OSSWrapper.h>
#include <common/util/NullType.h>

namespace mp {

/**
@brief  A disjoint set forest is a fairly standard data structure used to represent the partition of
        a set of elements into disjoint sets in such a way that common operations such as merging two
        sets together are computationally efficient.

This implementation uses the well-known union-by-rank and path compression optimizations, which together
yield an amortised complexity for key operations of O(a(n)), where a is the (extremely slow-growing)
inverse of the Ackermann function.

The implementation also allows clients to attach arbitrary data to each element, which can be useful for
some algorithms.

@tparam T   The type of data to attach to each element (arbitrary)
*/
template <typename T = NullType>
class DisjointSetForest
{
    //#################### NESTED CLASSES ####################
private:
    struct Element
    {
        T m_value;
        int m_parent;
        int m_rank;

        Element(const T& value, int parent)
        :   m_value(value), m_parent(parent), m_rank(0)
        {}
    };

    //#################### PRIVATE VARIABLES ####################
private:
    mutable std::map<int,Element> m_elements;
    int m_setCount;

    //#################### CONSTRUCTORS ####################
public:
    /**
    @brief  Constructs an empty disjoint set forest.
    */
    DisjointSetForest()
    :   m_setCount(0)
    {}

    /**
    @brief  Constructs a disjoint set forest from an initial set of elements and their associated values.

    @param[in]  initialElements     A map from the initial elements to their associated values
    */
    explicit DisjointSetForest(const std::map<int,T>& initialElements)
    :   m_setCount(0)
    {
        add_elements(initialElements);
    }

    //#################### PUBLIC METHODS ####################
public:
    /**
    @brief  Adds a single element x (and its associated value) to the disjoint set forest.

    @param[in]  x       The index of the element
    @param[in]  value   The value to initially associate with the element
    @pre
        -   x must not already be in the disjoint set forest
    */
    void add_element(int x, const T& value = T())
    {
        m_elements.insert(std::make_pair(x, Element(value, x)));
        ++m_setCount;
    }

    /**
    @brief  Adds multiple elements (and their associated values) to the disjoint set forest.

    @param[in]  elements    A map from the elements to add to their associated values
    @pre
        -   None of the elements to be added must already be in the disjoint set forest
    */
    void add_elements(const std::map<int,T>& elements)
    {
        for(typename std::map<int,T>::const_iterator it=elements.begin(), iend=elements.end(); it!=iend; ++it)
        {
            m_elements.insert(std::make_pair(it->first, Element(it->second, it->first)));
        }
        m_setCount += elements.size();
    }

    /**
    @brief  Returns the number of elements in the disjoint set forest.

    @return As described
    */
    int element_count() const
    {
        return static_cast<int>(m_elements.size());
    }

    /**
    @brief  Finds the index of the root element of the tree containing x in the disjoint set forest.

    @param[in]  x   The element whose set to determine
    @pre
        -   x must be an element in the disjoint set forest
    @throw Exception
        -   If the precondition is violated
    @return As described
    */
    int find_set(int x) const
    {
        Element& element = get_element(x);
        int& parent = element.m_parent;
        if(parent != x)
        {
            parent = find_set(parent);
        }
        return parent;
    }

    /**
    @brief  Returns the current number of disjoint sets in the forest (i.e. the current number of trees).

    @return As described
    */
    int set_count() const
    {
        return m_setCount;
    }

    /**
    @brief  Merges the disjoint sets containing elements x and y.

    If both elements are already in the same disjoint set, this is a no-op.

    @param[in]  x   The first element
    @param[in]  y   The second element
    @pre
        -   Both x and y must be elements in the disjoint set forest
    @throw Exception
        -   If the precondition is violated
    */
    void union_sets(int x, int y)
    {
        int setX = find_set(x);
        int setY = find_set(y);
        if(setX != setY) link(setX, setY);
    }

    /**
    @brief  Returns the value associated with element x.

    @param[in]  x   The element whose value to return
    @pre
        -   x must be an element in the disjoint set forest
    @throw Exception
        -   If the precondition is violated
    @return As described
    */
    T& value_of(int x)
    {
        return get_element(x).m_value;
    }

    /**
    @brief  Returns the value associated with element x.

    @param[in]  x   The element whose value to return
    @pre
        -   x must be an element in the disjoint set forest
    @throw Exception
        -   If the precondition is violated
    @return As described
    */
    const T& value_of(int x) const
    {
        return get_element(x).m_value;
    }

    //#################### PRIVATE METHODS ####################
private:
    Element& get_element(int x) const
    {
        typename std::map<int,Element>::iterator it = m_elements.find(x);
        if(it != m_elements.end()) return it->second;
        else throw Exception(OSSWrapper() << "No such element: " << x);
    }

    void link(int x, int y)
    {
        Element& elementX = get_element(x);
        Element& elementY = get_element(y);
        int& rankX = elementX.m_rank;
        int& rankY = elementY.m_rank;
        if(rankX > rankY)
        {
            elementY.m_parent = x;
        }
        else
        {
            elementX.m_parent = y;
            if(rankX == rankY) ++rankY;
        }
        --m_setCount;
    }
};

}

#endif


Boost has a disjoint set implementation:

http://www.boost.org/doc/libs/1_54_0/libs/disjoint_sets/disjoint_sets.html


your implementation looks good (except in the function merge you should either declare returning void or put a return there, I prefer to return void). The thing is you need to track the Nodes*. You can do that by having a vector<DisjointSet::Node*> on your DisjointSet class or having this vector somewhere else and declaring the methods of DisjointSet as static.

Here is an example of run (note that I changed merge to return void and didn't change the methods of DisjointSet to be static:

int main()
{
    vector<DisjointSet::Node*> v(10);
    DisjointSet ds;

    for (int i = 0; i < 10; ++i) {
        v[i] = new DisjointSet::Node();
        v[i]->data = i;
    }

    int x, y;

    while (cin >> x >> y) {
        ds.merge(v[x], v[y]);
    }

    for (int i = 0; i < 10; ++i) {
        cout << v[i]->data << ' ' << v[i]->parent->data << endl;
    }

    return 0;
}

With this input:

3 1
1 2
2 4
0 7
8 9

It prints the expected:

0 7
1 1
2 1
3 1
4 1
5 5
6 6
7 7
8 9
9 9

Your forest is the composition of the trees:

   7    1       5    6   9
 /    / | \              |
0    2  3  4             8

So you algorithm is good, have the best complexity for Union-find as far as I know and you keep track of your Nodes on your vector. So you could simply deallocate:

for (int i = 0; i < int(v.size()); ++i) {
    delete v[i];
}


I can't speak about the algorithm, but for memory management, typically you will use something called a smart pointer, which will free what it points to. You can get shared ownership and single ownership smart pointers, and non-ownership too. Correct use of these will guarantee no memory issues.


Your implementation is fine. All you now need to do is to keep an array of disjoint set Nodes so that you can call your union/find methods on them.

For Kruskal's algorithm, you'll need an array containing one disjoint-set Node per graph vertex. Then, when you look up the next edge to add to your subgraph, you would use the find method to check if those nodes are both in your subgraph already. If they are, then you can move on to the next edge. Otherwise, it's time to add that edge to your subgraph and perform a union operation between the two vertices connected by that edge.


This blog article shows a C++ implementation using path compression: http://toughprogramming.blogspot.com/2013/04/implementing-disjoint-sets-in-c.html


For implementing Disjoint Sets from scratch, I highly recommend reading the book of Data Structures & Algorithm Analysis in C++ by Mark A. Weiss.

In Chap 8, it starts from basic find / union, then it gradually adds union by height / depth / rank, and find compression. In the end, it provides Big-O analysis.

Trust me, it has everything you want to know about Disjoint Sets and its C++ implementation.


The following code seems simple to understand for implementing union-find disjoints sets by path compression

int find(int i)
{
    if(parent[i]==i)
    return i;
    else
    return parent[i]=find(parent[i]);
}
void union(int a,int b)
{
    x=find(a);y=find(b);
        if(x!=y)
        {
            if(rank[x]>rank[y])
            parent[y]=x;
            else
            {
            parent[x]=y;
            if(rank[x]==rank[y])
            rank[y]+=1;             
            }
        }
}


Have a look at this code

class Node {
    int id,rank,data;
    Node *parent;

public :

    Node(int id,int data) {
        this->id = id;
        this->data = data;
        this->rank =0;
        this->parent = this;
    }

    friend class DisjointSet;
};

class DisjointSet {
    unordered_map<int,Node*> forest;

    Node *find_set_helper(Node *aNode) {
        if( aNode->parent != aNode)
            aNode->parent = find_set_helper(aNode->parent);

        return aNode->parent;
    }

    void link(Node *xNode,Node *yNode) {
        if( xNode->rank > yNode->rank)
            yNode->parent = xNode;
        else if(xNode-> rank < yNode->rank)
            xNode->parent = yNode;
        else {
            xNode->parent = yNode;
            yNode->rank++;
        }
    }

public:
    DisjointSet() {

    }

    void make_set(int id,int data) {
        Node *aNode = new Node(id,data);
        this->forest.insert(make_pair(id,aNode));
    }

    void Union(int xId, int yId) {
        Node *xNode = find_set(xId);
        Node *yNode = find_set(yId);

        if(xNode && yNode)
            link(xNode,yNode);
    }

    Node* find_set(int id) {
        unordered_map<int,Node*> :: iterator itr = this->forest.find(id);
        if(itr == this->forest.end())
            return NULL;

        return this->find_set_helper(itr->second);
    }

    void print() {
        unordered_map<int,Node*>::iterator itr;
        for(itr = forest.begin(); itr != forest.end(); itr++) {
            cout<<"\nid : "<<itr->second->id<<"  parent :"<<itr->second->parent->id;
        }
    }

    ~DisjointSet(){
        unordered_map<int,Node*>::iterator itr;
        for(itr = forest.begin(); itr != forest.end(); itr++) {
            delete (itr->second);
        }
    }

};


If you are trying to ask which style is better to imlement disjoint set (vector or map(rb tree) ) , then i may have something to add

  1. make_set (int key , node info ) : this is generally a member function to (1) add a node and (2) make node point to itself ( parent = key) , this intially creates a disjoint set. operation time complexity for vector O(n) , for map O(n*logn) .
  2. find_set( int key ) : this has two fuctions generally , (1) finding the node through given key (2) path compression . I couldnt really calculate it for path compression , but for simply searching for the node , the time complexity for (1) vector O(1) and for (2) map O(log(n)) .

Concludingly , i would like to say that although looking over here , vector implementation looks better, time complexities of both are O(M*α(n)) ≈ O(M*5) or so i have read .

ps. do verify what i have written though i am pretty sure it's correct .

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号