Kruskal's algorithm and disjoint-set data structure: Do I need the following two lines of code?_问答_开发者

Kruskal's algorithm and disjoint-set data structure: Do I need the following two lines of code?

开发者 https://www.devze.com 2023-02-20 11:45 出处：网络

I\'ve implemented Kruskal\'s algorithm in C++ using the disjoint-set data structure according to Wikipedia like this:

I've implemented Kruskal's algorithm in C++ using the disjoint-set data structure according to Wikipedia like this:

#include <stdio.h>
#include <algorithm>
#define MAX_EDGES 10000000
#define MAX_VERTICES 200001
using namespace std;
int num_edges,num_vertices;
int total_cost=0;

struct edge{
    int v1,v2;
    int cost;
};

struct comp{
    bool operator()(const edge& e1,const edge& e2){
        retu开发者_如何学JAVArn e1.cost<e2.cost;
    }
};

edge edges[MAX_EDGES];
int parent[MAX_VERTICES];
int rank[MAX_VERTICES];

int findset(int x){
    if(x!=parent[x]){
        parent[x]=findset(parent[x]);
    }
    return parent[x];
}

void merge(int x,int y){
    int px=findset(x),py=findset(y);
    if(rank[px]>rank[py]){
        parent[py]=px;
    }else{
        parent[px]=py;
    }
    if(rank[px]==rank[py]){
        ++rank[py];
    }
}

int main(){
    FILE* in=fopen("input","r");
    FILE* out=fopen("output","w");
    fscanf(in,"%d %d\n",&num_vertices,&num_edges);
    for(int i=1;i<=num_vertices;++i){
        parent[i]=i;
        rank[i]=0;
    }
    for(int i=0;i<num_edges;++i){
        fscanf(in,"%d %d %d\n",&edges[i].v1,&edges[i].v2,&edges[i].cost);
    }
    sort(edges,edges+num_edges,comp());
    for(int i=0;i<num_edges;++i){
        int s1=findset(edges[i].v1),s2=findset(edges[i].v2);
        if(s1!=s2){
            merge(s1,s2);
            total_cost+=edges[i].cost;
        }
    }
    fprintf(out,"%d\n",total_cost);
}

My question is: Do I need these two lines of code? If so, what's their importance?

int px=findset(x),py=findset(y); in merge instead of int px=parent[x],py=parent[y];
parent[x]=findset(parent[x]); in findset instead of return findset(parent[x]);

1) findset(x) returns the canonical representative of the set that x is in (the root of its ancestry tree). You need this to be able to compare whether two elements are in the same set or not (they have the same representative), parent[x] just returns the parent of x in the tree, which may not be the root.

1a) You forgot to test for px and py being identical in merge.

2) It's an optimization so that future calls to findset will run faster. If parent[x] used to point to its parent which pointed to the root of its set's tree, after this call parent[x] will point directly to the root.

You need this to because x.parent is not necessarily the representative of the class to which x belongs, so the algorithm wouldn't be correct without it.
Without the assignment, the algorithm would be suboptimal. This is the path compression optimization, also discussed in the Wikipedia.