I have a list of sets (a,b,c,d,e in below example). Each of the sets contains a list of nodes in that set (1-6 below). I was wondering that there probably is a general known algorithm for achieving the below, and I just do not know about it.
sets[
a[1,2,5,6],
b[1,4,5],
c[1,2,5],
d[2,5],
e[1,6],
]
I would like to generate a new structure, a list of groups, with each group having
- all the (sub)sets of nodes that appear in multiple sets
- references to the original sets those nodes belong to
So the above data would become (order of groups irrelevant).
group1{nodes[2,5],sets[a,c,e]}
group2{nodes[1,2,5],sets[a,c]}
group3{nodes[1,6],sets[a,e]}
group4{nodes[1,5],sets[a,b,c]}
I am assuming I can get the data in as an array/object structure and manipulate that, and then spit the resulting structure out in whatever format needed.
It would be a plus if:
- all groups had a minimum of 2 nodes and 2 sets.
- when a subset of nodes is contained in a bigger set that forms a group, then only the bigger set gets a group: in this example, nodes 1,2 do not have a group of their own since all the sets they have in common already appear in group2.
(The sets are stored in XML, which I have also managed to convert to JSON so far, but this is irrelevant. I can understand procedural (pseudo)code but also something like开发者_运维百科 a skeleton in XSLT or Scala could help to get started, I guess.)
- Go through the list of sets. For each set S
- Go through the list of groups. For each group G
- If S can be a member of G (i.e. if G's set is a subset of S), add S to G.
- If S cannot be a member of G but the intersection of S ang G's set contains more than one node, make a new group for that intersection and add it to the list.
- Give S a group of its own and add it to the list.
- Combine any groups that have the same set.
- Go through the list of groups. For each group G
- Delete any group with only one member set.
For example, with your example sets, after reading a and b the list of groups is
[1,2,5,6] [a] [1,5] [a,b] [1,4,5] [b]
And after reading c it's
[1,2,5,6] [a] [1,5] [a,b,c] [1,4,5] [b] [1,2,5] [a,c]
There are slightly more efficient algorithms, if speed is a problem.
/*
Pseudocode algorithm for creating groups data from a set dataset, further explained in the project documentation. This is based on
http://stackoverflow.com/questions/1644387/create-groups-from-sets-of-nodes
I am assuming
- Group is a structure (class) the objects of which contain two lists: a list of sets and a list of nodes (group.nodes). Its constructor accepts a list of nodes and a reference to a Set object
- Set is a list structure (class), the objects (set) of which contain the nodes of the list in set.nodes
- groups and sets are both list structures that can contain arbitrary objects which can be iterated with foreach().
- you can get the objects two lists have in common as a new list with intersection()
- you can count the number of objects in a list with length()
*/
//Create groups, going through the original sets
foreach(sets as set){
if(groups.nodes.length==0){
groups.addGroup(new Group(set.nodes, set));
}
else{
foreach (groups as group){
if(group.nodes.length() == intersection(group.nodes,set.nodes).length()){
// the group is a subset of the set, so just add the set as a member the group
group.addset(set);
if (group.nodes.length() < set.nodes.length()){
// if the set has more nodes than the group that already exists,
// create a new group for the nodes of the set, with set as a member of that group
groups.addGroup(new Group(set.nodes, set));
}
}
// If group is not a subset of set, and the intersection of the nodes of the group
// and the nodes of the set
// is greater than one (they have more than one person in common), create a new group with
// those nodes they have in common, with set as a member of that group
else if(group.nodes.length() > intersection(group.nodes,set.nodes).length()
&& intersection(group.nodes,set.nodes).length()>1){
groups.addGroup(new Group(intersection(group.nodes,set.nodes), set);
}
}
}
}
// Cleanup time!
foreach(groups as group){
//delete any group with only one member set (for it is not really a group then)
if (group.sets.length<2){
groups.remove(group);
}
// combine any groups that have the same set of nodes. Is this really needed?
foreach(groups2 as group2){
//if the size of the intersection of the groups is the same size as either of the
//groups, then the groups have the same nodes.
if (intersection(group.nodes,group2.nodes).length == group.nodes.length){
foreach(group2.sets as set2){
if(!group.hasset(set)){
group.addset(set2);
}
}
groups.remove(group2);
}
}
}
精彩评论