This is a follow-up question to Combinatorics in Python

I have a tree (or directed acyclic graph, if you will) with the following structure:

[Image: Python Combinatorics, part 2]

Here r are root nodes, p are parent nodes, c are child nodes and b are hypothetical branches. The root nodes are not directly linked to the parent nodes; the link is only a reference.

I am interested in finding all the combinations of branches under the following constraints:

A child can be shared by any number of parent nodes given that these parent nodes do not share root node.
A valid combination should not be a subset of another combination

In this example only two valid combinations are possible under the constraints:

combo[0] = [b[0], b[1], b[2], b[3]]
combo[1] = [b[0], b[1], b[2], b[4]]

The data structure is such that b is a list of branch objects, which have properties r, c and p, e.g.:

b[3].r = 1
b[3].p = 3
b[3].c = 2
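
To make the first constraint concrete, here is a minimal sketch of a validity check. It assumes that "parents sharing a child must not share a root" is equivalent to "no (root, child) pair is used by more than one branch in the combination". The names Branch and is_valid are hypothetical stand-ins, not part of the original data structure.

```python
from collections import namedtuple

# Hypothetical stand-in for the branch objects described above.
Branch = namedtuple("Branch", ["r", "p", "c"])

def is_valid(combo):
    """Return True if no (root, child) pair is used by more than one branch."""
    seen = set()
    for branch in combo:
        key = (branch.r, branch.c)
        if key in seen:
            return False
        seen.add(key)
    return True

# The five hypothetical branches from the example.
b = [Branch(0, 0, 0), Branch(0, 1, 1), Branch(1, 2, 1),
     Branch(1, 3, 2), Branch(1, 4, 2)]

print(is_valid([b[0], b[1], b[2], b[3]]))  # True
print(is_valid([b[3], b[4]]))              # False: both link root 1 to child 2
```

This only checks the first constraint; it says nothing about whether a combination is a subset of another one.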

This problem can be solved in Python easily and elegantly, because there is a module called "itertools".

Let's say we have objects of type HypotheticalBranch, which have attributes r, p and c, just as you described in your post:

class HypotheticalBranch(object):
  def __init__(self, r, p, c):
    self.r=r
    self.p=p
    self.c=c
  def __repr__(self):
    return "HypotheticalBranch(%d,%d,%d)" % (self.r,self.p,self.c)

Your set of hypothetical branches is thus

b=[ HypotheticalBranch(0,0,0),
  HypotheticalBranch(0,1,1),
  HypotheticalBranch(1,2,1),
  HypotheticalBranch(1,3,2),
  HypotheticalBranch(1,4,2) ]

The magical function that returns a list of all possible branch combos could be written like so:

import collections, itertools

def get_combos(branches):
  rc=collections.defaultdict(list)
  for b in branches:
    rc[b.r,b.c].append(b)
  return itertools.product(*rc.values())

To be precise, this function returns an iterator. Get the list by iterating over it. These four lines of code will print out all possible combos:

for combo in get_combos(b):
  print("Combo:")
  for branch in combo:
    print("  %r" % (branch,))

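The output of this programme is shown below (the order of the branches within a combo follows the dictionary's iteration order, so it may differ on your interpreter):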

Combo:
  HypotheticalBranch(0,1,1)
  HypotheticalBranch(1,3,2)
  HypotheticalBranch(0,0,0)
  HypotheticalBranch(1,2,1)
Combo:
  HypotheticalBranch(0,1,1)
  HypotheticalBranch(1,4,2)
  HypotheticalBranch(0,0,0)
  HypotheticalBranch(1,2,1)

...which is just what you wanted.

So what does the script do? It creates a list of all hypothetical branches for each (root node, child node) pair. Then it yields the product of these lists, i.e. all possible combinations of one item from each of the lists.
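
To see the intermediate step, here is a small sketch that builds only the grouping and prints it. The namedtuple Branch is a hypothetical stand-in for HypotheticalBranch, and the keys are sorted purely to make the printout deterministic.

```python
import collections

# Hypothetical stand-in for HypotheticalBranch, with the same attributes.
Branch = collections.namedtuple("Branch", ["r", "p", "c"])

branches = [Branch(0, 0, 0), Branch(0, 1, 1), Branch(1, 2, 1),
            Branch(1, 3, 2), Branch(1, 4, 2)]

# Group the branches by their (root, child) pair.
groups = collections.defaultdict(list)
for branch in branches:
    groups[branch.r, branch.c].append(branch)

# Print each (root, child) pair with the parents that connect them.
for key in sorted(groups):
    print(key, [branch.p for branch in groups[key]])
# (0, 0) [0]
# (0, 1) [1]
# (1, 1) [2]
# (1, 2) [3, 4]
```

Only the (1, 2) group has more than one entry, which is why the product contains exactly two combos.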

I hope I got what you actually wanted.

Your second constraint means you want maximal combinations, i.e. all the combinations with the length equal to the largest combination.

I would approach this by first traversing the "b" structure and creating a structure, named "c", that stores all branches coming into each child node, grouped by the root node they come from.

Then to construct combinations for output, for each child you can include one entry from each root set that is not empty. The running time of the algorithm will be on the order of the size of the output, which is the best you can get.

For example, your "c" structure, will look like:

c[i][j] = [b_k0, ...]
--> (means c_i has b_k0, ... as the branches that connect it to root r_j)

For the example you provided:

c[0][0] = [0]
c[0][1] = []
c[1][0] = [1]
c[1][1] = [2]
c[2][0] = []
c[2][1] = [3, 4]

It should be fairly easy to code it using this approach. You just need to iterate over all branches "b" and fill the data structure for "c". Then write a small recursive function that goes through all items inside "c".

Here is the code (I entered your sample data at the top for testing's sake):

class Branch:
  def __init__(self, r, p, c):
    self.r = r
    self.p = p
    self.c = c

b = [
    Branch(0, 0, 0),
    Branch(0, 1, 1),
    Branch(1, 2, 1),
    Branch(1, 3, 2),
    Branch(1, 4, 2)
    ]

total_b = 5   # Number of branches
total_c = 3   # Number of child nodes
total_r = 2   # Number of roots

c = []
for i in range(total_c):
  c.append([])
  for j in range(total_r):
    c[i].append([])

for k in range(total_b):
  c[b[k].c][b[k].r].append(k)

combos = []
def list_combos(n_c, n_r, curr):
  if n_c == total_c:
    combos.append(curr)
  elif n_r == total_r:
    list_combos(n_c+1, 0, curr)
  elif c[n_c][n_r]:
    for k in c[n_c][n_r]:
      list_combos(n_c, n_r+1, curr + [b[k]])
  else:
    list_combos(n_c, n_r+1, curr)

list_combos(0, 0, [])

print(combos)

There are really two problems here: firstly, you need to work out the algorithm that you will use to solve this problem and secondly, you need to implement it (in Python).

Algorithm

I shall assume you want a maximal collection of branches; that is, one to which you can't add any more branches. If you don't, you can consider all subsets of a maximal collection.

Therefore, for a child node we want to take as many branches as possible, subject to the constraint that no two parent nodes share a root. In other words, from each child you may have at most one edge in the neighbourhood of each root node. This seems to suggest that you want to iterate first over the children, then over the (neighbourhoods of the) root nodes, and finally over the edges between these. This concept gives the following pseudocode:

for each child node:
    for each root node:
        remember each permissible edge

find all combinations of permissible edges

Code

>>> import networkx as nx
>>> import itertools
>>> 
>>> G = nx.DiGraph()
>>> G.add_nodes_from(["r0", "r1", "p0", "p1", "p2", "p3", "p4", "c0", "c1", "c2"])
>>> G.add_edges_from([("r0", "p0"), ("r0", "p1"), ("r1", "p2"), ("r1", "p3"),
...                   ("r1", "p4"), ("p0", "c0"), ("p1", "c1"), ("p2", "c1"),
...                   ("p3", "c2"), ("p4", "c2")])
>>> 
>>> combs = set()
>>> leaves = [node for node in G if not G.out_degree(node)]
>>> roots = [node for node in G if not G.in_degree(node)]
>>> for leaf in leaves:
...     for root in roots:
...         possibilities = tuple(edge for edge in G.in_edges(leaf)
...                               if G.has_edge(root, edge[0]))
...         if possibilities: combs.add(possibilities)
... 
>>> combs
set([(('p1', 'c1'),), 
     (('p2', 'c1'),), 
     (('p3', 'c2'), ('p4', 'c2')), 
     (('p0', 'c0'),)])
>>> print(list(itertools.product(*combs)))
[(('p1', 'c1'), ('p2', 'c1'), ('p3', 'c2'), ('p0', 'c0')), 
 (('p1', 'c1'), ('p2', 'c1'), ('p4', 'c2'), ('p0', 'c0'))]

The above seems to work, although I haven't tested it.

For each child c, with hypothetical parents p(c), with roots r(p(c)), choose exactly one parent p from p(c) for each root r in r(p(c)) (such that r is the root of p) and include b in the combination where b connects p to c (assuming there is only one such b, meaning it's not a multigraph). The number of combinations will be the product of the numbers of parents by which each child is hypothetically connected to each root. In other words, the size of the set of combinations will be equal to the product of the hypothetical connections of all child-root pairs. In your example all such child-root pairs have only one path, except r1-c2, which has two paths, thus the size of the set of combinations is two.
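
The counting argument can be sketched in a few lines. This assumes each branch is given as a plain (r, p, c) tuple; count_combos is a hypothetical helper name.

```python
from collections import Counter
from functools import reduce
import operator

def count_combos(branches):
    """Multiply the number of branches available for each (root, child) pair."""
    sizes = Counter((r, c) for r, p, c in branches)
    return reduce(operator.mul, sizes.values(), 1)

# The example data as (r, p, c) tuples.
branches = [(0, 0, 0), (0, 1, 1), (1, 2, 1), (1, 3, 2), (1, 4, 2)]
print(count_combos(branches))  # 2
```

Every pair contributes a factor of one except r1-c2, which contributes two.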

This satisfies the constraint of no combination being a subset of another because by choosing exactly one parent for each root of each child, we maximize the number of connections. Subsequently adding any edge b would cause its root to be connected to its child twice, which is not allowed. And since we are choosing exactly one, all combinations will be exactly the same length.
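
As a rough sanity check of this argument on the example data, the sketch below generates the combinations with itertools.product and confirms that every leftover branch would reuse an occupied (root, child) pair. Branch, get_combos and is_maximal are hypothetical names chosen for illustration.

```python
import collections
import itertools

Branch = collections.namedtuple("Branch", ["r", "p", "c"])

def get_combos(branches):
    """Pick exactly one branch per (root, child) pair, in every possible way."""
    groups = collections.defaultdict(list)
    for br in branches:
        groups[br.r, br.c].append(br)
    return list(itertools.product(*groups.values()))

def is_maximal(combo, branches):
    """True if every branch outside combo would reuse an occupied (root, child) pair."""
    used = {(br.r, br.c) for br in combo}
    return all((br.r, br.c) in used for br in branches if br not in combo)

branches = [Branch(0, 0, 0), Branch(0, 1, 1), Branch(1, 2, 1),
            Branch(1, 3, 2), Branch(1, 4, 2)]
all_combos = get_combos(branches)

print(all(is_maximal(combo, branches) for combo in all_combos))  # True
print({len(combo) for combo in all_combos})                      # {4}
```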

Implementing this choice recursively will yield the desired combinations.
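
One possible recursive formulation is sketched below. It assumes the branches have already been grouped by (root, child) pair; Branch, group_branches and choose are hypothetical names, and the groups are sorted by key only to make the output order predictable.

```python
import collections

Branch = collections.namedtuple("Branch", ["r", "p", "c"])

def group_branches(branches):
    """Return a list of branch lists, one per (root, child) pair, sorted by key."""
    groups = collections.defaultdict(list)
    for br in branches:
        groups[br.r, br.c].append(br)
    return [groups[key] for key in sorted(groups)]

def choose(groups, index=0, chosen=()):
    """Recursively pick exactly one branch from each group."""
    if index == len(groups):
        yield chosen
        return
    for br in groups[index]:
        for combo in choose(groups, index + 1, chosen + (br,)):
            yield combo

branches = [Branch(0, 0, 0), Branch(0, 1, 1), Branch(1, 2, 1),
            Branch(1, 3, 2), Branch(1, 4, 2)]

result = [[br.p for br in combo] for combo in choose(group_branches(branches))]
print(result)  # [[0, 1, 2, 3], [0, 1, 2, 4]]
```

The printed lists show the parent index of each chosen branch, which matches the two expected combinations from the question.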