I'm working with biological data - namely groups of genes. For example:
group 1: geneA geneB geneC
group 2: geneD geneE
group 3: geneF geneG geneH
For each pair of genes, geneX and geneY, I have a score telling how similar the two genes are. (Actually, I have two scores, since I used BLAST, which is 'directional': I first searched geneX against all the other genes, then geneY against all the other genes, so I have two geneX--geneY scores, but I guess I can take the lower of the two, or the average.)
So, let's suppose I have only one score for each pair of genes. My data can then be viewed as an undirected graph in which each edge has a score attached to it.
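To make the "one score per pair" idea concrete, here is a rough sketch of how the two directional BLAST scores could be collapsed into a single undirected edge score, and how edges could then be filtered by a threshold. The gene names, score values, and function names are made up for illustration, and whether the minimum or the average is the better choice is still an open question.

```python
def undirected_edges(directed_scores, combine=min):
    """Collapse {(query, subject): score} into one score per unordered gene pair."""
    paired = {}
    for (x, y), score in directed_scores.items():
        key = tuple(sorted((x, y)))  # (geneA, geneB) and (geneB, geneA) share a key
        paired.setdefault(key, []).append(score)
    return {pair: combine(scores) for pair, scores in paired.items()}


def mean(scores):
    """Average of a non-empty list of scores."""
    return sum(scores) / len(scores)


def filter_edges(edges, threshold):
    """Keep only the edges whose score is at or above the threshold."""
    return {pair: s for pair, s in edges.items() if s >= threshold}


# Hypothetical scores, not real BLAST output.
directed = {
    ("geneA", "geneB"): 80.0,
    ("geneB", "geneA"): 60.0,
    ("geneA", "geneD"): 30.0,  # assumption: only one direction produced a hit
}

edges_min = undirected_edges(directed, combine=min)
edges_avg = undirected_edges(directed, combine=mean)
strong = filter_edges(edges_min, threshold=50.0)
```

Taking the minimum is the more conservative option, since a pair only gets a high score when both search directions agree.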
Now, what I would like to do is:
Visualize my data interactively: being able to click on gene nodes and open a link attached to them, show only edges above/below some threshold, control how the network is "spread", etc.
Cluster together groups which are similar, i.e. groups that have similar genes.
Any ideas on how I can do that? I guess it's basic clustering, and I would appreciate any hints on packages/software that could help here.
Thank you.
You'll probably get better responses if you ask this over at BioStar, the bioinformatics stackexchange. Specifically, many of the answers in this thread might be relevant:
Which is the best software to represent biological pathways in a directed graph (network)?
You can try CLUTO. You will have to transform your triples (gene_1, gene_2, similarity) into a matrix and use 'scluster'.
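The transformation from triples to a matrix could look roughly like the sketch below. The function name and the example triples are hypothetical, pairs without a score are assumed to have similarity 0, and the diagonal is left at 0 as well. Writing the matrix out in the file format that 'scluster' expects is not shown here; check the CLUTO manual for that.

```python
def triples_to_matrix(triples):
    """Turn (gene_1, gene_2, similarity) triples into a symmetric matrix.

    Returns the sorted gene names and a list-of-lists matrix in which
    row/column i corresponds to genes[i].
    """
    genes = sorted({g for a, b, _ in triples for g in (a, b)})
    index = {g: i for i, g in enumerate(genes)}
    n = len(genes)
    # Assumption: missing pairs (and the diagonal) get similarity 0.0.
    matrix = [[0.0] * n for _ in range(n)]
    for a, b, sim in triples:
        i, j = index[a], index[b]
        matrix[i][j] = sim
        matrix[j][i] = sim  # undirected graph, so mirror the entry
    return genes, matrix


# Hypothetical triples, one score per gene pair.
triples = [
    ("geneA", "geneB", 60.0),
    ("geneA", "geneD", 30.0),
]
genes, matrix = triples_to_matrix(triples)
```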