
Is there a library similar to PyCogent, but in Java (or Scala)?

Developer https://www.devze.com 2023-02-05 17:48 Source: Internet

I'm writing a biological evolution simulator. Currently, all of my code is written in Python. For the most part, this is great and everything works sufficiently well. However, there are two steps in the process which take a long time and which I'd like to rewrite in Scala.

The first problem area is sequence evolution. Imagine you're given a phylogenetic tree which relates a large set of proteins. The length of each branch represents the evolutionary distance between the parent and child. The root of the tree is seeded with a single sequence, and then an evolutionary model (e.g. http://en.wikipedia.org/wiki/Models_of_DNA_evolution) is used to evolve the sequence along the tree structure, taking into account the branch lengths. PyCogent takes a long time to perform this step, and I believe that a reasonable Java/Scala implementation would be significantly faster. Do you know of any libraries that implement this type of functionality? I want to write the application in Scala, so, due to interoperability, any Java library will suffice.
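To make the step concrete, here is a minimal Java sketch of evolving a root sequence down a tree. It is not PyCogent's method or any library's API: the class names `TreeNode` and `SequenceEvolver` are made up for illustration, and it assumes the simplest possible model, equal substitution rates between all k states (Jukes-Cantor for DNA, the Poisson model for amino acids), substitutions only (no insertions or deletions), and branch lengths measured in expected substitutions per site.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical tree node; branchLength is the distance to the parent. */
final class TreeNode {
    final String name;
    final double branchLength; // expected substitutions per site (assumption)
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String name, double branchLength) {
        this.name = name;
        this.branchLength = branchLength;
    }

    TreeNode add(TreeNode child) {
        children.add(child);
        return this;
    }
}

/** Evolves sequences under an equal-rates k-state model (illustrative only). */
final class SequenceEvolver {
    private final char[] alphabet;
    private final Random rng;

    SequenceEvolver(String alphabet, long seed) {
        this.alphabet = alphabet.toCharArray();
        this.rng = new Random(seed);
    }

    /** Evolves one sequence along a single branch of length t. */
    String evolveBranch(String parent, double t) {
        int k = alphabet.length;
        // With probability exp(-k t / (k - 1)) nothing happens at a site;
        // otherwise the site is redrawn uniformly from all k states.
        // This yields P(same) = 1/k + (k - 1)/k * exp(-k t / (k - 1)).
        double pNoEvent = Math.exp(-k * t / (k - 1.0));
        StringBuilder child = new StringBuilder(parent.length());
        for (int i = 0; i < parent.length(); i++) {
            if (rng.nextDouble() < pNoEvent) {
                child.append(parent.charAt(i));
            } else {
                child.append(alphabet[rng.nextInt(k)]);
            }
        }
        return child.toString();
    }

    /** Returns node name -> sequence for every node, in pre-order. */
    Map<String, String> evolveTree(TreeNode root, String rootSequence) {
        Map<String, String> result = new LinkedHashMap<>();
        walk(root, rootSequence, result);
        return result;
    }

    // The root's own branchLength is ignored; it just receives the seed.
    private void walk(TreeNode node, String sequence, Map<String, String> out) {
        out.put(node.name, sequence);
        for (TreeNode child : node.children) {
            walk(child, evolveBranch(sequence, child.branchLength), out);
        }
    }
}
```

For proteins one would pass the 20-letter alphabet `"ACDEFGHIKLMNPQRSTVWY"` instead of `"ACGT"`. A realistic model (unequal rates, empirical amino-acid matrices, rate variation across sites) would need a full transition matrix per branch rather than the single probability used here, and very deep trees would call for an iterative traversal instead of recursion.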

The second problem area is the comparison of the generated sequences. The problem is, given a set of sequences for the proteins in a number of different extant species, to use the sequences to reconstruct the phylogenetic tree which relates the species. This problem is inherently computationally demanding, because one must basically do a pairwise comparison between all sequences in the extant species. Here again, however, I feel that a Java/Scala implementation would perform significantly faster than a Python one, if for no other reason than the slow speed of looping in Python. This part I could write from scratch more easily than the sequence evolution part, but I'd be willing to use a library for it as well if a good one exists.
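The pairwise comparison itself can be sketched in a few lines of Java. This is only an illustration with a made-up class name (`PairwiseDistances`); it assumes the sequences are already aligned and of equal length (true if they were simulated with substitutions only), uses the plain proportion of differing sites as the distance, and offers the standard equal-rates correction (Jukes-Cantor when k = 4) as an optional second step.

```java
import java.util.List;

/** Illustrative pairwise distances for aligned, equal-length sequences. */
final class PairwiseDistances {

    /** Proportion of sites at which a and b differ (the p-distance). */
    static double pDistance(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("sequences must be aligned to equal length");
        }
        if (a.isEmpty()) {
            return 0.0;
        }
        int differences = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                differences++;
            }
        }
        return (double) differences / a.length();
    }

    /**
     * Equal-rates correction for multiple hits with k states:
     * d = -(k - 1)/k * ln(1 - k/(k - 1) * p).
     * Returns infinity when p is at or beyond saturation.
     */
    static double correctedDistance(double p, int k) {
        double x = 1.0 - p * k / (k - 1.0);
        return x <= 0.0 ? Double.POSITIVE_INFINITY : -(k - 1.0) / k * Math.log(x);
    }

    /** Symmetric n x n matrix of p-distances; only the upper half is computed. */
    static double[][] matrix(List<String> sequences) {
        int n = sequences.size();
        double[][] d = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double dist = pDistance(sequences.get(i), sequences.get(j));
                d[i][j] = dist;
                d[j][i] = dist;
            }
        }
        return d;
    }
}
```

The matrix costs n(n - 1)/2 comparisons of length-L sequences, which is the quadratic cost described above. A matrix like this is the usual input to distance-based tree building; each pair is independent, so the outer loop is also easy to parallelise.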

Thanks, Rob


For the second problem, why not use an existing program for comparing sequences and inferring phylogenetic trees, such as RAxML or MrBayes, and call that? Maximum likelihood and Bayesian inference are very sophisticated methods for these problems, and using them seems a far better idea than implementing something yourself. A method like maximum parsimony or neighbour-joining, which you could probably write from scratch for such a project, is not sufficient for evolutionary analysis, unless you just want a very quick and dirty topology (trees inferred via MP or NJ are often quite wrong), in which case you can probably use something like this
