开发者

Pairwise alignment of long string sequences

开发者 https://www.devze.com 2023-03-06 10:41 出处:网络
I want to find the globally optimal (or close to optimal) pairwise alignment between two long (tens of thousands) sequences of strings, but the algorithm is expected to operate on any object sequences

I want to find the globally optimal (or close to optimal) pairwise alignment between two long (tens of thousands) sequences of strings, but the algorithm is expected to operate on any object sequences. I also want to use my own distance function implementation to compute the similarity of two objects. For shorter sequences, I could use the dynamic time warping (DTW) algorithm but the DTW algorithm needs to compute and store a n*m distance matrix (n,m are lengths of the sequences) which is not feasible for longer sequences. Can y开发者_如何学JAVAou recommend such algorithm? A working implementation would be a plus.

The following example clarifies what the algorithm needs to do:

Input:
Sequence A: i saw the top of the mountain
Sequence B: then i see top of the mountains

Result:    
Sequence A:      i saw the top of the mountain
Sequence B: then i see     top of the mountains


I don't know, if I understood your requirements right, but it sounds to me like you are trying to solve a Stable Marriage Problem. The original solution of Gale and Shapley in the link is in O(n*m) time and O(n+m) space, if my memory serves me right. The implementation was pretty straightforward. There are also some later solutions with different variants of the problem.

You also could try to solve this problem by using Maximum Bipartite Graph Matching, e.g. here, for getting a different optimality criterion. This also can be done in O(n*m).

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号