I need to compare 2 sequences and find an edit distance. Edits can include deletion and insertion operations (with modification weight 1 per symbol), and block move operations (with weight 0.1 per symbol)
For example: A B C D E F G H F G H A B C Y D X E Block FGH was moved here. Is there any existing alg开发者_运维知识库orithm to solve this task efficiently?You might try A technique for isolating differences between files (via here):
An algorithm which uses the 'move' operator is described in P. Heckel's 1978 paper
(Sorry for the scribd interface, but I guess the paper has not been OCR'd.)
The abstract for Block Edit Models for Approximate String Matching looks promising.
Yes; there are many algorithms and theory relating to biology; genome alignment and chromosome rearrangement. Without knowing your data, it's really hard to mention something more specific. I mention pancake sorting as a measure of rearrangement on another stackoverflow post, there are also a few other great options (compression, specifically). Of course that method will not be able to break apart your data into blocks. Dealing with small sequence data you should have no problem generating all groupings.
精彩评论