开发者

Levenshtein to Damerau-Levenshtein

开发者 https://www.devze.com 2023-03-06 07:43 出处:网络
I\'m sitting here and programmnging some algorithms for my main program in Java (well the first one so far). I programmed the levenshtein algorithm just fine thanks to wiki being so nice with the pseu

I'm sitting here and programmnging some algorithms for my main program in Java (well the first one so far). I programmed the levenshtein algorithm just fine thanks to wiki being so nice with the pseudocode to newbeginners plus a nice tutorial :D

I then decided to upgrade to Damerau and added the extra lines but then I read that it's not DL algo but OptimalStringAlignmentDistance instead. I tried reading the actionscript code to understand what more I needed to add to make it to DL but got confused instead. I've been to different places with code looking alike to Java but they all use the wrong pseudocode too.

After spending half the day I gave up and decided to ask here. Is there anyone who can help me with upgrading this code to开发者_Go百科 Damerau-Levenshtein in Java?

    public class LevensteinDistance {
        private static int Minimum(int a, int b, int c) {
            return Math.min(Math.min(a, b), c);
        }

        private static int Minimum (int a, int b) {
            return Math.min(a, b);
        }

        public static int computeLevensteinDistance(String s, String t){
            int d[][];
            int n; // length of s
            int m; // length of t
            int i; // iterates through s
            int j; // iterates through t
            char s_i; // ith character of s
            char t_j; // jth character of t
            int cost; // cost

            n = s.length ();
            m = t.length ();
            if (n == 0) {
                return m;
            }
            if (m == 0) {
                return n;
            }
            d = new int[n+1][m+1];

            for (i = 0; i <= n; i++) {
                d[i][0] = i;
            }

            for (j = 0; j <= m; j++) {
                d[0][j] = j;
            }

            for(i = 1; i <= n; i++) {
                s_i = s.charAt (i - 1);
                for(j = 1; j <= m; j++) {
                    t_j = t.charAt (j - 1);

                    if(s_i == t_j){
                        cost = 0;
                    }else{
                        cost = 1;
                    }
                    d[i][j] = Minimum(d[i-1][j]+1, d[i][j-1]+1, d[i-1][j-1] + cost);

                    if(i > 1 && j > 1 && s_i == t_j-1 && s_i-1 == t_j){ 
                        d[i][j] = Minimum(d[i][j], d[i-2][j-2] + cost);
                    }
                }
            }
        return d[n][m];
    }

    //    public static void main(String[] args0){
    //      String a = "I decided it was best to ask the forum if I was doing it right";
    //      String b = "I thought I should ask the forum if I was doing it right";
    //      System.out.println(computeLevensteinDistance(a, b));
    //    }
}

Here is the Wikipedia page for the Damerau–Levenshtein distance algorithm


Your problem is in referencing the previous characters from the string in your conditional. In your original code you have:

if(i > 1 && j > 1 && s_i == t_j-1 && s_i-1 == t_j){ 
  d[i][j] = Minimum(d[i][j], d[i-2][j-2] + cost);
}

The problem is the values t_j-1 and s_i-1. These say the ith character of s and t minus 1, where the algorithm says you want the (ith minus 1) characters. For example if string s is "AFW" and i is 1 then

s_i - 1 = E; //the character value (s[1]='F') minus 1 = 'E'
s.charAt(i-1) = A; //i-1 = 0, s[0] = 'A'

so your conditional should read:

if(i > 1 && j > 1 && s_i == t.charAt(j-1) && s.charAt(i-1) == t_j) { 
  d[i][j] = Minimum(d[i][j], d[i-2][j-2] + cost);
}

EDIT: Unforutnately I do not understand the algorithm from reading the code, however here is a translation of the ActionScript example from the wikipedia page in Java, which gives an output that matches your example:

public static int damerauLevenshteinDistance(
      String a, String b, int alphabetLength) {
    final int INFINITY = a.length() + b.length();
    int[][] H = new int[a.length()+2][b.length()+2];  
    H[0][0] = INFINITY;
    for(int i = 0; i<=a.length(); i++) {
      H[i+1][1] = i;
      H[i+1][0] = INFINITY;
    }
    for(int j = 0; j<=b.length(); j++) {
      H[1][j+1] = j;
      H[0][j+1] = INFINITY;
    }      
    int[] DA = new int[alphabetLength];
    Arrays.fill(DA, 0);
    for(int i = 1; i<=a.length(); i++) {
      int DB = 0;
      for(int j = 1; j<=b.length(); j++) {
        int i1 = DA[b.charAt(j-1)];
        int j1 = DB;
        int d = ((a.charAt(i-1)==b.charAt(j-1))?0:1);
        if(d==0) DB = j;
        H[i+1][j+1] =
          min(H[i][j]+d,
              H[i+1][j] + 1,
              H[i][j+1]+1, 
              H[i1][j1] + (i-i1-1) + 1 + (j-j1-1));
      }
      DA[a.charAt(i-1)] = i;
    }
    return H[a.length()+1][b.length()+1];
  }

  private static int min(int ... nums) {
    int min = Integer.MAX_VALUE;
    for (int num : nums) {
      min = Math.min(min, num);
    }
    return min;
  }


I think a SparseArray can be used for DA, that way it is not necessary to know the exact size of the alphabet.

SparseArray<Integer> DA = new SparseArray<Integer>();
  ...
    int i1 = DA.get(b.charAt(j - 1), 0);
0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号