开发者

Double-Array Trie Question

开发者 https://www.devze.com 2023-03-05 10:29 出处:网络
I am trying to understand Double-Array Trie implementation from http://linux.thai.net/~thep/datrie/datrie.html

I am trying to understand Double-Array Trie implementation from http://linux.thai.net/~thep/datrie/datrie.html But I do not understand following part.

check[base[s] + c] = s
base开发者_开发技巧[s] + c = t 

What does adding c means here.

Can anybody explain algorithm in simple language.


Imagine you collaps a 2D array into a 1D array:

int arr2D[2][3];
int arr1D[2 * 3]; // # of rows * # of columns

// access element [1][2] of 2D array, i.e., the last element
int elem2D = arr2D[1][2];
// access element [1][2] of 1D array, i.e., the last element
int elem1D = arr1D[1 * 3 + 2];
// =========================================================
lets visualize the array access:
arr2D => x/y 0  1  2
          0 [N][N][N]
+1 on x > 1 [N][N][N]
+2 on y ---------- ^
y_len  =>   ^-------^ 3 elements
so the access happens with x * y_len + y
                           1 *   3   + 2
now for the 1D array
we want the second row, so we go with 1 * 3
(0-based access, y_len = 3)
and then we want the 3rd element, so + 2
(again, 0-based access)
arr1D =>  x  0  1  2 
            [N][N][N]
             3  4  5
1 * 3 = 3 > [N][N][N]
      + 2 = 5 ---- ^

I hope I didn't make this too complicated (even though I think I did...). :)


Characters are "zero based", i.e., 'a' is 0, 'b' is 1, 'c' is 2, et cetera. Looking up "a" would mean to add that 0 to the base address of the current node and check the owner at that index. If that owner is the current node, there is a string starting with "a" from the current node in the trie, and the base address at that index is the base address for the next lookup.


You should edit your code to the following:

check[base[s] + c] = s
next[base[s] + c] = t 

In the scenario that state s accepts a character c and transformed into state t: s - c -> t. The key point is what base[s] means. As a state may accept multiple characters:

| |c1|c2|c3| |--|--|--|--| | s| x| y| z| The problem is how you express these mappings of s-c1->x, s-c2->y, s-c3->z. Traditionally, a map is a good practice:

class State {
    Map<Character, State> mappings;
}

But what if you use an array to express the mappings? base[s] is a base index in next array, from which an offset c is added. base[s] + c is the final index in next to store the next state mapped from state s and character c. Note that character c is actually an integer in programming language.

0

精彩评论

暂无评论...
验证码 换一张
取 消