I want to develop a soft keyboard for Android and already got a autocorrect algorithm which makes suggestions based on the fact if the input character and the character of a word from the dictionary are neighbours on the keyboard. This works in combination with the levenshtein-algorithm (if a character has to be replaced with a different character it is checked if they are neighbours). That's why this check is called very frequently. Currently, it consumes 50% of the time spent on autocorrection.
My current approach is a seperate trie with 3 layers. First lay开发者_StackOverflower: first character. Second layer: second character: Third layer: boolean holding the information if the characters are neighbours. But I'm afraid a trie is overkill? The intern hashmaps for every children may slow it down, as well? Should I build a hashmap with an own charToNumber-function?
How would you do this? Which bottlenecks can be avoided? Character.toLowerCase() seems to be inefficient as well when it's called everytime a check is performed.
I hope you can help me speeding up the task :)
You just want to determine whether two characters are next to each other on the keyboard? Why not use a map from a character to a set of adjacent characters? When using efficient data structures you will get O(1)
time - use array for a map (continuous key space - ASCII codes of keys) and BitSet for a set of adjacent keys. Also very compact.
Here is a sample code:
BitSet[] adjacentKeys = new BitSet[127];
//initialize
adjacentKeys[(int) 'q'] = new BitSet(127);
adjacentKeys[(int) 'q'].set((int)'a');
adjacentKeys[(int) 'q'].set((int)'w');
adjacentKeys[(int) 'q'].set((int)'s');
//...
//usage
adjacentKeys[(int) 'q'].get((int) 'a'); //q close to a yields true
adjacentKeys[(int) 'q'].get((int) 'e'); //q close to e yields false
This should be very efficient, no loops and complicated computations like hashCode
s. Of course you have to initialize the table manually, I would advice doing this once at application startup from som external configuration file.
BTW neat idea!
I really like the idea.
For raw speed, you would use a massive switch
statement. The code would be large, but there would be nothing faster:
public static boolean isNeighbour(char key1, char key2) {
switch (key1) {
case 'a':
return key2 == 'w' || key2 == 'e' || key2 == 'd' || key2 == 'x' || key2 == 'z';
case 'd':
return key2 == 's' || key2 == 'w' || key2 == 'f' || key2 == 'c' || key2 == 'x';
// etc
default:
return false;
}
}
Here's a "standard" way to do it that should still perform well:
private static final Map<Character, List<Character>> neighbours =
new HashMap<Character, List<Character>>() {{
put('s', Arrays.asList('a', 'w', 'e', 'd', 'x', 'z'));
put('d', Arrays.asList('s', 'e', 'w', 'f', 'c', 'x'));
// etc
}};
public static boolean isNeighbour(char key1, char key2) {
List<Character> list = neighbours.get(key1);
return list != null && list.contains(key2);
}
This algorithm does not make use of the fact that if a isneighbour b
then b isneighbour a
, but rather sacrifices data size for code simplicity.
What about assigning numbers to each key and use that to determine proximity.
public static void main(String[] args) {
double[] d = new double[26];
d['q'-97] = 100d;
d['w'-97] = 101d;
d['e'-97] = 102d;
d['r'-97] = 103d;
d['t'-97] = 104d;
//(optionally, put a space of 5 between right hand and left hand for each row)
d['y'-97] = 105d;
d['u'-97] = 106d;
d['i'-97] = 107d;
d['o'-97] = 108d;
d['p'-97] = 109d;
//my keyboard middle row is about 20% indented from first row
d['a'-97] = 200.2;
d['s'-97] = 201.2;
d['d'-97] = 202.2;
d['f'-97] = 203.2;
d['g'-97] = 204.2;
d['h'-97] = 205.2;
d['j'-97] = 206.2;
d['k'-97] = 207.2;
d['l'-97] = 208.2;
//third row is about 50% indented from middle row
d['z'-97] = 300.5;
d['x'-97] = 301.5;
d['c'-97] = 302.5;
d['v'-97] = 303.5;
d['b'-97] = 304.5;
d['n'-97] = 305.5;
d['m'-97] = 306.5;
for (char a = 'a'; a <= 'z'; a++) {
for (char b = 'a'; b <= 'z'; b++)
if (a != b && prox(a,b,d))
System.out.println(a + " and " + b + " are prox");
}
}
static boolean prox(char a, char b, double m) {
double a1 = m[a-97];
double a2 = m[b-97];
double d = Math.abs(a1-a2);
//TODO: add in d == 5 if there is a spacing for left and right hand gap (since it's more unlikely of a crossover)
return d == 0 || d == 1 || (d >= 99 && d <= 101);
}
Partial Output:
a and q are prox
a and s are prox
a and w are prox
a and z are prox
....
g and b are prox
g and f are prox
g and h are prox
g and t are prox
g and v are prox
g and y are prox
....
y and g are prox
y and h are prox
y and t are prox
y and u are prox
Here is my hungarian version (if somebody needs it):
public static boolean isHungarianNeighbour(int key1, int key2) {
switch (key1) {
case 'q':
return key2 == 'w' || key2 == 's' || key2 == 'a' || key2 == '1' || key2 == '2';
case 'w':
return key2 == 'q' || key2 == '2' || key2 == '3' || key2 == 'e' || key2 == 's' || key2 == 'a';
case 'e':
return key2 == '3' || key2 == '4' || key2 == 'w' || key2 == 'r' || key2 == 's' || key2 == 'd';
case 'r':
return key2 == '4' || key2 == '5' || key2 == 'e' || key2 == 't' || key2 == 'd'|| key2 == 'f';
case 't':
return key2 == '5' || key2 == '6' || key2 == 'r' || key2 == 'z' || key2 == 'f' || key2 == 'g';
case 'z':
return key2 == '6' || key2 == '7' || key2 == 't' || key2 == 'u' || key2 == 'g' || key2 == 'h';
case 'u':
return key2 == '7' || key2 == '8' || key2 == 'z' || key2 == 'i' || key2 == 'h' || key2 == 'j';
case 'i':
return key2 == '8' || key2 == '9' || key2 == 'u' || key2 == 'o' || key2 == 'j' || key2 == 'k';
case 'o':
return key2 == '9' || key2 == 'ö' || key2 == 'i' || key2 == 'p' || key2 == 'k' || key2 == 'l';
case 'p':
return key2 == 'ö' || key2 == 'ü' || key2 == 'o' || key2 == 'ő' || key2 == 'l' || key2 == 'é';
case 'ő':
return key2 == 'ü' || key2 == 'ó' || key2 == 'p' || key2 == 'ú' || key2 == 'é' || key2 == 'á';
case 'ú':
return key2 == 'ó' || key2 == 'ő' || key2 == 'á' || key2 == 'ű';
case 'a':
return key2 == 'q' || key2 == 'w' || key2 == 's' || key2 == 'y' || key2 == 'í';
case 's':
return key2 == 'w' || key2 == 'e' || key2 == 'a' || key2 == 'd' || key2 == 'y' || key2 == 'x';
case 'd':
return key2 == 'e' || key2 == 'r' || key2 == 's' || key2 == 'f' || key2 == 'x' || key2 == 'c';
case 'f':
return key2 == 'r' || key2 == 't' || key2 == 'd' || key2 == 'g' || key2 == 'c' || key2 == 'v';
case 'g':
return key2 == 't' || key2 == 'z' || key2 == 'f' || key2 == 'h' || key2 == 'v' || key2 == 'b';
case 'h':
return key2 == 'z' || key2 == 'u' || key2 == 'g' || key2 == 'j' || key2 == 'b' || key2 == 'n';
case 'j':
return key2 == 'u' || key2 == 'i' || key2 == 'h' || key2 == 'k' || key2 == 'n' || key2 == 'm';
case 'k':
return key2 == 'i' || key2 == 'o' || key2 == 'j' || key2 == 'l' || key2 == 'm';
case 'l':
return key2 == 'o' || key2 == 'p' || key2 == 'k' || key2 == 'é';
case 'é':
return key2 == 'p' || key2 == 'ő' || key2 == 'l' || key2 == 'á';
case 'á':
return key2 == 'ő' || key2 == 'ú' || key2 == 'é' || key2 == 'ű';
case 'ű':
return key2 == 'á' || key2 == 'ú';
case 'í':
return key2 == 'a' || key2 == 'y';
case 'y':
return key2 == 'a' || key2 == 's' || key2 == 'í' || key2 == 'x';
case 'x':
return key2 == 's' || key2 == 'd' || key2 == 'y' || key2 == 'c';
case 'c':
return key2 == 'd' || key2 == 'f' || key2 == 'x' || key2 == 'v';
case 'v':
return key2 == 'f' || key2 == 'g' || key2 == 'c' || key2 == 'b';
case 'b':
return key2 == 'g' || key2 == 'h' || key2 == 'v' || key2 == 'n';
case 'n':
return key2 == 'h' || key2 == 'j' || key2 == 'b' || key2 == 'm';
case 'm':
return key2 == 'j' || key2 == 'k' || key2 == 'n' || key2 == '?';
default:
return false;
}
}
精彩评论