开发者

Data structure behind T9 type of dictionary

开发者 https://www.devze.com 2022-12-25 08:24 出处:网络
How does a T9 dictionary work? What is the data structure behind it. If we type \'4663\' we get \'good\' when we press down button we get \'gone\' then \'home\' etc...开发者_如何学运维

How does a T9 dictionary work? What is the data structure behind it. If we type '4663' we get 'good' when we press down button we get 'gone' then 'home' etc...开发者_如何学运维

EDIT: If the user types in 46 then it should show 'go' and when pressed down arrow should show 'gone' etc...


It can be implemented in several ways, one of them is Trie. The route is represented by the digits and the nodes point to collection of words.

It can be implemented using nested hash tables as well, the key of the hash table is a letter and on every digit the algorithm calculates all possible routes (O(3^n) routes).


4663

translates to

{G,H,I}{M,N,O}{M,N,O}{D,E,F}

T9 works by filtering the possibilities down sequentially starting with the first possible letters. So the first step in your example will be to filter the dictionary list to all words beginning with G, H, or I. Next step, take that list and filter the second letters by M, N, O. And so on...


I guess, as those before that T9 uses a trie, where the links are represented by a bitmap (1 bit per letter). This is called a succinct data structure, as Steve Hanov explains very nicely:

http://stevehanov.ca/blog/index.php?id=120


I think that T9 is using a bit-map-based Trie. On the first level there is a 32-bit (I assume they expanded to 32) word where each bit represents a letter. All letters that are valid as start letters for a word have their bit set.

In the next level there is one 32-bit map for each of those bits that were set in the previous level. In these maps each bit is set if the corresponding letter is valid as a second letter in a word, with the starting letter decided by the first level.

The structure then continues. Using one-bit-per-letter makes a very efficient storage. The addressing scheme however must be challenging. There might also be some kind of optimization using the above schema for short words while using something else for longer words.


C# implementation using Trie

        public interface ICellT9
        {
            void Add(string a_name);

            List<string> GetNames(string a_number);
        }

        public class Cell : ICellT9
        {
            private Dictionary<int, Cell> m_nodeHolder;
            private List<string> m_nameList;

            public Cell()
            {
                m_nameList = new List<string>();
                m_nodeHolder = new Dictionary<int, Cell>();

                for (int i = 2; i < 10; i++)
                {
                    m_nodeHolder.Add(i, null);
                }
            }

            public void Add(string a_name)
            {
                Add(a_name, a_name);
            }

            private void Add(string a_name, string a_originalName)
            {
                if (string.IsNullOrEmpty(a_name) && 
                   (string.IsNullOrEmpty(a_originalName) == false))
                {
                    m_nameList.Add(a_originalName);
                }
                else
                {
                    int l_firstNumber = CharToNumber(a_name[0].ToString());

                    if (m_nodeHolder[l_firstNumber] == null)
                    {
                        m_nodeHolder[l_firstNumber] = new Cell();
                    }

                    m_nodeHolder[l_firstNumber].Add(a_name.Remove(0, 1), a_originalName);
                }
            }

            public List<string> GetNames(string a_number)
            {
                List<string> l_result = null;

                if (string.IsNullOrEmpty(a_number))
                {
                    return l_result;
                }

                int l_firstNumber = a_number[0] - '0';

                if (a_number.Length == 1)
                {
                    l_result = m_nodeHolder[l_firstNumber].m_nameList;
                }
                else if(m_nodeHolder[l_firstNumber] != null)
                {
                    l_result = m_nodeHolder[l_firstNumber].GetNames(a_number.Remove(0, 1));
                }

                return l_result;

            }

            private int CharToNumber(string c)
            {
                int l_result = 0;

                if (c == "a" || c == "b" || c == "c")
                {
                    l_result = 2;
                }
                else if (c == "d" || c == "e" || c == "f")
                {
                    l_result = 3;
                }
                else if (c == "g" || c == "h" || c == "i")
                {
                    l_result = 4;
                }
                else if (c == "j" || c == "k" || c == "l")
                {
                    l_result = 5;
                }
                else if (c == "m" || c == "n" || c == "o")
                {
                    l_result = 6;
                }
                else if (c == "p" || c == "q" || c == "r" || c == "s")
                {
                    l_result = 7;
                }
                else if (c == "t" || c == "u" || c == "v")
                {
                    l_result = 8;
                }
                else if (c == "w" || c == "x" || c == "y" || c == "z")
                {
                    l_result = 9;
                }

                return l_result;
            }
        }


I'd probably do it by starting with a dictionary, then (for each word) push the word onto a list keyed by the numbers that could form the word.

Then, you'll probably need to sort the resulting lists in some fashion, so more likely words appear before less likely words (if you have the space, I'd also include a small area for a counter, so we can incrementally re-sort these OR simply mov ethe last used to the front of the suggestion list, so we, over time, tend towards giving the user a better answer).

That way, when you have 4663 as an input, you can quite simply retrieve the relevant linked list wit ha roughly O(1) hash table lookup.

0

精彩评论

暂无评论...
验证码 换一张
取 消