
Best way to support wildcard search on a large dictionary?

开发者 https://www.devze.com 2022-12-20 17:53 Source: web
I am working on a project that searches a large dictionary (100k~1m words). The dictionary items look like {key,value,freq}. My task is to develop an incremental search algorithm that supports exact match, prefix match and wildcard match. The results should be ordered by freq.

For example, the dictionary looks like:

key1=a,value1=v1,freq1=4
key2=ab,value2=v2,freq2=2
key3=abc,value3=v3,freq3=1
key4=abcd,value4=v4,freq4=3

When a user types 'a', return v1,v4,v2,v3.

When a user types 'a?c', return v4,v3.
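To pin down the expected behavior, here is a minimal reference sketch in Python. It is a plain linear scan, not an efficient index, and it rests on two assumptions inferred from the examples above: '?' stands for exactly one character, and the typed pattern is matched against the beginning of each key (which is why 'a?c' also returns v4 for 'abcd'). The names Entry, matches_prefix and search are hypothetical.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    key: str
    value: str
    freq: int


# The four example items from the question.
DICTIONARY = [
    Entry("a", "v1", 4),
    Entry("ab", "v2", 2),
    Entry("abc", "v3", 1),
    Entry("abcd", "v4", 3),
]


def matches_prefix(pattern: str, key: str) -> bool:
    """True if `pattern` matches the start of `key`.

    Assumption: '?' matches exactly one arbitrary character.
    """
    if len(key) < len(pattern):
        return False
    return all(p == "?" or p == k for p, k in zip(pattern, key))


def search(pattern: str, entries: List[Entry]) -> List[str]:
    """Scan every entry and return matching values, highest freq first."""
    hits = [e for e in entries if matches_prefix(pattern, e.key)]
    hits.sort(key=lambda e: e.freq, reverse=True)
    return [e.value for e in hits]


print(search("a", DICTIONARY))    # ['v1', 'v4', 'v2', 'v3']
print(search("a?c", DICTIONARY))  # ['v4', 'v3']
```

A scan like this costs time proportional to the dictionary size on every keystroke, so it is mainly useful as a correctness baseline for whatever index is chosen.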

My best option so far is a suffix tree represented as a DAWG (directed acyclic word graph), but this approach does not handle wildcard matches efficiently.

Any suggestions?


You need to look at n-grams for indexing your content. If you want something out of the box, you might want to look at Apache Solr, which does a lot of the hard work for you. It also supports prefix queries, wildcard queries, etc.
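As a rough illustration of the n-gram idea (this is not how Solr implements it, and every name here is hypothetical), the sketch below builds an inverted index from character bigrams to entries. A query is split at each '?', the bigrams of the literal pieces narrow the candidate set, and each candidate is then verified against the full pattern. It assumes '?' matches exactly one character, that patterns are matched as prefixes, and that the marker '^' never occurs in a key.

```python
from collections import defaultdict

N = 2        # n-gram size (bigrams); an assumed tuning choice
START = "^"  # start-of-key marker, assumed not to appear in any key


def ngrams(text, n=N):
    """All length-n substrings of text (empty set if text is shorter than n)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def matches_prefix(pattern, key):
    """True if pattern matches the start of key; '?' matches one character."""
    if len(key) < len(pattern):
        return False
    return all(p == "?" or p == k for p, k in zip(pattern, key))


class NGramIndex:
    def __init__(self, entries):
        # entries: iterable of (key, value, freq) tuples
        self.entries = list(entries)
        self.postings = defaultdict(set)  # n-gram -> set of entry positions
        for idx, (key, _value, _freq) in enumerate(self.entries):
            for gram in ngrams(START + key):
                self.postings[gram].add(idx)

    def search(self, pattern):
        # Collect n-grams from the literal pieces between wildcards.
        grams = set()
        for segment in (START + pattern).split("?"):
            grams |= ngrams(segment)
        if grams:
            # Every candidate must contain all n-grams of the pattern.
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in grams)
            )
        else:
            # No usable n-gram (e.g. pattern starts with '?'): scan everything.
            candidates = range(len(self.entries))
        # The n-gram filter can over-select, so verify each candidate.
        hits = [self.entries[i] for i in sorted(candidates)
                if matches_prefix(pattern, self.entries[i][0])]
        hits.sort(key=lambda e: e[2], reverse=True)
        return [value for _key, value, _freq in hits]


index = NGramIndex([("a", "v1", 4), ("ab", "v2", 2),
                    ("abc", "v3", 1), ("abcd", "v4", 3)])
print(index.search("a"))    # ['v1', 'v4', 'v2', 'v3']
print(index.search("a?c"))  # ['v4', 'v3']
```

The n-gram lookup only prunes candidates; the final check and the sort by freq still run per query, so a larger n or precomputed freq-ordered posting lists may be worth considering for a dictionary of this size.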
