开发者

Prune Down Autocomplete Terms

开发者 https://www.devze.com 2022-12-14 11:14 出处:网络
I have a very large list of terms for use in an autocomplete box. I\'ve been mulling over a few different scenarios for how to prune them down, but I haven\'开发者_高级运维t come up with anything grea

I have a very large list of terms for use in an autocomplete box. I've been mulling over a few different scenarios for how to prune them down, but I haven'开发者_高级运维t come up with anything great yet.

Basically, the structure is very similar to a record label -

  • An artist has albums An album has songs
  • Individual songs could be popular, albums are mostly sums of their underlying song popularity
  • Albums also have highly variable number of songs in them - so if an album has hundreds of song, it's very likely that someone would want to search for it, and much less likely if it has less songs
  • As the autocomplete becomes more specific (more letters), I'd like less likely terms to be shown

I'm thinking something like this:

Apple   10
Banana  10
Crab    20
Diner   30
Dish    20
Daily   10
Diver   20
Dice    10

If this is the list of albums, and the "score" i assign them, I simply pop choose the list based on the length of the list I'm showing (3 for example) and then by score - I hit "D" above, and "Diner", "Dish" and "Diver" show up, and then "i" and it becomes "Diner", "Dish" and "Diver".

Is there a particular algorithm that does this? Or an AJAX autocompleter built for this? I'm currently using Prototype/Scriptaculous but I can't seem to get it right.


I just posted a server-side autocomplete implementation on Google Code. The project includes a java library that can be integrated into existing applications and a standalone HTTP AJAX autocomplete server. Kick the tires!


This is not an easy algorithm to implement, since you are trying to index a data structure in two ways - lexicographically and by popularity.

One way to do it might be to build a compressed trie of the songs, where at each node you store a pre-built list of the N most popular songs beginning with that prefix. This would take a lot of storage (O(NUM_SONGS * N)), but would allow fast lookup (O(PREFIX)).


You could give the closure autocomplete a try.

0

精彩评论

暂无评论...
验证码 换一张
取 消