I have a very large list of terms for use in an autocomplete box. I've been mulling over a few different scenarios for how to prune them down, but I haven'开发者_高级运维t come up with anything great yet.
Basically, the structure is very similar to a record label -
- An artist has albums An album has songs
- Individual songs could be popular, albums are mostly sums of their underlying song popularity
- Albums also have highly variable number of songs in them - so if an album has hundreds of song, it's very likely that someone would want to search for it, and much less likely if it has less songs
- As the autocomplete becomes more specific (more letters), I'd like less likely terms to be shown
I'm thinking something like this:
Apple 10
Banana 10
Crab 20
Diner 30
Dish 20
Daily 10
Diver 20
Dice 10
If this is the list of albums, and the "score" i assign them, I simply pop choose the list based on the length of the list I'm showing (3 for example) and then by score - I hit "D" above, and "Diner", "Dish" and "Diver" show up, and then "i" and it becomes "Diner", "Dish" and "Diver".
Is there a particular algorithm that does this? Or an AJAX autocompleter built for this? I'm currently using Prototype/Scriptaculous but I can't seem to get it right.
I just posted a server-side autocomplete implementation on Google Code. The project includes a java library that can be integrated into existing applications and a standalone HTTP AJAX autocomplete server. Kick the tires!
This is not an easy algorithm to implement, since you are trying to index a data structure in two ways - lexicographically and by popularity.
One way to do it might be to build a compressed trie of the songs, where at each node you store a pre-built list of the N most popular songs beginning with that prefix. This would take a lot of storage (O(NUM_SONGS * N)), but would allow fast lookup (O(PREFIX)).
You could give the closure autocomplete a try.
精彩评论