I'm looking for some documentation on how Information Retrieval systems (e.g., Lucene) store their indexes for speedy "relevancy" lookups. My Google-fu is failing me: I've found a page which describes Lucene's file format, but it's more focused on how many bits each number is than on how the database is used in producing speedy queries.
Surely someone has some useful bookmarks lying around that they can refer me to.
Thanks!
The Lucene index is an inverted index, so searching for that term should turn up relevant material, for example:
- http://en.wikipedia.org/wiki/Inverted_index
- http://www.ibm.com/developerworks/library/wa-lucene/
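
To make the idea concrete, here is a minimal in-memory sketch of an inverted index in Python. It is not Lucene's on-disk format or its scoring formula; the function names `build_index` and `search`, the whitespace tokenizer, and the "sum of term frequencies" score are all assumptions made for illustration. The point is only that the index maps each term to the documents containing it, so a query touches just the posting lists for its own terms instead of scanning every document.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to a posting list of {doc_id: term_frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # naive whitespace tokenizer
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def search(index, query):
    """Return ids of documents containing every query term, best score first."""
    terms = query.lower().split()
    if not terms:
        return []
    # Look up only the posting lists for the query terms.
    postings = [index.get(term, {}) for term in terms]
    # Intersect them: a document must contain all terms to match.
    common = set(postings[0])
    for posting in postings[1:]:
        common &= set(posting)
    # Toy relevance score: total occurrences of the query terms in the document.
    scores = {doc_id: sum(p[doc_id] for p in postings) for doc_id in common}
    # Sort by descending score, breaking ties by document id.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


docs = {
    1: "lucene stores an inverted index",
    2: "an inverted index maps terms to documents",
    3: "relational databases use b-tree index structures",
}
index = build_index(docs)
print(search(index, "inverted index"))
print(search(index, "lucene index"))
```

Real engines layer a lot on top of this (compressed posting lists, skip data, weighting schemes such as TF-IDF), but the term-to-documents mapping is the part that makes lookups fast.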