I have a problem optimizing the following pseudocode; any help is appreciated.
for every term
    open new index searcher
    do search
    if found
        skip and search for next term
    else
        add it to index
        commit
    close searcher
In the above code, every time I add a new doc/term to the index I have to commit, just so that the index searcher opened on the next iteration sees the change. Committing for a single new doc feels costly.
Is there any way I can improve the performance? FYI: I have 36 million terms to index.
You can create a HashSet to de-duplicate your list of terms in memory, then index just those terms. The pseudocode is like so:
set := new HashSet
for each term
    if set contains term
        skip to next iteration
    else
        add term to set
end
open index
for each term in set
    add term to index
end
close index
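Here is a minimal Java sketch of the de-duplication step only. The class name `TermDeduper`, the method `distinctTerms`, and the sample terms are made up for illustration, and the indexing step is left as a placeholder comment because it depends on your Lucene version. It also assumes the distinct terms fit in the JVM heap, which you would need to check for 36 million terms.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TermDeduper {
    /** Collects the distinct terms; iteration order of the result is unspecified. */
    static Set<String> distinctTerms(Iterable<String> terms) {
        Set<String> seen = new HashSet<>();
        for (String term : terms) {
            // add() leaves the set unchanged when the term is already present,
            // so no explicit contains() check is needed.
            seen.add(term);
        }
        return seen;
    }

    public static void main(String[] args) {
        // Sample input with duplicates (illustrative only).
        List<String> terms = List.of("apple", "pear", "apple", "fig", "pear");
        Set<String> unique = distinctTerms(terms);

        // Placeholder for the real work: open the index writer once,
        // add every term in 'unique', then commit and close once.
        for (String term : unique) {
            System.out.println("would index: " + term);
        }
    }
}
```

The point is that the membership test happens in memory, so the index is opened once and committed once instead of once per term.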
I suggest you simply create a second index (either in a RAMDirectory or in an FSDirectory at a temporary location). Add all the terms/documents that were not found to the second (temporary) index, and merge the two indices at the end.
open index for searching
for every term
    do search
    if found
        skip and search for next term
    else
        add it to the second index
end
close searcher
commit temp index
merge temp index into primary index
commit primary index