
Document search in Lucene/Solr, Whoosh, Sphinx, Xapian

Developer https://www.devze.com 2023-03-20 17:30 Source: web

I am comparing Lucene/Solr, Whoosh, Sphinx and Xapian for searching documents in DOC, DOCX, HTML and PDF. Only Solr is documented to have a document parser (Tika) which directly indexes documents. So it seems a clear winner.

But to level the playing field, I'd like to consider the alternatives. Do the others have direct document indexing (which I may have missed)? If not, can it be implemented easily? Or is Solr the overwhelming choice?


With Sphinx, you can convert files using a PHP script invoked through the xmlpipe_command option. Since PHP has a Tika wrapper, neither writing the script nor setting it up is hard.
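
To make the idea concrete, here is a minimal sketch of such a script. It is written in Python rather than PHP purely for illustration, and it only covers the part that prints the xmlpipe2 stream Sphinx reads from the command's standard output. The function name `build_xmlpipe2`, the field names `title` and `content`, and the sample document are assumptions, not anything from the original setup; the text extraction step (Tika or otherwise) is left out, so the text is assumed to have been extracted already.

```python
from xml.sax.saxutils import escape, quoteattr


def build_xmlpipe2(docs):
    """Return an xmlpipe2 stream for an iterable of (doc_id, title, text).

    The text is assumed to be already extracted from the DOC/DOCX/HTML/PDF
    file by some other tool; this function only wraps it in XML.
    """
    lines = [
        '<?xml version="1.0" encoding="utf-8"?>',
        "<sphinx:docset>",
        "<sphinx:schema>",
        # Hypothetical full-text fields; rename to match your index.
        '<sphinx:field name="title"/>',
        '<sphinx:field name="content"/>',
        "</sphinx:schema>",
    ]
    for doc_id, title, text in docs:
        # Document ids are coerced to integers before being written.
        lines.append("<sphinx:document id=%s>" % quoteattr(str(int(doc_id))))
        # Escape &, < and > so extracted text cannot break the XML.
        lines.append("<title>%s</title>" % escape(title))
        lines.append("<content>%s</content>" % escape(text))
        lines.append("</sphinx:document>")
    lines.append("</sphinx:docset>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    import sys

    # Placeholder document standing in for real extracted text.
    sample = [(1, "report.pdf", "Extracted text goes here & is escaped.")]
    sys.stdout.write(build_xmlpipe2(sample))
```

In the Sphinx source definition, this would be wired up with `type = xmlpipe2` and an `xmlpipe_command` that runs the script; the exact command line and path depend on your installation.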
