I have a JSP web application with a custom search engine.
The search engine is basically built on top of a 'documents' table in a SQL Server database.
To exemplify, each document record has three fields:
- document id
- 'description' (text field)
- 'attachment', a path of a pdf file in the filesystem.
The search engine currently searches for keywords in the description field and returns a result list in an HTML page. Now I also want to search for keywords in the content of the PDF files.
I've been looking into Lucene, Tika, and Solr, but I don't understand how I can use these frameworks for my goal.
One possible solution: use Tika to extract the PDF content and store it in a new field of the documents table, so I can write SQL queries against this field.
Are there better alternatives? Can I use the Solr/Lucene indexing features as a complement to the SQL-based search engine rather than as a total replacement for it?
Thanks
I would consider Lucene to be completely independent of an SQL database, i.e. you do not query Lucene through SQL, JDBC, or any other database interface, but through its own API against its own data store.
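Independent does not have to mean isolated, though. One common way to combine the two is to keep the document id as a stored field in the index, ask the index for matching ids, and then load the rows from SQL by id. Here is a minimal sketch of the SQL side only, using plain JDBC. The class name `IdLookupSql` and the column names `id`, `description`, and `attachment` are assumptions based on the question, not your actual schema, and the index query that produces the ids is left out.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: loads the rows whose ids were returned by a full-text index.
public class IdLookupSql {

    // Builds a parameterized query with one "?" placeholder per id.
    // Table and column names are assumptions, adjust them to the real schema.
    public static String buildQuery(int idCount) {
        if (idCount < 1) {
            throw new IllegalArgumentException("need at least one id");
        }
        String placeholders = String.join(", ", Collections.nCopies(idCount, "?"));
        return "SELECT id, description, attachment FROM documents WHERE id IN ("
                + placeholders + ")";
    }

    // Binds the ids to the placeholders (JDBC parameters are 1-based).
    public static void bind(PreparedStatement ps, List<Integer> ids) throws SQLException {
        for (int i = 0; i < ids.size(); i++) {
            ps.setInt(i + 1, ids.get(i));
        }
    }
}
```

You would call `buildQuery(ids.size())`, prepare the statement, call `bind`, and render the result set in the existing HTML page. It is probably best to pass only one page of hits at a time so the IN list stays short.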
You could of course use Tika to extract the full text of a PDF, store it, and use whatever full-text search capability your SQL database provides.
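A rough sketch of that extract-and-store step could look like the following. The extractor is hidden behind a small hypothetical interface (`TextExtractor`) so the sketch compiles without any extra library. With Tika on the classpath, an implementation could be as small as `pdf -> new Tika().parseToString(pdf.toFile())`. The class name `PdfTextBackfill` and the column `attachment_text` are assumptions, and storing an empty string for unreadable files is just one possible choice.

```java
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: extracts text from PDF attachments and stores it in SQL.
public class PdfTextBackfill {

    // Hypothetical abstraction over the extractor (for example Apache Tika).
    public interface TextExtractor {
        String extract(Path pdf) throws Exception;
    }

    // Extracts the text for every (document id -> pdf path) entry.
    // A file that cannot be parsed yields an empty string instead of aborting the run.
    public static Map<Integer, String> extractAll(Map<Integer, Path> attachments,
                                                  TextExtractor extractor) {
        Map<Integer, String> texts = new LinkedHashMap<>();
        for (Map.Entry<Integer, Path> entry : attachments.entrySet()) {
            try {
                texts.put(entry.getKey(), extractor.extract(entry.getValue()));
            } catch (Exception e) {
                texts.put(entry.getKey(), "");
            }
        }
        return texts;
    }

    // Writes the extracted text into an assumed 'attachment_text' column in one batch.
    public static void store(Connection conn, Map<Integer, String> texts) throws SQLException {
        String sql = "UPDATE documents SET attachment_text = ? WHERE id = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (Map.Entry<Integer, String> entry : texts.entrySet()) {
                ps.setString(1, entry.getValue());
                ps.setInt(2, entry.getKey());
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }
}
```

Once the column is filled, the existing SQL search can include it next to the description field, either with a plain LIKE or with the full-text features of the database.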
If you are using Hibernate, Hibernate Search is a fantastic product which integrates both an SQL store and Lucene. But you would have to go the Hibernate/JPA way, which might be overkill for your project.