
How to integrate database search with pdf search in a web app?

Posted on https://www.devze.com, 2023-03-06 05:49 (source: web)
I have a JSP web application with a custom search engine.

The search engine is built on top of a 'documents' table in a SQL Server database.

For example, each document record has three fields:

  • document id
  • 'description' (text field)
  • 'attachment', the path to a PDF file on the filesystem.

The search engine currently searches for keywords in the description field and returns a result list as an HTML page. Now I want to search for keywords in the PDF file content as well.

I'm looking into Lucene, Tika, and Solr, but I don't understand how to use these frameworks to achieve my goal.

One possible solution: use Tika to extract the PDF content and store it in a new field of the documents table, so I can write SQL queries against that field.
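A minimal sketch of that approach, assuming a new text column named `pdf_text` has been added to `documents` and that the id column is called `id` (both names are hypothetical). The extraction step is passed in as a function, so Tika's facade (`new org.apache.tika.Tika().parseToString(file)`) can be plugged in without this class depending on Tika directly; the search then runs one parameterized `LIKE` query over both columns, escaping SQL Server's wildcard characters so user input is matched literally.

```java
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class PdfTextColumnSearch {

    /** Pluggable extractor, e.g. pdf -> new org.apache.tika.Tika().parseToString(pdf.toFile()). */
    interface TextExtractor {
        String extract(Path pdf) throws Exception;
    }

    // Hypothetical schema: documents(id, description, attachment, pdf_text)
    static final String UPDATE_SQL = "UPDATE documents SET pdf_text = ? WHERE id = ?";
    static final String SEARCH_SQL =
        "SELECT id FROM documents "
        + "WHERE description LIKE ? ESCAPE '\\' OR pdf_text LIKE ? ESCAPE '\\'";

    /** Extracts the attachment's text once (e.g. on upload) and stores it next to the record. */
    static void storeExtractedText(Connection conn, long id, Path pdf, TextExtractor extractor)
            throws Exception {
        String text = extractor.extract(pdf);
        try (PreparedStatement ps = conn.prepareStatement(UPDATE_SQL)) {
            ps.setString(1, text);
            ps.setLong(2, id);
            ps.executeUpdate();
        }
    }

    /** Escapes LIKE metacharacters (\ % _ [) and wraps the keyword in % for a substring match. */
    static String containsPattern(String keyword) {
        String escaped = keyword
            .replace("\\", "\\\\") // escape the escape character first
            .replace("%", "\\%")
            .replace("_", "\\_")
            .replace("[", "\\["); // '[' starts a character class in SQL Server LIKE
        return "%" + escaped + "%";
    }

    /** Returns ids of documents whose description or extracted PDF text contains the keyword. */
    static List<Long> search(Connection conn, String keyword) throws SQLException {
        String pattern = containsPattern(keyword);
        List<Long> ids = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(SEARCH_SQL)) {
            ps.setString(1, pattern);
            ps.setString(2, pattern);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    ids.add(rs.getLong("id"));
                }
            }
        }
        return ids;
    }
}
```

Note that a pattern starting with `%` cannot use a regular B-tree index, so each search scans the whole table; for larger document sets a full-text index scales better.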

Are there better alternatives? Can I use Solr/Lucene indexing to complement my SQL-based search engine rather than replace it entirely?

Thanks


I would consider Lucene completely independent of an SQL database: you do not query Lucene through SQL, JDBC, or any other database interface, but through its own API and its own data store (the index).
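To illustrate how two independent stores can still cooperate, here is a toy stand-in for a separate full-text index built from plain Java collections; it is not Lucene's API. The index maps terms to document ids, a search returns ids, and those ids are then used to fetch the matching rows from SQL Server. In a real setup Lucene's `IndexWriter` and `IndexSearcher` would play the index's role, with the document id stored as a field; the class name, method names, and the `id` column name here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy stand-in for a separate full-text index (NOT Lucene's API). */
class HybridSearch {

    // term -> ids of documents containing it; this lives outside the SQL database
    private final Map<String, Set<Long>> postings = new HashMap<>();

    /** Crude tokenizer: lowercase, split on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    /** Adds a document's extracted PDF text under its database id. */
    void index(long docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new HashSet<>()).add(docId);
        }
    }

    /** Ids of documents containing every query term (AND semantics). */
    Set<Long> search(String query) {
        Set<Long> result = null;
        for (String term : tokenize(query)) {
            Set<Long> ids = postings.getOrDefault(term, Collections.<Long>emptySet());
            if (result == null) {
                result = new TreeSet<>(ids);
            } else {
                result.retainAll(ids);
            }
        }
        return result == null ? Collections.<Long>emptySet() : result;
    }

    /** SQL that loads the matching rows; bind one id per placeholder. */
    static String fetchSql(int idCount) {
        if (idCount <= 0) {
            throw new IllegalArgumentException("no ids to fetch");
        }
        return "SELECT id, description, attachment FROM documents WHERE id IN ("
            + String.join(", ", Collections.nCopies(idCount, "?")) + ")";
    }
}
```

The document id is the only thing the two stores share, so the SQL schema does not need to change; the trade-off is keeping the index in sync whenever a row or its attachment changes.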

Of course, you could also use Tika to extract the full text of each PDF, store it, and use whatever full-text search capabilities your SQL database provides.
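On SQL Server, that built-in capability is Full-Text Search. A minimal sketch, assuming the extracted PDF text is stored in a column named `pdf_text` and that `documents` has a unique key index named `PK_documents` (both names hypothetical): the one-off DDL is shown as a comment, and the Java side turns user input into a `CONTAINS` search condition, quoting each word and requiring all of them.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

/*
 * One-off setup on SQL Server (catalog, column, and key index names are hypothetical):
 *   CREATE FULLTEXT CATALOG docs_catalog AS DEFAULT;
 *   CREATE FULLTEXT INDEX ON documents (description, pdf_text) KEY INDEX PK_documents;
 */
class FullTextQuery {

    // Searches both columns through the full-text index instead of LIKE scans.
    static final String SEARCH_SQL =
        "SELECT id FROM documents WHERE CONTAINS((description, pdf_text), ?)";

    /**
     * Builds a CONTAINS search condition such as "pdf" AND "search".
     * Double quotes are removed from the input so it cannot break the quoting.
     */
    static String containsCondition(String userInput) {
        String condition = Arrays.stream(userInput.replace("\"", " ").trim().split("\\s+"))
            .filter(word -> !word.isEmpty())
            .map(word -> "\"" + word + "\"")
            .collect(Collectors.joining(" AND "));
        if (condition.isEmpty()) {
            throw new IllegalArgumentException("no search terms");
        }
        return condition;
    }
}
```

The condition is bound as the single `?` parameter of `SEARCH_SQL`. Unlike `%keyword%` with `LIKE`, `CONTAINS` matches whole words rather than arbitrary substrings, but it is answered from the full-text index instead of a table scan.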

If you are using Hibernate, Hibernate Search is a fantastic product that integrates an SQL store with Lucene. However, you would have to adopt Hibernate/JPA, which might be overkill for your project.
