
Indexing PDF files with Symfony using Lucene

Developer (https://www.devze.com), 2022-12-21 03:10, source: web

I am a Symfony developer and my web server is Linux. I already use the sfLucene plugin.

What is the simplest way of indexing PDF files for search on a Linux PHP server?

  1. XPDF, installed like this
  2. Apache Tika via the Solr sfLucene plugin branch
  3. A 3rd option?

Thanks!


Coming from a Zend background, I generally recommend using Zend_Search_Lucene. The XPDF example is really straightforward and looks simple. XPDF is licensed under the GPL; if that fits your needs, go for #1!
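A minimal sketch of the extraction half of option #1, written in Python rather than PHP so it can run standalone. It assumes the `pdftotext` command-line tool from XPDF (Poppler ships a compatible one) is on the PATH; passing `-` as the output file makes it write the text to stdout. The function names are hypothetical; in a Symfony project the returned string would then be added as a field of a Zend_Search_Lucene document.

```python
import shutil
import subprocess


def pdftotext_command(pdf_path: str) -> list[str]:
    """Build the pdftotext invocation: UTF-8 output, text written to stdout ("-")."""
    return ["pdftotext", "-enc", "UTF-8", pdf_path, "-"]


def pdf_to_text(pdf_path: str) -> str:
    """Extract plain text from a PDF by shelling out to XPDF's pdftotext.

    Raises RuntimeError if pdftotext is not installed, and
    subprocess.CalledProcessError if pdftotext exits with an error
    (for example, when the file cannot be opened).
    """
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH; install XPDF or Poppler")
    result = subprocess.run(
        pdftotext_command(pdf_path),
        capture_output=True,
        check=True,
    )
    # Decode leniently so one stray byte does not abort a whole indexing run.
    return result.stdout.decode("utf-8", errors="replace")
```

Shelling out keeps the PHP side free of any PDF parsing code; the trade-off is a process spawn per file, which is usually acceptable for a batch indexing job.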

Zend Framework (ZF) components can easily be integrated into your Symfony projects, e.g. for a Twitter call.


There are many libraries for extracting text content from PDF files. Whichever one you use, you then need to create a Lucene document from the extracted content, so the most useful ones are those that already have Lucene integration.
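The pipeline described above (extract text, wrap it in a document, add the document to an index) can be sketched without any Lucene dependency. The `InvertedIndex` class and `tokenize` function below are a hypothetical, in-memory stand-in used only to show the flow; they are not the API of Lucene, Zend_Search_Lucene or sfLucene, and the tokenizer is deliberately naive (ASCII letters and digits only).

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Naive analyzer: lowercase, then split on runs of non-alphanumeric characters."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


class InvertedIndex:
    """Toy stand-in for a Lucene index: maps each term to the ids of documents containing it."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)
        self.stored: dict[str, dict[str, str]] = {}

    def add_document(
        self, doc_id: str, fields: dict[str, str], indexed: tuple[str, ...] = ("contents",)
    ) -> None:
        # Store every field so it can be shown in results,
        # but only tokenize the fields listed in `indexed`.
        self.stored[doc_id] = fields
        for name in indexed:
            for term in tokenize(fields.get(name, "")):
                self.postings[term].add(doc_id)

    def search(self, query: str) -> list[str]:
        """Return the sorted ids of documents containing every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits)
```

Usage: after `index.add_document("report.pdf", {"path": "report.pdf", "contents": text})`, where `text` is the extracted PDF text, `index.search("sales report")` returns the ids of documents whose contents contain both terms. Keeping `path` stored but unindexed mirrors the usual split between a keyword/stored field and a tokenized text field.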

Apache PDFBox can create a Lucene document directly from a PDF file. The document includes the PDF's metadata fields as well as its text content.
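PDFBox's Lucene helper is a Java class. As a rough analogue of the same idea for a non-Java stack, the sketch below turns the `Key: value` lines printed by XPDF's `pdfinfo` tool into metadata fields and combines them with the extracted text in one document. The field names (lowercased `pdfinfo` keys plus `path` and `contents`) and the helper functions are assumptions for illustration, not the fields PDFBox produces.

```python
import subprocess


def parse_pdfinfo(output: str) -> dict[str, str]:
    """Turn pdfinfo's "Key:   value" lines into a dict with lowercased, underscored keys."""
    meta: dict[str, str] = {}
    for line in output.splitlines():
        # Split on the first colon only, so values such as dates keep their own colons.
        key, sep, value = line.partition(":")
        if not sep or not key.strip():
            continue  # skip blank or malformed lines
        meta[key.strip().lower().replace(" ", "_")] = value.strip()
    return meta


def build_pdf_document(path: str, pdfinfo_output: str, text: str) -> dict[str, str]:
    """Combine metadata fields with the file path and full text, like a Lucene document's fields."""
    fields = parse_pdfinfo(pdfinfo_output)
    fields["path"] = path
    fields["contents"] = text
    return fields


def read_pdfinfo(path: str) -> str:
    """Run pdfinfo (assumed to be on PATH) with UTF-8 output and return its raw stdout."""
    result = subprocess.run(
        ["pdfinfo", "-enc", "UTF-8", path], capture_output=True, check=True
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Keeping the parsing separate from the `pdfinfo` call makes it easy to test, and the resulting dict can be passed as the `fields` of whatever index the project actually uses.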
