I have a Rails application that accepts file uploads of arbitrary business documents, such as Word, Excel, PowerPoint, and PDF files. I need to make all these documents searchable, preferably using Sphinx or PostgreSQL full text search. What are the best solutions?
As pointed out in the comments, this is covered pretty well by an older question.
In short: you're going to have to store the relevant extracted data from those files in the database for Sphinx, and likely for PostgreSQL full-text search as well. Sphinx can now also index plain text files directly (as long as a database column points to the file), but you will still need another tool to extract the text from PDF, DOC, XLS, and similar formats.
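As a rough illustration of that extraction step, here is a minimal Ruby sketch that shells out to a command-line converter chosen by file extension. The tool choices (`pdftotext`, `antiword`, `xls2csv`, `catppt`) and the names `EXTRACTORS`, `extractor_command`, and `extract_text` are assumptions made for the example, not something the answer above prescribes; any extractor that prints plain text to stdout could be swapped in.

```ruby
require "open3"

# Hypothetical mapping from file extension to an external command that
# prints the document's plain text to stdout. These tools are assumptions
# and must be installed separately.
EXTRACTORS = {
  ".pdf" => ->(path) { ["pdftotext", path, "-"] }, # poppler-utils; "-" means stdout
  ".doc" => ->(path) { ["antiword", path] },
  ".xls" => ->(path) { ["xls2csv", path] },        # catdoc package
  ".ppt" => ->(path) { ["catppt", path] }          # catdoc package
}.freeze

# Returns the argv array for the given path, or nil if the extension
# has no registered extractor.
def extractor_command(path)
  builder = EXTRACTORS[File.extname(path).downcase]
  builder && builder.call(path)
end

# Runs the extractor and returns the extracted text. Returns nil when the
# format is unsupported, the tool is not installed, or the tool exits
# with a non-zero status.
def extract_text(path)
  command = extractor_command(path)
  return nil unless command

  stdout, status = Open3.capture2(*command)
  status.success? ? stdout : nil
rescue Errno::ENOENT
  nil
end
```

In a Rails app, the result of `extract_text` could be saved to a text column on the upload's model after the file is stored, and that column is then what Sphinx or PostgreSQL full-text search indexes.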