I have been researching how Solr handles documents such as DOC or PDF files that are submitted to it. If I submit PDFs to Solr, does it store the PDF file itself along with the index generated from parsing it?
Thanks,
-Keshav
Solr (Lucene) does not store the PDF file itself. It can, however, store the text extracted from the PDF by a text extractor such as Tika, provided the field is marked as stored in the schema.
If you wish to store the PDF file in its entirety, you will need to convert it into (for example) a Base64 representation and persist the Base64 string in a stored field. When you retrieve the document, you convert the Base64 string back into the PDF.
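As a rough illustration of the Base64 round trip, here is a minimal Python sketch. The field names (`id`, `content`, `pdf_base64`) and the helper functions are hypothetical and would have to match whatever your own schema defines. The sketch only builds the document in memory and does not talk to Solr.

```python
import base64


def pdf_bytes_to_stored_value(pdf_bytes: bytes) -> str:
    """Encode raw PDF bytes as an ASCII Base64 string suitable for a stored string field."""
    return base64.b64encode(pdf_bytes).decode("ascii")


def stored_value_to_pdf_bytes(stored_value: str) -> bytes:
    """Decode the Base64 string read back from the stored field into the original bytes."""
    return base64.b64decode(stored_value)


def build_solr_doc(doc_id: str, extracted_text: str, pdf_bytes: bytes) -> dict:
    """Build a document dict; the field names are assumptions, not a fixed Solr convention."""
    return {
        "id": doc_id,
        "content": extracted_text,  # text extracted from the PDF, e.g. by Tika
        "pdf_base64": pdf_bytes_to_stored_value(pdf_bytes),  # the whole file, stored only
    }


# Placeholder bytes standing in for a real PDF file read from disk.
sample_pdf = b"%PDF-1.4\n% placeholder bytes, not a real PDF\n"

doc = build_solr_doc("doc-1", "text extracted from the PDF", sample_pdf)
restored = stored_value_to_pdf_bytes(doc["pdf_base64"])
```

Keep in mind that Base64 makes the stored value roughly one third larger than the original file, so this approach may be better suited to small documents.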