
Parse a PDF and identify the page a phrase is on

Developer https://www.devze.com 2022-12-15 14:59 Source: Internet

I want to programmatically parse a PDF file, look for certain phrases, and find out the page number that each phrase is on. Is this possible (I understand that a PDF is not like a text file)? If so, are there libraries out there that can help?


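Yes, this is possible. Apache Tika, which began as a subproject of Apache Lucene, includes PDFBox, which extracts the text from the PDF so that you can work with it. PDFBox can also be used directly, and its text extractor can be limited to a range of pages, which is what makes it possible to recover page numbers.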
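
As a rough illustration of the idea, here is a minimal sketch that maps each phrase to the pages it appears on, given the text of each page. The class and method names (`PhrasePageFinder`, `findPages`, `normalize`) are made up for this example. The search itself uses only the Java standard library; the commented-out part shows one way the per-page text might be collected with PDFBox, assuming the PDFBox 2.x API is on the classpath.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: not part of PDFBox or Tika.
class PhrasePageFinder {

    // Assumption: with PDFBox 2.x, per-page text could be collected roughly like this
    // (PDFBox 3.x loads documents through Loader.loadPDF instead of PDDocument.load):
    //
    //   List<String> pageTexts = new ArrayList<>();
    //   try (PDDocument doc = PDDocument.load(new File("input.pdf"))) {
    //       PDFTextStripper stripper = new PDFTextStripper();
    //       for (int p = 1; p <= doc.getNumberOfPages(); p++) {
    //           stripper.setStartPage(p);   // page numbers are 1-based
    //           stripper.setEndPage(p);
    //           pageTexts.add(stripper.getText(doc));
    //       }
    //   }

    // Collapse runs of whitespace (including line breaks) into single spaces and
    // lower-case the text, so a phrase that wraps across lines still matches.
    static String normalize(String text) {
        return text.replaceAll("\\s+", " ").trim().toLowerCase(Locale.ROOT);
    }

    // pageTexts.get(0) holds the text of page 1, pageTexts.get(1) page 2, and so on.
    // Returns, for each phrase, the 1-based page numbers whose text contains it.
    static Map<String, List<Integer>> findPages(List<String> pageTexts, List<String> phrases) {
        Map<String, List<Integer>> hits = new LinkedHashMap<>();
        for (String phrase : phrases) {
            hits.putIfAbsent(phrase, new ArrayList<>());
        }
        for (int i = 0; i < pageTexts.size(); i++) {
            String page = normalize(pageTexts.get(i));
            for (String phrase : hits.keySet()) {
                if (page.contains(normalize(phrase))) {
                    hits.get(phrase).add(i + 1);
                }
            }
        }
        return hits;
    }
}
```

Calling `PhrasePageFinder.findPages(pageTexts, List.of("some phrase"))` returns a map from each phrase to a list of page numbers. Note that this sketch searches one page at a time, so a phrase that starts on one page and ends on the next will not be found, and text in scanned (image-only) PDFs will not be extracted without OCR.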
