I am working on a project that converts OCR'd PDFs to PNG using ImageMagick and Ghostscript and displays them in the browser, so that a user can query for a word and have it selected in the image. ImageMagick works fine along with Ghostscript.
I have a problem with the ps2text utility: it does not work reliably with PDFs. Could anybody suggest a good utility on Linux to convert PostScript to text, so that I can store the text in a DB? After that, I use a custom-written search class to find the coordinates of each word and highlight the text in the browser.
Thanks
For PostScript, you should use ps2text. For PDFs, you can use pdftotext (part of poppler-utils on most Linux distributions).
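Since the goal is word coordinates rather than plain text, one option worth trying is the `-bbox` flag of the Poppler version of pdftotext, which writes XHTML with one `<word>` element per word and its bounding box. The sketch below assumes that output format (check it against your pdftotext version first); the regexes, the `Word` record and the helpers `parse_bbox_html`, `extract_words`, `find_word` and `to_pixels` are hypothetical names for illustration, not part of any library. The boxes are in PDF points (1/72 inch), so they are scaled by dpi / 72 to line up with a PNG rendered at that resolution (for example with ImageMagick's `-density`).

```python
from __future__ import annotations

import re
import subprocess
from dataclasses import dataclass
from html import unescape

# Assumed shape of poppler's `pdftotext -bbox` output: one <page> element per
# page containing <word xMin=".." yMin=".." xMax=".." yMax="..">text</word>.
PAGE_RE = re.compile(r"<page\b[^>]*>(.*?)</page>", re.S)
WORD_RE = re.compile(
    r'<word xMin="([\d.]+)" yMin="([\d.]+)" '
    r'xMax="([\d.]+)" yMax="([\d.]+)">(.*?)</word>'
)


@dataclass
class Word:
    page: int     # 1-based page number
    text: str
    x_min: float  # PDF points (1/72 inch); origin assumed at the top-left
    y_min: float
    x_max: float
    y_max: float


def parse_bbox_html(markup: str) -> list[Word]:
    """Turn `pdftotext -bbox` output into a flat list of Word records."""
    words = []
    for page_no, body in enumerate(PAGE_RE.findall(markup), start=1):
        for x0, y0, x1, y1, text in WORD_RE.findall(body):
            words.append(
                Word(page_no, unescape(text), float(x0), float(y0), float(x1), float(y1))
            )
    return words


def extract_words(pdf_path: str) -> list[Word]:
    """Run pdftotext on an OCR'd PDF; the "-" output name means stdout."""
    result = subprocess.run(
        ["pdftotext", "-bbox", pdf_path, "-"],
        check=True,
        capture_output=True,
        encoding="utf-8",
    )
    return parse_bbox_html(result.stdout)


def find_word(words: list[Word], query: str) -> list[Word]:
    """Case-insensitive exact match, a stand-in for the custom search class."""
    q = query.casefold()
    return [w for w in words if w.text.casefold() == q]


def to_pixels(word: Word, dpi: float) -> tuple[int, int, int, int]:
    """Scale a box from points to pixels of a PNG rendered at `dpi`."""
    scale = dpi / 72.0
    return (
        round(word.x_min * scale),
        round(word.y_min * scale),
        round(word.x_max * scale),
        round(word.y_max * scale),
    )
```

Each `Word` maps naturally onto one DB row (page, text, four coordinates), so the search step becomes a lookup followed by `to_pixels(word, dpi)` with the same dpi used when rendering the PNG, e.g. `to_pixels(w, 150)` for pages converted at `-density 150`.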