I recently got a requirement to convert PDF files to XML. How can I do this in Python or Ruby? Any help would be appreciated.
Use pdftohtml.
It can be installed with brew install pdftohtml, which adds pdftohtml to your path.
So, to convert a PDF to XML, you can run pdftohtml -xml your_file.pdf your_file.xml
In Python, you could use these lines:
import os
# The output name is given without an extension (see the note below).
cmd = 'pdftohtml -xml your_file.pdf your_file'
os.system(cmd)
Note that pdftohtml automatically adds the 'xml' extension to the output name given as the second argument.
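If you want errors to surface instead of being silently ignored, the same call can be made through subprocess. This is only a sketch: the helper names build_command and pdf_to_xml are made up for illustration, and it assumes pdftohtml is on your PATH and appends the 'xml' extension to the output name as described above.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(pdf_path, out_base=None):
    """Return the pdftohtml argument list for converting pdf_path to XML.

    build_command is a hypothetical helper, not part of any library.
    """
    pdf = Path(pdf_path)
    # Pass the output name without an extension; per the answer above,
    # pdftohtml is assumed to add ".xml" itself.
    base = Path(out_base) if out_base else pdf.with_suffix("")
    return ["pdftohtml", "-xml", str(pdf), str(base)]


def pdf_to_xml(pdf_path, out_base=None):
    """Run pdftohtml and return the path where the XML is expected to be."""
    if shutil.which("pdftohtml") is None:
        raise RuntimeError("pdftohtml was not found on PATH")
    cmd = build_command(pdf_path, out_base)
    # check=True raises CalledProcessError if pdftohtml exits with an error.
    subprocess.run(cmd, check=True)
    return Path(cmd[-1] + ".xml")
```

Calling pdf_to_xml("your_file.pdf") runs the same command as the os.system version. Passing the arguments as a list avoids shell quoting problems when file names contain spaces.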