Are there any libraries or projects that convert generic document types to HTML?

开发者 https://www.devze.com 2023-01-22 20:14 Source: web
Are there any projects that aim to convert different file types to HTML or plain text? I mean the most common document formats: PDF, DOC(X), XLS(X), PPT(X), PS, etc. I am already aware of Unix utilities such as pdftotext, and I know of Apache's Tika and POI projects. Is there anything that offers a generic interface? Something like MultiMarkdown?


As you said, the philosophy of UNIX-like systems is to use small utilities/filters for this (latex2html, t2html, txt2html, pdftohtml, etc.). You could create your own interface in shell script, Perl, Python, etc. and use those filters as callbacks.
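To make the "filters as callbacks" idea concrete, here is a minimal Python sketch of such an interface. It is only an illustration under assumptions: the names CONVERTERS, register, build_command and convert are made up for this example, and the only filter wired in is pdftotext (which writes to stdout when the output file is given as "-"). It assumes pdftotext is installed and on your PATH; other filters would be registered the same way once you have checked their command-line options.

```python
import subprocess
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical registry: lowercase file extension -> function building the filter's argv.
CONVERTERS: Dict[str, Callable[[str], List[str]]] = {}


def register(extension: str, build_argv: Callable[[str], List[str]]) -> None:
    """Register a command builder (the "callback") for one file extension."""
    CONVERTERS[extension.lower()] = build_argv


def build_command(path: str) -> List[str]:
    """Pick the filter by file extension and return the command line to run."""
    ext = Path(path).suffix.lower()
    builder = CONVERTERS.get(ext)
    if builder is None:
        raise ValueError(f"no converter registered for {ext!r}")
    return builder(path)


def convert(path: str) -> str:
    """Run the external filter and return whatever it writes to stdout."""
    result = subprocess.run(
        build_command(path), capture_output=True, text=True, check=True
    )
    return result.stdout


# Assumes pdftotext is installed; "-" as the output file means stdout.
register(".pdf", lambda p: ["pdftotext", p, "-"])
```

With this in place, convert("report.pdf") would return the extracted text, and supporting another format is one more register(...) call. Keeping the command builders separate from the runner also lets you check the generated command lines without having the tools installed.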
