Perl doc/pdf/xls to HTML converter

Developer https://www.devze.com 2023-02-04 08:43 Source: web
I would like to convert files with extensions doc/docx/xls/xlsx/pdf to HTML files. Is there any way to do that in a simple way on Solaris using Perl?


The Perl libraries I've used for processing Microsoft Office files have been pretty lacking, and I have yet to find ones that do a good job of handling the Office 2007 and Office 2010 formats (docx/xlsx). Please point to one in the comments if you know of one!

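If you have a PC running Microsoft Office, you can use Win32 OLE automation to drive the Office application. Note that Win32::OLE itself only runs on Windows, so a unix script has to trigger the work on the PC remotely. I've done it before with Ruby: http://rubyonwindows.blogspot.com/2007/03/automating-excel-with-ruby.html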

Here's the Perl module for Win32 OLE: http://metacpan.org/pod/Win32::OLE

I personally don't recommend the OLE approach because it brings lots of headaches: you have to leave Office running on the PC for the unix script to work, and Windows Firewall will almost randomly block the unix script as your PC gets updated with patches.

I haven't tried this, but here's a Java program that uses OpenOffice and Ghostscript to do batch conversions for you: http://www.codeproject.com/KB/java/PDFCM.aspx


As a side note, there is a utility called xpdf which converts PDF files to text. It runs on Solaris, though you have to compile it from source, and you can call it from the command line. I've used it and it's great.

More importantly, there is a modified version of it which converts PDF to HTML. I have not tested that one, but it might be worth a try.
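
For the plain xpdf route, here is a minimal sketch of how a Perl script might call the command-line tool and wrap the result in HTML. It assumes xpdf's pdftotext binary is installed and on the PATH; the helper names (escape_html, text_to_html, pdf_to_text) are made up for illustration. The output is just the extracted text inside a <pre> block, not a layout-faithful HTML rendering.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Escape the characters that are special in HTML.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Wrap plain text in a minimal HTML page; <pre> keeps the line layout.
sub text_to_html {
    my ($title, $text) = @_;
    return "<html><head><title>" . escape_html($title) . "</title></head>\n"
         . "<body><pre>\n" . escape_html($text) . "</pre></body></html>\n";
}

# Run pdftotext (assumed to be on the PATH).
# Passing "-" as the output name sends the extracted text to stdout.
sub pdf_to_text {
    my ($pdf) = @_;
    open(my $fh, '-|', 'pdftotext', '-layout', $pdf, '-')
        or die "Cannot run pdftotext: $!\n";
    local $/;                       # slurp the whole output at once
    my $text = <$fh>;
    close($fh) or die "pdftotext failed on $pdf\n";
    return $text;
}

# Convert every PDF named on the command line to <name>.html.
for my $pdf (@ARGV) {
    (my $out = $pdf) =~ s/\.pdf$/.html/i or next;   # skip non-PDF names
    open(my $html, '>', $out) or die "Cannot write $out: $!\n";
    print {$html} text_to_html($pdf, pdf_to_text($pdf));
    close($html);
}
```

Saved under a name of your choice (say pdf2html.pl), it would be run as: perl pdf2html.pl report.pdf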


For Excel to HTML, you could use exceltohtml.

It needs the following modules:

use Spreadsheet::ParseExcel;
use File::Find;
use Cwd;
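
To give an idea of what Spreadsheet::ParseExcel does on its own, here is a rough sketch that dumps the first worksheet of an .xls file as an HTML table. This is not the exceltohtml script itself; the helper names (rows_to_html_table, read_xls_rows) are invented for the example, and the module must be installed from CPAN. Spreadsheet::ParseExcel reads the older binary .xls format only, so xlsx files need a different module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Escape the characters that are special in HTML.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Turn a list of rows (array refs of cell strings) into an HTML table.
sub rows_to_html_table {
    my (@rows) = @_;
    my $html = "<table>\n";
    for my $row (@rows) {
        my @cells = map { defined $_ ? $_ : '' } @$row;   # undef becomes empty
        $html .= "  <tr>"
               . join('', map { '<td>' . escape_html($_) . '</td>' } @cells)
               . "</tr>\n";
    }
    return $html . "</table>\n";
}

# Read the first worksheet of an .xls file into a list of rows.
sub read_xls_rows {
    my ($file) = @_;
    require Spreadsheet::ParseExcel;    # CPAN module, loaded only when needed
    my $parser   = Spreadsheet::ParseExcel->new();
    my $workbook = $parser->parse($file)
        or die "Cannot parse $file: " . $parser->error() . "\n";
    my ($sheet) = $workbook->worksheets() or return;   # no sheets, no rows
    my ($row_min, $row_max) = $sheet->row_range();
    my ($col_min, $col_max) = $sheet->col_range();
    my @rows;
    for my $r ($row_min .. $row_max) {
        my @cells;
        for my $c ($col_min .. $col_max) {
            my $cell = $sheet->get_cell($r, $c);        # undef for empty cells
            push @cells, $cell ? $cell->value() : '';
        }
        push @rows, \@cells;
    }
    return @rows;
}

# Print one HTML table per .xls file named on the command line.
for my $xls (@ARGV) {
    print rows_to_html_table(read_xls_rows($xls));
}
```

The table builder is kept separate from the file reader so the same HTML code could be reused with another parser for xlsx.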