
Need an easy way to display a Word document in HTML

Developer https://www.devze.com 2023-02-18 01:42 Source: Internet
I'm getting a bunch of .doc files emailed to me, and I'm writing a Python script to extract the message body, any .doc or .pdf attachments, and any message the sender may have included. Depending on the answer it may do more. I then want to send the result to my web server and have a PHP script format it for display.

I want to do any converting on my home PC because I don't have shell access to the web server, and PHP is the only language supported there, which I (kind of) know. On the desktop I have Python, C, and C++ available, all of which I know better and which are better suited to the job. I would really like to keep the formatting if possible, but I'm not trying to make a big project out of this, so if it's too complicated I can always just upload the .doc and open it locally.


There are various Word to HTML converters, both commercial and open source. The most common open source converter is likely "wv". You can also use OpenOffice, e.g. through the PyUNO bridge (this requires a running OpenOffice server). If you are on Windows, there are various commercial solutions that reuse an installed copy of Office. In general: search for yourself and choose a converter according to your needs and requirements.
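Since the question mentions doing the conversion from a Python script on the desktop, here is a minimal sketch of driving wv's `wvHtml` command from Python. It assumes the wv package is installed and `wvHtml` is on the PATH; the helper names `wv_command` and `convert_doc` are made up for this illustration.

```python
import shutil
import subprocess
from pathlib import Path


def wv_command(doc_path, html_name=None):
    """Build the argument list for wvHtml: input .doc, then output file name."""
    doc = Path(doc_path)
    # Default output name: same stem as the input, with an .html suffix.
    out_name = html_name or doc.with_suffix(".html").name
    # The input path is made absolute because the command runs in another directory.
    return ["wvHtml", str(doc.resolve()), out_name]


def convert_doc(doc_path, out_dir):
    """Run wvHtml inside out_dir so the HTML and any extracted images land there."""
    if shutil.which("wvHtml") is None:
        raise RuntimeError("wvHtml not found on PATH; install the wv package first")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    cmd = wv_command(doc_path)
    subprocess.run(cmd, cwd=out, check=True)  # raises CalledProcessError on failure
    return out / cmd[-1]
```

Calling `convert_doc("inbox/report.doc", "html_out")` would then leave `html_out/report.html` ready to upload. How much of the formatting survives depends on the document.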


Leverage Google's power to turn everything into HTML: http://docs.google.com/viewer?pli=1. They even include a tiny API guide on how to use it on that page.
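As a rough sketch of what using the viewer could look like: you build a viewer link that points at the uploaded document. The `url` and `embedded` query parameters below are an assumption based on how the viewer has commonly been used, so check the guide on that page before relying on them. The document also has to be reachable at a public URL, and `viewer_url` is a made-up helper name.

```python
from urllib.parse import urlencode

# Assumption: the viewer accepts the document location in a "url" parameter.
VIEWER_BASE = "https://docs.google.com/viewer"


def viewer_url(document_url, embedded=True):
    """Return a viewer link for a publicly reachable document URL."""
    params = {"url": document_url}
    if embedded:
        # Assumption: "embedded=true" selects the iframe-friendly layout.
        params["embedded"] = "true"
    return VIEWER_BASE + "?" + urlencode(params)


print(viewer_url("http://example.com/files/report.doc"))
```

The resulting link can be used as the `src` of an iframe on the PHP page, so no conversion has to happen on the home PC at all.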


You can use our Doc To HTML Converter for this task. The application installs on your PC and converts many MS Word documents at once in batch mode, using MS Word to access their original content. However, the program does not use the (X)HTML generation engine built into MS Word. Instead it uses its own implementation, tailored to produce compact, clean code. It also does not require Internet access to do the job.


Use antiword for MS Word content extraction.

http://www.winfield.demon.nl/

You can choose the XML output format to preserve basic formatting, and then use XSLT to turn that XML into HTML.
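A minimal sketch of that pipeline in Python, assuming antiword is on the PATH and that `antiword -x db` emits DocBook XML with `title` and `para` elements. Python's standard library has no XSLT engine, so this swaps the XSLT step for a small `xml.etree.ElementTree` walk; it keeps paragraphs and titles but drops inline formatting such as bold. The function names are made up for the illustration.

```python
import subprocess
import xml.etree.ElementTree as ET
from html import escape


def antiword_xml(doc_path):
    """Return antiword's DocBook XML for doc_path as bytes."""
    result = subprocess.run(
        ["antiword", "-x", "db", doc_path],
        capture_output=True,
        check=True,  # raises CalledProcessError if antiword fails
    )
    return result.stdout


def docbook_to_html(xml_data):
    """Map DocBook <title> to <h2> and <para> to <p>, in document order."""
    root = ET.fromstring(xml_data)
    parts = []
    for element in root.iter():
        if element.tag not in ("title", "para"):
            continue
        # itertext() flattens nested inline elements into plain text.
        text = "".join(element.itertext()).strip()
        if not text:
            continue
        tag = "h2" if element.tag == "title" else "p"
        parts.append("<{0}>{1}</{0}>".format(tag, escape(text)))
    return "\n".join(parts)


# Hand-written sample in the assumed DocBook shape, not real antiword output.
sample = (
    "<book><chapter><title>Report</title>"
    "<para>Hello <emphasis role='bold'>world</emphasis> &amp; all</para>"
    "</chapter></book>"
)
print(docbook_to_html(sample))
```

For a real file you would call `docbook_to_html(antiword_xml("report.doc"))` and upload the result. If inline formatting matters, a real XSLT stylesheet (for example through a third-party library) is the better fit.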
