I have a huge Word doc, 20,000 words long, and I would like to upload it to my blog.
However, I would like to break it up into small(ish) web pages and, if possible, auto-generate relevant keyword, title and description tags. I couldn't find a tool to do this, so I'm thinking of coding something, but I really have no idea where to begin. I write PHP/SQL. I'm thinking of breaking it up every X characters and then building the meta tags out of the most frequently occurring words. That would be pretty easy, but the document also has quite a few images. Is there a PHP library I could use to manipulate Word docs?
OpenOffice can convert Word docs into XHTML, HTML, XML and other formats.
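As a rough sketch of automating that conversion from PHP: the snippet below only builds a command line. It assumes a LibreOffice-style `soffice` binary on the PATH that accepts `--headless`, `--convert-to` and `--outdir`; older OpenOffice builds may need a conversion macro instead. The function name `buildConvertCommand` and the paths are made up for illustration.

```php
<?php
// Assumption: `soffice` is on the PATH and supports --headless / --convert-to.
// buildConvertCommand() is a hypothetical helper, not part of any library.
function buildConvertCommand(string $docPath, string $outDir): string
{
    // escapeshellarg() keeps paths with spaces or quotes from breaking the command.
    return sprintf(
        'soffice --headless --convert-to html --outdir %s %s',
        escapeshellarg($outDir),
        escapeshellarg($docPath)
    );
}

// Example use (not executed here):
// exec(buildConvertCommand('/tmp/big.doc', '/tmp/out'), $output, $exitCode);
```

Checking `$exitCode` and confirming that the output file exists before parsing it is probably worthwhile, since a failed conversion is not always obvious.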
A while ago I wrote a PHP script that took the resulting XHTML output from large Word docs, performed XSL transformations on it - including HTMLTidy - and pumped the result into custom-built XHTML templates.
The result, surprisingly, was very good - with one caveat. Depending on how heavily your Word docs have been edited - esp. with Track Changes - you may find the occasional character drops out entirely, and you often get extra spacing.
In my case the content was legal in nature, so I had our edit team scour the output and give me an honest opinion. They didn't feel good about the missing characters, but imo a browser-based spellchecker would have picked up most of them.
So - my solution for you is to use Open Office to convert to XHTML, and then have your way with the output however you please. From memory, I had to alter the conversion macro: there was a very simple typo in it that made it choke. It may have been fixed since.
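To give an idea of what "having your way with the output" could look like for the question's goal, here is a minimal sketch that splits converted markup into pages at each top-level heading and derives a title, description and keywords per page. It is not the script mentioned in this answer. It assumes the converter emits `<h1>` elements as direct children of `<body>`; the function names, the stop-word list and the 155-character description limit are all arbitrary choices for illustration.

```php
<?php
// Hypothetical helpers; only DOMDocument and the string functions are real PHP APIs.

/** Most frequent words of 4+ letters, skipping a tiny illustrative stop list. */
function topKeywords(string $text, int $limit = 5): array
{
    $stop = ['that', 'this', 'with', 'from', 'have', 'were', 'which', 'their'];
    $counts = [];
    foreach (str_word_count(strtolower($text), 1) as $word) {
        if (strlen($word) >= 4 && !in_array($word, $stop, true)) {
            $counts[$word] = ($counts[$word] ?? 0) + 1;
        }
    }
    arsort($counts); // highest count first
    return array_slice(array_keys($counts), 0, $limit);
}

/** Split markup into pages, starting a new page at every $headingTag. */
function splitIntoPages(string $xhtml, string $headingTag = 'h1'): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // converter output is rarely valid
    $doc->loadHTML('<?xml encoding="UTF-8">' . $xhtml); // prefix hints UTF-8
    libxml_clear_errors();

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return [];
    }

    $pages = [];
    // Anything before the first heading lands on an "Introduction" page.
    $current = ['title' => 'Introduction', 'html' => '', 'text' => ''];
    foreach ($body->childNodes as $node) {
        if ($node->nodeName === $headingTag) {
            if (trim($current['html']) !== '') {
                $pages[] = $current;
            }
            $current = ['title' => trim($node->textContent), 'html' => '', 'text' => ''];
            continue;
        }
        $current['html'] .= $doc->saveHTML($node); // keeps <img> tags intact
        $current['text'] .= ' ' . $node->textContent;
    }
    if (trim($current['html']) !== '') {
        $pages[] = $current;
    }

    return array_map(function (array $page): array {
        $text = trim(preg_replace('/\s+/', ' ', $page['text']));
        preg_match('/^.{0,155}/us', $text, $m); // UTF-8 safe truncation
        return [
            'title'       => $page['title'],
            'description' => $m[0] ?? '',
            'keywords'    => implode(', ', topKeywords($page['title'] . ' ' . $text)),
            'html'        => $page['html'],
        ];
    }, $pages);
}
```

Splitting at headings rather than every X characters avoids cutting a paragraph or an image in half. Any `<img>` elements stay in each page's `html`, so the image files written by the converter would still need to be uploaded alongside the pages.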
Check my profile and email me if you want the script I wrote, and I'll mail you the source tomorrow (it's hacky but it works!).
EDIT: Many other solutions were tried, I forget the details, except that they all sucked a lot more than Open Office.