My client wants to change the look of his website. The content and the pages' locations will stay the same, even the src attributes of the images in the articles. Only the design will change. The new design has been decided on, and a static HTML page has already been created for it.
I want a tool to do the following:
- Download all the pages on the website (all of them are .html pages)
- Take the HTML part of the article inside each page and put it into a template provided by me
- Write the results to an output directory on my machine
I just want the HTML pages; there is no need to download images, CSS, or JavaScript.
Any idea?
I don't think you will find a ready-made tool for that. A Perl (or similar) script could download all the pages (with wget) and then parse each one, looking for a certain table, CSS class, or regex to identify where the content of an article is located. If all the files have a similar, well-structured format, that should be no problem. The script then writes that content into another well-formatted file (your template) at a specific position identified by something like '<div class="article">'.
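A minimal sketch of that idea in Python, using only the standard library. It assumes every page wraps its article in a <div class="article"> with no nested divs, and the TEMPLATE string and {article} placeholder are hypothetical; adjust both to the real markup:

```python
import re

# Hypothetical new-design template; {article} marks where the old content goes.
TEMPLATE = """<html><body>
<div class="article">{article}</div>
</body></html>"""

def extract_article(html):
    # Assumes the article sits in <div class="article">...</div> with no
    # nested divs inside it -- tighten the pattern for messier markup.
    m = re.search(r'<div class="article">(.*?)</div>', html, re.DOTALL)
    return m.group(1).strip() if m else None

def retemplate(html):
    # Pull the article out of an old page and drop it into the new template.
    article = extract_article(html)
    return TEMPLATE.format(article=article) if article is not None else None
```

You would combine this with a wget mirror of the site, then loop over the downloaded .html files (e.g. with os.walk), calling retemplate on each and writing the result to your output directory.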
Yeah, a tool for this might be hard to find. But if all the pages have the same format, you could use strip_tags plus find-and-replace to remove the HTML and anything else you don't want. That will give you just the article string to write into your new template.
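For the strip_tags step (PHP has it built in), a rough Python equivalent can be built on the standard-library HTMLParser; this TagStripper class is a hypothetical helper, not an existing API:

```python
from html.parser import HTMLParser

class TagStripper(HTMLParser):
    """Collects only the text content of a document, discarding all tags."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for each run of text between tags.
        self.parts.append(data)

def strip_tags(html):
    # Feed the markup through the parser and join the text runs.
    stripper = TagStripper()
    stripper.feed(html)
    return "".join(stripper.parts)
```

Note that stripping all tags loses the article's internal formatting (paragraphs, links), so the extract-the-div approach is usually preferable when the old markup should be kept.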