How to save a web page snapshot with all its elements (css, js, images, ...) into one file

Developer https://www.devze.com 2023-02-23 13:25 Source: Internet
How is it possible to programmatically save a web page snapshot with all its elements (css, js, images, ...) into one file?

I need to archive some web pages regularly. However, just saving their HTML code is useless, not only because the images are missing but especially because the absence of CSS on today's pages can turn a web page into an unrecognizable mess.

I remember the .mht format that worked like this, but that required manual saving, and it was just a feature of IE. I believe there is an open-source solution that can achieve this programmatically, but despite hours of searching I cannot find it on the web.


HTTrack, with the -%M option (--mime-html), which generates a MIME-encapsulated full archive (.mht).
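To run this on a schedule rather than by hand, the HTTrack call could be wrapped in a small script. The following is only a sketch: it assumes the `httrack` binary is on the PATH, and the function names and the output directory are made up for illustration. `-O` is HTTrack's option for the output path.

```python
import subprocess
from typing import List


def httrack_mht_command(url: str, out_dir: str) -> List[str]:
    """Build an HTTrack command line that requests an .mht archive.

    Hypothetical helper: the name and the argument order are illustrative.
    """
    # -O sets the output directory; -%M asks for a MIME-encapsulated archive.
    return ["httrack", url, "-O", out_dir, "-%M"]


def archive_with_httrack(url: str, out_dir: str) -> None:
    """Run HTTrack; raises CalledProcessError if it exits with an error."""
    subprocess.run(httrack_mht_command(url, out_dir), check=True)
```

Calling `archive_with_httrack("http://www.example.com/", "snapshots/example")` from a cron job or scheduled task would then produce one archive per run.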


Use wget in a terminal:

wget -p -k http://www.example.com/

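Here -p (--page-requisites) downloads everything needed to display the page, and -k (--convert-links) rewrites the links so they work locally. It'll make a clone of the site's front end (HTML, CSS, JS, SVG, etc.), but not in one file as asked. Rather, it'll recreate the whole folder structure.

E.g., if the folder structure of www.example.com is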

 /css/*
 /js/*
 /index.html

then it'll create the same structure locally.

Docs: https://www.gnu.org/software/wget/manual/wget.html
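Since the question asks for one file, one possible workaround is to zip the folder that wget produces. The sketch below is not part of wget: `zip_tree` and `snapshot` are hypothetical helper names, the dated file name is an arbitrary choice, and it assumes `wget` is on the PATH. `-P` is wget's directory-prefix option.

```python
import subprocess
import zipfile
from datetime import date
from pathlib import Path


def zip_tree(src_dir: Path, zip_path: Path) -> int:
    """Pack every file under src_dir into zip_path; return the file count."""
    count = 0
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to src_dir so the layout is preserved.
                zf.write(path, path.relative_to(src_dir).as_posix())
                count += 1
    return count


def snapshot(url: str, work_dir: Path) -> Path:
    """Mirror one page with wget, then bundle the result into a dated zip."""
    mirror = work_dir / "mirror"
    mirror.mkdir(parents=True, exist_ok=True)
    # -p: page requisites, -k: convert links, -P: directory to save into.
    # wget exits non-zero if any single resource fails, so the exit code
    # is not treated as fatal here; an empty mirror is.
    subprocess.run(["wget", "-p", "-k", "-P", str(mirror), url], check=False)
    zip_path = work_dir / f"snapshot-{date.today().isoformat()}.zip"
    if zip_tree(mirror, zip_path) == 0:
        raise RuntimeError(f"wget saved nothing for {url}")
    return zip_path
```

The result is a single zip per run, not a single HTML file; the page still has to be unpacked before a browser can open it.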


I think @reisio (+1) has you covered...

...But if only to plug a great free tool, I would point out the Firefox extension Save Complete, which does an admirable job of grabbing "complete" pages on an ad hoc basis. The output will be a single HTML file with an accompanying directory stuffed with all the resources - you can easily zip them up for archiving.

It's not without fault: I've had issues with corrupted .png files lately on OS X, but I use it frequently for building mockups off of live pages, and it's a huge time-saver. (Also of note: it hasn't been updated for FF 4 yet, and it is the sole reason I rolled back to 3.6.)


If you are using Google Chrome, just use the "Save page as" menu entry (Ctrl+S) and select "Webpage, Complete" from the options at the bottom of the file dialog. This saves the HTML and all required resources (in a separate folder).
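A page saved as HTML plus a resource folder can be merged into one file afterwards by embedding the resources. The sketch below is a rough illustration, not a browser feature: `inline_page` is a hypothetical helper, it assumes UTF-8 files and double-quoted attributes, and it matches tags with regular expressions, which real-world HTML can easily defeat. It only handles stylesheets and `<img>` tags; scripts and `url(...)` references inside the CSS are left untouched.

```python
import base64
import mimetypes
import re
from pathlib import Path
from typing import Optional
from urllib.parse import unquote

IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', re.IGNORECASE)
CSS_LINK = re.compile(r'<link\b[^>]*?\bhref="([^"]+\.css)"[^>]*>', re.IGNORECASE)
REMOTE = re.compile(r"^(?:[a-z][a-z0-9+.-]*:|//)", re.IGNORECASE)


def _local(base_dir: Path, ref: str) -> Optional[Path]:
    """Return the local file a reference points to, or None if remote/missing."""
    if REMOTE.match(ref):
        return None
    path = base_dir / unquote(ref)
    return path if path.is_file() else None


def inline_page(html_path: Path) -> str:
    """Return the page's HTML with local CSS and images embedded."""
    base_dir = html_path.parent
    html = html_path.read_text(encoding="utf-8")

    def embed_image(match: "re.Match[str]") -> str:
        path = _local(base_dir, match.group(2))
        if path is None:
            return match.group(0)  # leave remote or missing images alone
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        data = base64.b64encode(path.read_bytes()).decode("ascii")
        return f"{match.group(1)}data:{mime};base64,{data}{match.group(3)}"

    def embed_css(match: "re.Match[str]") -> str:
        path = _local(base_dir, match.group(1))
        if path is None:
            return match.group(0)
        return "<style>\n" + path.read_text(encoding="utf-8") + "\n</style>"

    html = IMG_SRC.sub(embed_image, html)
    return CSS_LINK.sub(embed_css, html)
```

Writing the returned string to a new `.html` file gives a page that no longer needs the resource folder for its stylesheets and images.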


Apple's Safari has a pretty good solution. It saves all HTML and CSS (sadly no JS) in a format called webarchive. It's one file, but it requires Safari to save and open, and Safari requires a Mac. Even though Safari for Windows does exist, it's too old to work with current web pages, and it doesn't support saving or opening webarchive files. If you have a Mac, open any website in Safari, press ⌘S, and make sure that Web Archive is selected in the format drop-down.

There is also a Chrome extension that can open these types of files, but not save them.

Apologies for replying to such an old thread, just wanted to spread this info!
