
Which file types are worth compressing (zipping) for remote storage? For which of them is the compressed size / original size ratio << 1?

Developer https://www.devze.com 2022-12-31 18:09 Source: Internet
I am storing documents in SQL Server in varbinary(max) fields. I optionally use FILESTREAM when a user has:

(DB_Size + Docs_Size) ~> 0.8 * ExpressEdition_Max_DB_Size

I currently zip all the files. This is only because the document read/write code was developed 10 years ago, when storage was more expensive than it is now.

Many files, when zipped, are almost as big as the original (a zipped PDF is about 95% of the original size). Unzipping also has some overhead, and that overhead doubles when I need to "Check-in"/Update the file, because I have to zip it again.
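The ratio described above can be measured per file before deciding whether to zip it. The following is only a rough sketch: it uses Python's `zlib` (DEFLATE, the method ZIP normally uses), and the helper names and the 0.9 threshold are assumptions made for illustration, not something from the original setup.

```python
import os
import zlib


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Return compressed size / original size using DEFLATE."""
    if not data:
        return 1.0  # nothing to gain on an empty payload
    return len(zlib.compress(data, level)) / len(data)


def worth_compressing(data: bytes, threshold: float = 0.9) -> bool:
    """Hypothetical rule: compress only if it saves at least 10%."""
    return compression_ratio(data) < threshold


# Repetitive text stands in for a .txt document.
text_like = b"The quick brown fox jumps over the lazy dog.\n" * 2000
# Random bytes stand in for data that is already compressed (jpg, zipped pdf).
random_like = os.urandom(len(text_like))

print("text-like ratio:  ", round(compression_ratio(text_like), 3))
print("random-like ratio:", round(compression_ratio(random_like), 3))
```

Running a check like this on a sample of real documents per extension would give measured defaults instead of guessed ones.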

So I was thinking of giving users the option to choose whether each file type is zipped or not, while providing some meaningful default values. From my experience, I would propose the following rules:

1) zip by default: txt, bmp, rtf

2) do not zip by default: jpg, jpeg, Microsoft Office files, Open Office files, png, tif, tiff

Could you suggest other file types chosen among the most common or comment on the ones I listed here?
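For illustration, the two default rules above could be captured as a small lookup with per-user overrides. This is a sketch only: `should_zip` and `user_overrides` are hypothetical names, the concrete Office and OpenOffice extensions are assumed (the rules name only the families), and treating unknown types as "zip" is an assumption that mirrors the current always-zip behavior.

```python
from pathlib import PurePath
from typing import Dict, Optional

# Rule 1: zip by default.
ZIP_BY_DEFAULT = {".txt", ".bmp", ".rtf"}

# Rule 2: do not zip by default.
STORE_BY_DEFAULT = {
    ".jpg", ".jpeg", ".png", ".tif", ".tiff",
    ".docx", ".xlsx", ".pptx",  # assumed Microsoft Office extensions
    ".odt", ".ods", ".odp",     # assumed OpenOffice extensions
}


def should_zip(filename: str,
               user_overrides: Optional[Dict[str, bool]] = None) -> bool:
    """Decide whether a file is zipped, letting the user override defaults."""
    ext = PurePath(filename).suffix.lower()
    if user_overrides and ext in user_overrides:
        return user_overrides[ext]
    if ext in ZIP_BY_DEFAULT:
        return True
    if ext in STORE_BY_DEFAULT:
        return False
    # Assumption: unknown types keep the existing always-zip behavior.
    return True


print(should_zip("notes.txt"))
print(should_zip("photo.JPG"))
print(should_zip("photo.jpg", {".jpg": True}))
```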


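.doc and .mdb files actually tend to compress rather well, if I remember correctly. The Office 2007 Open XML formats (.docx, .xlsx, .pptx), though, are ZIP files already, so compressing them again is pretty much useless. Note that .accdb, the 2007 counterpart of .mdb, is not a ZIP container, so it is worth testing separately.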

Don't forget HTML and XML files. Zip by default.
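Rather than listing every ZIP-based extension by hand, the stored bytes can be checked for a ZIP container before compressing. A minimal sketch using Python's standard `zipfile.is_zipfile`; the helper names and the in-memory sample archive are hypothetical and stand in for a real .docx.

```python
import io
import zipfile


def is_zip_container(data: bytes) -> bool:
    """True if the bytes look like a ZIP archive (as .docx files do)."""
    return zipfile.is_zipfile(io.BytesIO(data))


def make_sample_zip() -> bytes:
    """Build a tiny in-memory ZIP standing in for an Open XML document."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("word/document.xml", "<w:document/>")
    return buf.getvalue()


print(is_zip_container(make_sample_zip()))
print(is_zip_container(b"<html><body>plain markup</body></html>"))
```

A document that passes this check is already compressed and can be stored as-is; HTML and XML fail it and remain good candidates for zipping.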


I commend you on being able to recognize what are and aren't compressed file types. You probably already understand this, but I'll rant here:

Do not double up compression methods! Each compression method adds its own header, which increases the file size, and since one method has already eliminated the data's statistical redundancies as best it could, another method is unlikely to compress the result any further. Take this set of files for example:

46,494,380  level0.wav
43,209,258  level1.wav.zip
43,333,266  level2.wav.zip.rar
43,339,894  level3.wav.zip.rar.gz
43,533,989  level4.wav.zip.rar.gz.bz2

All of these files contain the same data.

The first compression method worked well to eliminate redundancies, but each successive compression method just added to the file size, not to mention the headache of decompressing the file later.

The best method of compression is usually the first one applied.

28,259,406  level1.wav.flac            <~ using a compression method meant for the file.
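The stacking effect in the listing above can be reproduced in miniature with Python's standard library. This is only a sketch: `zlib`, `bz2` and `lzma` stand in for zip, rar and gz (RAR is not in the standard library), and the sample is synthetic data limited to 16 byte values rather than a real .wav file.

```python
import bz2
import lzma
import random
import zlib
from typing import List, Tuple


def stacked_sizes(data: bytes) -> List[Tuple[str, int]]:
    """Apply zlib, then bz2, then lzma on top of each other; record sizes."""
    sizes = [("original", len(data))]
    for name, compress in (("zlib", zlib.compress),
                           ("bz2", bz2.compress),
                           ("lzma", lzma.compress)):
        data = compress(data)  # each pass compresses the previous output
        sizes.append((name, len(data)))
    return sizes


# Synthetic, compressible stand-in: only 16 distinct byte values, fixed seed.
rng = random.Random(0)
sample = bytes(rng.choices(b"0123456789abcdef", k=200_000))

for name, size in stacked_sizes(sample):
    print(f"{size:>9,}  after {name}")
```

The first pass should remove most of the redundancy, while the later passes have almost nothing left to work with and mainly add their own framing.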