Binary Delta Storage


I'm looking for a binary delta storage solution to version large binary files (digital audio workstation files)

When working with DAW files, the majority of changes, especially near the end of a mix, are very small in comparison to the huge amount of data used to store the raw audio (waves).

It would be great to have a versioning system for our DAW files, allowing us to roll back to older versions.

The system would only save the difference (diff) between the binary files of each version. This would give us a list of instructions to change from the current version to the previous version without storing the full file for every single version.
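For what it's worth, this is exactly what standalone binary-delta tools do. A minimal sketch using xdelta3 (a VCDIFF-based delta encoder), assuming it is installed; the file names here are placeholders:

# encode: store only the delta needed to turn version 1 into version 2
$ xdelta3 -e -s mix_v1.daw mix_v2.daw v1_to_v2.vcdiff
# decode: reconstruct version 2 from version 1 plus the stored delta
$ xdelta3 -d -s mix_v1.daw v1_to_v2.vcdiff mix_v2_restored.daw

Keeping the current full file plus a chain of reverse deltas back to the older versions is the storage model you are describing; the trade-off is that restoring an old version means replaying the chain.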

Are there any current versioning systems that do this? I've read that SVN uses binary diffs to save space in the repo... but I've also read that it only does that for text files, not binary files... I'm not sure. Any ideas?

My plan of action right now is to continue researching preexisting tools, and if none exist, to get comfortable reading binary data in C/C++ and create the tool myself.


I can't comment on the reliability or connection issues that might exist when committing a large file across the network (one referenced post hinted at problems). But here is a little bit of empirical data that you may find useful (or not).

I have been doing some tests today studying disk seek times and so had a reasonably good test case readily at hand. I found your question interesting, so I did a quick test with the files I am using/modifying. I created a local Subversion repository and added two binary files to it (sizes shown below) and then committed the files a couple of times after changes were made to them. The smaller binary file (0.85 GB) simply had data added to the end of it each time. The larger file (2.2 GB) contains data representing b-trees consisting of "random" integer data. The updates to that file between commits involved adding approximately 4000 new random values, so it would have modified nodes spread somewhat evenly throughout the file.
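In case anyone wants to reproduce something similar, the setup amounts to the standard Subversion workflow; a rough sketch (paths and commit messages are placeholders, not my exact commands):

$ svnadmin create /tmp/repo
$ svn checkout file:///tmp/repo /tmp/wc
$ cp file1 file2 /tmp/wc && cd /tmp/wc
$ svn add file1 file2
$ svn commit -m "initial import"
# ...modify the files, then commit once per revision:
$ svn commit -m "next revision"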

Here are the original file sizes, along with the total size/count of all files in the local Subversion repository after the first commit:

file1    851,271,675  
file2  2,205,798,400 

1,892,512,437 bytes in 32 files and 32 dirs

After the second commit:

file1    851,287,155  
file2  2,207,569,920  

1,894,211,472 bytes in 34 files and 32 dirs

After the third commit:

file1    851,308,845  
file2  2,210,174,976  

1,897,510,389 bytes in 36 files and 32 dirs

The commits were somewhat lengthy. I didn't pay close attention because I was doing other work, but I think each one took maybe 10 minutes. Checking out a specific revision took about 5 minutes. I would not make a recommendation one way or the other based on my results. All I can say is that it seemed to work fine and no errors occurred. And the file differencing seemed to work well (for these files).


Subversion might work, depending on your definition of large. This question/answer says that it works well as long as your files are less than 1 GB.


Subversion will perform binary deltas on binary files as well as text files. Subversion is just incapable of providing human-readable deltas for binary files, and cannot assist with merging conflicts in binary files.
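You can see the client-side half of this yourself: the repository still stores deltas, but svn diff declines to render them. On a file Subversion has flagged as binary, the output looks roughly like this (from memory, so the exact wording may differ by version):

$ svn diff file2
Index: file2
===================================================================
Cannot display: file marked as a binary type.
svn:mime-type = application/octet-stream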


Git compresses (you may need to call git gc manually, though), and seemingly quite well:

$ git init
$ dd if=/dev/urandom of=largefile bs=1M count=100
$ git add largefile
$ git commit -m 'first commit'
[master (root-commit) e474841] first commit
 1 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 largefile
$ du -sh .
201M    .
$ for i in $(seq 20); do date >> largefile; git commit -m "$i" -a; git gc; done
$ du -sh .
201M    .
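To see how much of that 201M is the pack versus the 100 MB working file, git count-objects breaks it down, and git verify-pack shows how deeply the blobs are delta-chained; a quick check (the -H flag requires a reasonably recent Git):

$ git count-objects -vH
$ git verify-pack -v .git/objects/pack/pack-*.idx | head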
