Backing up a DB with Git - a good idea?

The way I see it, dumping a PostgreSQL DB into one big SQL file and then committing and pushing to a remote Git repo can be a terrific backup solution: I get a history of all versions, hashing, secure transport, one-way operation (really hard to mess up and delete data by pushing), efficient storage (assuming no binaries), and no chance of a new image corrupting the backup (which is the risk with rsync).

Has anybody used this approach, especially with pg, and can share his/her experience? Pitfalls?


Here are the full script details on how to do this for Postgres.

Create a Backup User

The scripts presume the existence of a user called 'backup' that has access either to all databases (superuser) or to the specific database. The credentials are stored in the .pgpass file in the home directory. That file looks like this (presuming the password is 'secret').

~/.pgpass

*:*:*:backup:secret

Make sure you set the correct permissions on .pgpass or it will be ignored:

chmod 0600 ~/.pgpass
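
If the 'backup' role doesn't exist yet, something like the following creates it. This is a minimal sketch, run as the postgres OS user; SUPERUSER matches the broad option mentioned above, so narrow the privileges if you only need access to one database.

sudo -u postgres psql -c "CREATE ROLE backup LOGIN SUPERUSER PASSWORD 'secret';"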

Backup a Single Database

This dumps a specific database.

backup.sh

#!/bin/bash
# run from the repo directory so the git commands hit the right repo
# (path taken from the cron example below; adjust to your setup)
cd /home/ubuntu/backupdbtogit
pg_dump dbname -U backup > backup.sql
git add .
git commit -m "backup"
git push origin master

Note: you probably don't want to use any file-splitting options for the DB dump, since any insertion/deletion causes a 'domino' effect that shifts content across the split files and creates far more deltas/changes in Git.

Backup all Databases on this machine

This script will dump the entire database cluster (all databases):

#!/bin/bash
# as with backup.sh above, run this from inside the git repo
pg_dumpall -U backup > backup.sql
git add .
git commit -m "backup"
git push origin master

Note: the same file-splitting caveat as above applies here.

Schedule it to Run

The final step is to add this to a cron job. Run 'crontab -e' and then add something like the following (runs every day at midnight):

# m h  dom mon dow   command
# run postgres backup to git
0 0 * * * /home/ubuntu/backupdbtogit/backup.sh

Restore

If you need to restore the database, check out the version you want to restore and then pass it to Postgres (more details here: http://www.postgresql.org/docs/8.1/static/backup.html#BACKUP-DUMP-RESTORE ).
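
For example, to pull an older version of the dump out of history (the commit hash below is a placeholder; the checked-out backup.sql then serves as the 'infile' in the commands that follow):

git log --oneline -- backup.sql
git checkout <commit-hash> -- backup.sql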

For a single database:

psql dbname < infile    

For the entire cluster:

psql -f infile postgres

None of this was particularly complicated, but it's always tedious looking up all the parts.


Crashing on Server with limited RAM

I experienced an issue with git failing on a push. This was due to git using a lot of memory; several commits had backed up. I resolved the failure by mounting the server's git repo on my local machine (which has plenty of RAM). I mounted the server disk using sshfs and then committed from my workstation. After I did this, the low-memory server resumed commits without a problem.
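
Roughly, the sshfs approach looks like this (the host, user, and paths here are illustrative, not the exact ones I used):

# on the workstation
mkdir -p ~/mnt/dbserver
sshfs ubuntu@dbserver:/home/ubuntu/backupdbtogit ~/mnt/dbserver
cd ~/mnt/dbserver
git add . && git commit -m "backup" && git push origin master
cd ~ && fusermount -u ~/mnt/dbserver   # unmount when done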

A better alternative is to limit git's memory usage during packing (from the Stack Overflow question "Is there a way to limit the amount of memory that 'git gc' uses?").

git config --global pack.windowMemory "100m"
git config --global pack.packSizeLimit "100m"
git config --global pack.threads "1"

Note: I have not tried setting a memory limit yet, since I have not had the push failure problem again.


Generally, you ought to use a backup tool for doing backups, and a version control tool to do version control. They are similar, but not the same.

Some people mix the two, where, for example, whatever is in the database essentially is the version. That doesn't have to be wrong, but be clear about what you want.

If you're talking about just the schema, then you probably can't do much wrong with "backups" using Git. But if you want to back up the data, then things can get complicated. Git isn't very good with large files. You could use something like git-annex to address that, but then you need a separate backup mechanism to create the external files. Also, using "proper" backup methods such as pg_dump or WAL archiving gives other advantages, such as being able to restore subsets of a database or do point-in-time recovery.
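
To illustrate the subset-restore point: a custom-format pg_dump (not a plain SQL file, so not diff-friendly in Git) can be selectively restored later. A minimal sketch, assuming a database called 'dbname' and a table called 'mytable':

# custom-format dump, then restore only one table from it
pg_dump -Fc dbname > backup.dump
pg_restore -d dbname -t mytable backup.dump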

You probably also want to back up other parts of the operating system. How do you do that? Preferably not with a version control system, because version control doesn't preserve file permissions, timestamps, and special files very well. So it would make sense to tie your database backup into your existing backup system.


I would definitely recommend it. People have been doing it as well, mainly around MySQL, but I don't think there is much of a difference:

http://www.viget.com/extend/backup-your-database-in-git/

Another approach is using ZFS snapshots for backups.

http://www.makingitscale.com/2010/using-zfs-for-fast-mysql-database-backups.html
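
For reference, a ZFS snapshot is a one-liner. A sketch, assuming the database files live on a dataset called tank/db; for a consistent snapshot you still need to flush or quiesce the database first:

zfs snapshot tank/db@backup-$(date +%F)
zfs list -t snapshot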


I did this at $day_job, but with MySQL.

I had to write a script to chunk the monolithic mysqldump file into individual files, so that I could get nice diff reports and also because git deals with small files better.

The script splits the monolithic SQL file into individual per-table schema and data files.

I also had to ensure the SQL INSERT statements are not all on the same line, in order to get readable diff reports.
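
A rough sketch of a similar per-table, one-INSERT-per-line layout; rather than splitting the monolithic dump afterwards, this dumps each table separately (the database name 'mydb' and the paths are placeholders, and credentials are assumed to come from a .my.cnf or similar):

mkdir -p tables
for t in $(mysql -N -e 'SHOW TABLES' mydb); do
  # --skip-extended-insert writes one INSERT per row, keeping diffs readable
  mysqldump --skip-extended-insert mydb "$t" > "tables/$t.sql"
done
git add tables && git commit -m "backup" && git push origin master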

One advantage of keeping the dump in git is that I can run "git log --stat" to get an overview of which tables changed between revisions of the "backup".


Try this tool for making backups. You can download your backups; you only need to configure the server and the repository. You can also add users and privileges, and send commits (changes to your database from the platform). It can back up by table or by whole database, just as it can push changes to the complete database or only to the table you require. I am using it and so far everything is fine:

https://hub.docker.com/r/arelis/gitdb
