Backing Up With rsync And Managing Previous Versions/History
Most backup software uses the versatile tool rsync. With this tool it is very easy to synchronize files and directories on the local or a remote host, thus creating a copy. But most of these tools do not manage the history of changed and deleted data: deleted files are also deleted in the backup copy, and changes simply overwrite the old contents. This howto describes how to keep track of these changed and deleted files.
A good rsync command is:
rsync --relative --recursive --update --delete --perms --owner --group --times --links --safe-links --super --one-file-system --devices %DirToBackup% %BackupTargetDir%
where %DirToBackup% is the directory to back up, for example a home directory such as /home/joe,
and %BackupTargetDir% is the directory it is copied to, for example
/srv/backupsimple/backup/localhost
Note that this command will create the directories home/joe in the target (because of the --relative option).
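The option list above is repeated in almost every command that follows. A minimal sketch, assuming a POSIX shell, keeps it in one place; the function name mirror is hypothetical and not part of rsync. Any arguments, including extra options such as --dry-run, are passed straight through to rsync.

```shell
#!/bin/sh
# Hypothetical helper: mirror SOURCE into TARGET with the option set above.
# --relative recreates the full source path (e.g. home/joe) under TARGET.
mirror() {
    rsync --relative --recursive --update --delete \
          --perms --owner --group --times --links --safe-links \
          --super --one-file-system --devices "$@"
}
```

Usage: mirror /home/joe /srv/backupsimple/backup/localhost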
This command is fine for making a copy, but a real backup is something else. To analyse what the backup will do, rsync has a very handy option: --dry-run. With it, rsync goes through the whole run but does not perform any real action. Combined with the options --itemize-changes and --out-format, this gives you a detailed log report of the actions that would be taken (deleting, overwriting or creating).
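The analysis run can be wrapped the same way. A sketch, assuming the option set above; preview is a hypothetical name. It prints one line per planned action in the %i|%n| format used throughout this howto and drops empty lines.

```shell
#!/bin/sh
# Hypothetical helper: show what a backup of SOURCE into TARGET would do,
# without changing anything (--dry-run).
preview() {
    rsync --dry-run --itemize-changes --out-format='%i|%n|' \
          --relative --recursive --update --delete \
          --perms --owner --group --times --links --safe-links \
          --super --one-file-system --devices "$1" "$2" |
        sed '/^ *$/d'    # delete empty lines
}
```

Usage: preview /home/joe /srv/backupsimple/backup/localhost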
For example, if there is no backup yet of the example directory from above, /home/joe, in /srv/backupsimple/backup/localhost, and /home/joe contains:
/home/joe/DocumentA
/home/joe/DocumentB
/home/joe/DocumentC
/home/joe/DocumentD
then the output of the command
rsync --dry-run --itemize-changes --out-format="%i|%n|" --relative --recursive --update --delete --perms --owner --group --times --links --safe-links --super --one-file-system --devices /home/joe /srv/backupsimple/backup/localhost
is:
.d..t......|/home/|
cd+++++++++|/home/joe/|
>f+++++++++|/home/joe/DocumentA|
>f+++++++++|/home/joe/DocumentB|
>f+++++++++|/home/joe/DocumentC|
>f+++++++++|/home/joe/DocumentD|
Analyzing this:
- the directory /home is changed (only its time, the t), because /home already exists
in the backup directory (I had already made a backup of my own home, /home/sbon) and the dir
joe will be created in it.
- the directory /home/joe is created: hence the leading c. The d that follows means it is a directory.
- the files are created: note the leading >f (a regular file is transferred); the +++++++++ marks a new item.
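The first two characters of each %i string carry most of the information used here: the update type (<, >, c, h, . or *deleting) and the file type (f, d, L, D, S). A small decoder sketch based on the --itemize-changes description in the rsync manual page; it covers only these common codes, and describe_item is a hypothetical name.

```shell
#!/bin/sh
# describe_item LINE: explain the first characters of an itemized line
# in "%i|%n|" format, e.g. ">f+++++++++|/home/joe/DocumentA|".
describe_item() {
    case "$1" in
        '*deleting'*) echo "deleted"; return ;;
    esac
    case "$1" in                       # first character: update type
        '<'*) action="sent" ;;
        '>'*) action="received" ;;
        c*)   action="created/changed locally" ;;
        h*)   action="hard link" ;;
        .*)   action="not updated (attributes may change)" ;;
        *)    action="unknown" ;;
    esac
    case "$1" in                       # second character: file type
        ?f*) kind="file" ;;
        ?d*) kind="directory" ;;
        ?L*) kind="symlink" ;;
        ?D*) kind="device" ;;
        ?S*) kind="special file" ;;
        *)   kind="unknown" ;;
    esac
    case "$1" in                       # all '+' after the type: a new item
        ??+++++++++*) new=" (new)" ;;
        *)            new="" ;;
    esac
    echo "$kind, $action$new"
}
```

For example, describe_item '>f.st......|/home/joe/DocumentA|' prints: file, received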
Doing the real backup:
rsync --relative --recursive --update --delete --perms --owner --group --times --links --safe-links --super --one-file-system --devices /home/joe /srv/backupsimple/backup/localhost
So now there is a copy (or snapshot, if you like) in /srv/backupsimple/backup/localhost.
Adding new files is not the issue here; changing or removing existing files is. Let's start with changing a file:
echo "new contents" >> /home/joe/DocumentA
The dry run rsync command gives:
rsync --dry-run --itemize-changes --out-format="%i|%n|" --relative --recursive --update --delete --perms --owner --group --times --links --safe-links --super --one-file-system --devices /home/joe /srv/backupsimple/backup/localhost
>f.st......|/home/joe/DocumentA|
Analyzing this:
- as you can see, and probably expected, DocumentA is the only file that will be transferred.
Note the s and the t: the size and the modification time have changed.
So, before doing the real backup, the copy of DocumentA that is currently in the backup should be saved first.
To do so, create a timestamp:
timestamp=$(date "+%Y-%m-%d %H:%M:%S")
This looks like:
2010-04-18 20:55:08
Now create the "history" tree:
install --directory "/srv/backupsimple/history/localhost/$timestamp"
install --directory /srv/backupsimple/log/localhost/
Note the quotes: they are necessary because of the space in the timestamp. Next, write the list of files to save to a timestamped list file:
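To see why the quotes matter: without them, the shell splits the path at the space in the timestamp, so the command receives two arguments instead of one. In bash the same splitting makes an unquoted redirection target such as > /srv/backupsimple/log/localhost/$timestamp.changed fail with an "ambiguous redirect" error. A tiny demonstration; count_args is a throwaway helper.

```shell
#!/bin/sh
timestamp="2010-04-18 20:55:08"
# Print how many arguments the function received.
count_args() { echo "$#"; }

count_args /srv/backupsimple/history/localhost/$timestamp    # prints 2
count_args "/srv/backupsimple/history/localhost/$timestamp"  # prints 1
```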
echo "/home/joe/DocumentA" > "/srv/backupsimple/log/localhost/$timestamp.changed"
The rsync command:
rsync --relative --update --perms --owner --group --times --links --super --files-from="/srv/backupsimple/log/localhost/$timestamp.changed" /srv/backupsimple/backup/localhost "/srv/backupsimple/history/localhost/$timestamp"
This saves the backed-up copy of DocumentA, so it is now safe to run the original rsync command: the file that will be overwritten has been copied to a safe place, where it can be looked up later.
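The save step can be wrapped in a function so the quoting is done once. A sketch, assuming the directory layout used in this howto; save_to_history is a hypothetical name, and LIST is a file with one path per line. Leading slashes in the list are fine: rsync treats --files-from entries as relative to the source directory and removes leading slashes.

```shell
#!/bin/sh
# save_to_history LIST BACKUP HISTORY TIMESTAMP
# Copy the items named in LIST from the BACKUP tree into HISTORY/TIMESTAMP,
# keeping their relative paths, before the next mirror run overwrites them.
save_to_history() {
    list=$1 backup=$2 history=$3 ts=$4
    [ -s "$list" ] || return 0            # empty list: nothing to save
    mkdir -p "$history/$ts"               # same effect as install --directory
    rsync --relative --update --perms --owner --group --times --links --super \
          --files-from="$list" "$backup" "$history/$ts"
}
```

Usage for the DocumentA example: save_to_history "/srv/backupsimple/log/localhost/$timestamp.changed" /srv/backupsimple/backup/localhost /srv/backupsimple/history/localhost "$timestamp"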
rsync --relative --recursive --update --delete --perms --owner --group --times --links --safe-links --super --one-file-system --devices /home/joe /srv/backupsimple/backup/localhost
So now we have a snapshot of /home/joe, updated on 18 April 2010 at 20:55:08, and an earlier version of /home/joe/DocumentA.
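Because the timestamps have the form YYYY-MM-DD HH:MM:SS, sorting the history directory names alphabetically also sorts them chronologically. A sketch that lists every saved version of one file, newest first, assuming the history layout above; list_versions is a hypothetical name.

```shell
#!/bin/sh
# list_versions HISTORY PATH
# Print every saved copy of PATH (e.g. /home/joe/DocumentA) found under
# HISTORY/<timestamp>/, newest first.
list_versions() {
    hist=$1
    rel=${2#/}                      # stored relative: home/joe/DocumentA
    for dir in "$hist"/*/; do       # one directory per timestamp
        if [ -e "$dir$rel" ]; then
            printf '%s\n' "$dir$rel"
        fi
    done | LC_ALL=C sort -r         # timestamps sort chronologically
}
```

Usage: list_versions /srv/backupsimple/history/localhost /home/joe/DocumentA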
With deleted files this is similar:
rm /home/joe/DocumentD
rsync --dry-run --itemize-changes --out-format="%i|%n|" --relative --recursive --update --delete --perms --owner --group --times --links --safe-links --super --one-file-system --devices /home/joe /srv/backupsimple/backup/localhost
.d..t......|/home/joe/|
*deleting |home/joe/DocumentD|
Analyzing this output:
- the modification time of /home/joe has changed, which is always the case when a file is removed from a directory.
- and of course the file DocumentD is reported as deleted.
First, create a new timestamp:
timestamp=$(date "+%Y-%m-%d %H:%M:%S")
echo $timestamp
2010-04-18 20:56:30
Create the history dir:
install --directory "/srv/backupsimple/history/localhost/$timestamp"
echo "/home/joe/DocumentD" > "/srv/backupsimple/log/localhost/$timestamp.deleted"
The rsync command to back up the backup is:
rsync --relative --update --perms --owner --group --times --links --super --files-from="/srv/backupsimple/log/localhost/$timestamp.deleted" /srv/backupsimple/backup/localhost "/srv/backupsimple/history/localhost/$timestamp"
After this command, run the real rsync command again:
rsync --relative --recursive --update --delete --perms --owner --group --times --links --safe-links --super --one-file-system --devices /home/joe /srv/backupsimple/backup/localhost
Generalized approach
To write a script that performs the steps described above, they have to be generalized.
First set some variables:
DirToBackup=/home/joe
timestamp=$(date "+%Y-%m-%d %H:%M:%S")
install --directory "/srv/backupsimple/history/localhost/$timestamp"
install --directory /srv/backupsimple/backup/localhost/
install --directory /srv/backupsimple/log/localhost/
Do the dry run and write the output to a file:
rsync --dry-run --itemize-changes --out-format="%i|%n|" --relative --recursive --update --delete --perms --owner --group --times --links --safe-links --super --one-file-system --devices "$DirToBackup" /srv/backupsimple/backup/localhost | sed '/^ *$/d' > "/srv/backupsimple/log/localhost/$timestamp.dryrun"
Note: the sed command deletes empty lines.
Looking at the format of the dry-run file, the created, changed and deleted items can be extracted as follows.
Created and changed files:
grep "^.f" "/srv/backupsimple/log/localhost/$timestamp.dryrun" >> "/srv/backupsimple/log/localhost/$timestamp.onlyfiles"
grep "^.f+++++++++" "/srv/backupsimple/log/localhost/$timestamp.onlyfiles" | awk -F '|' '{print $2 }' | sed 's@^/@@' >> "/srv/backupsimple/log/localhost/$timestamp.created"
grep --invert-match "^.f+++++++++" "/srv/backupsimple/log/localhost/$timestamp.onlyfiles" | awk -F '|' '{print $2 }' | sed 's@^/@@' >> "/srv/backupsimple/log/localhost/$timestamp.changed"
Some notes:
- the sed commands remove the leading slash, making the paths relative instead of absolute
- the dot in the grep pattern (^.f) is a regular-expression wildcard matching any single character (such as >, c or h), not a literal dot
Created and changed directories:
grep "^\.d" "/srv/backupsimple/log/localhost/$timestamp.dryrun" | awk -F '|' '{print $2 }' | sed -e 's@^/@@' -e 's@/$@@' >> "/srv/backupsimple/log/localhost/$timestamp.changed"
grep "^cd" "/srv/backupsimple/log/localhost/$timestamp.dryrun" | awk -F '|' '{print $2 }' | sed -e 's@^/@@' -e 's@/$@@' >> "/srv/backupsimple/log/localhost/$timestamp.created"
Some notes:
- the sed commands remove the leading slash and the trailing slash of each path,
again to make the paths relative and to prevent "recursive" behaviour: rsync is sensitive to a trailing slash
- the dot in the grep pattern (^\.d) is escaped, so it is taken literally
Deleted files and directories:
grep "^*deleting" "/srv/backupsimple/log/localhost/$timestamp.dryrun" | awk -F '|' '{print $2 }' >> "/srv/backupsimple/log/localhost/$timestamp.deleted"
Notes:
- the paths do not start with a slash, so removing it is not necessary
- a trailing slash is harmless here: deleting a directory always means deleting it recursively
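The grep/awk/sed pipelines above can also be expressed as one awk pass over the dry-run file. This is an alternative sketch, not the author's script: it assumes the %i|%n| format, applies the same slash handling described in the notes, and overwrites the three list files instead of appending to them. classify_dryrun is a hypothetical name.

```shell
#!/bin/sh
# classify_dryrun DRYRUN PREFIX
# Split a dry-run file ("%i|%n|" per line) into PREFIX.created,
# PREFIX.changed and PREFIX.deleted, using the same rules as above.
classify_dryrun() {
    : > "$2.created"; : > "$2.changed"; : > "$2.deleted"   # always create all three
    awk -F '|' -v p="$2" '
        { code = $1; path = $2; sub(/^\//, "", path) }      # make the path relative
        index(code, "*deleting") == 1 { print path > (p ".deleted"); next }
        substr(code, 2, 1) == "f" {                           # like grep "^.f"
            kind = (substr(code, 3, 9) == "+++++++++") ? ".created" : ".changed"
            print path > (p kind); next
        }
        substr(code, 1, 2) == "cd" { sub(/\/$/, "", path); print path > (p ".created"); next }
        substr(code, 1, 2) == ".d" { sub(/\/$/, "", path); print path > (p ".changed") }
    ' "$1"
}
```

Usage: classify_dryrun "/srv/backupsimple/log/localhost/$timestamp.dryrun" "/srv/backupsimple/log/localhost/$timestamp"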
So now there are the files $timestamp.created, $timestamp.changed and $timestamp.deleted.
The file with created items is only kept for logging. You cannot, and do not need to, back up files that did not exist before!
Concatenate the changed and deleted items into one sorted list without duplicates:
cat "/srv/backupsimple/log/localhost/$timestamp.deleted" > /tmp/tmp.rsync.list
cat "/srv/backupsimple/log/localhost/$timestamp.changed" >> /tmp/tmp.rsync.list
sort --output=/tmp/rsync.list --unique /tmp/tmp.rsync.list
Now do the backup of the backup:
rsync --relative --update --perms --owner --group --times --links --super --files-from=/tmp/rsync.list /srv/backupsimple/backup/localhost/ "/srv/backupsimple/history/localhost/$timestamp"
Finally do the real backup:
rsync --relative --recursive --update --delete --perms --owner --group --times --links --safe-links --super --one-file-system --devices "$DirToBackup" /srv/backupsimple/backup/localhost
One note: I've copied these commands from a script. There may be some errors, but I hope the idea is clear.
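Putting the generalized steps together, a compact sketch of a whole run: dry run, list extraction, saving the current backup copies of changed and deleted items, then the real backup. It assumes the directory layout of this howto, a POSIX shell and GNU tools; BASE, BACKUP_HOST, OPTS and backup_with_history are hypothetical names, the list extraction uses one awk pass instead of the grep/sed pipelines, and the merged list is kept in the log directory rather than in /tmp.

```shell
#!/bin/sh
# Sketch of the generalized procedure. Assumption: the layout used above.
set -eu

BASE=${BASE:-/srv/backupsimple}            # root of backup/, history/, log/
BACKUP_HOST=${BACKUP_HOST:-localhost}

# The mirror options; $OPTS is used unquoted so it splits into words.
OPTS="--relative --recursive --update --delete --perms --owner --group
      --times --links --safe-links --super --one-file-system --devices"

backup_with_history() {
    dir=$1
    ts=$(date "+%Y-%m-%d %H:%M:%S")
    backup="$BASE/backup/$BACKUP_HOST"
    history="$BASE/history/$BACKUP_HOST"
    prefix="$BASE/log/$BACKUP_HOST/$ts"      # prefix for all list files
    mkdir -p "$backup" "$BASE/log/$BACKUP_HOST"

    # 1. Dry run; empty lines are removed.
    rsync --dry-run --itemize-changes --out-format='%i|%n|' $OPTS "$dir" "$backup" |
        sed '/^ *$/d' > "$prefix.dryrun"

    # 2. Split into created/changed/deleted lists with relative paths.
    : > "$prefix.created"; : > "$prefix.changed"; : > "$prefix.deleted"
    awk -F '|' -v p="$prefix" '
        { code = $1; path = $2; sub(/^\//, "", path) }
        index(code, "*deleting") == 1 { print path > (p ".deleted"); next }
        substr(code, 2, 1) == "f" {
            kind = (substr(code, 3, 9) == "+++++++++") ? ".created" : ".changed"
            print path > (p kind); next
        }
        substr(code, 1, 2) == "cd" { sub(/\/$/, "", path); print path > (p ".created"); next }
        substr(code, 1, 2) == ".d" { sub(/\/$/, "", path); print path > (p ".changed") }
    ' "$prefix.dryrun"

    # 3. Save what is about to be overwritten or deleted (created items are only logged).
    LC_ALL=C sort -u "$prefix.changed" "$prefix.deleted" > "$prefix.tosave"
    if [ -s "$prefix.tosave" ]; then
        mkdir -p "$history/$ts"
        rsync --relative --update --perms --owner --group --times --links --super \
              --files-from="$prefix.tosave" "$backup/" "$history/$ts"
    fi

    # 4. The real backup.
    rsync $OPTS "$dir" "$backup"
}
```

Usage: backup_with_history /home/joe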
Local and remote
The above describes how to do a backup locally, but it is also quite possible to back up to a remote host running an rsync daemon. This requires a more complicated configuration. The dry run and the real backup are simple; the difficult part is the step of backing up the backup. The files with created, changed and deleted items are on the local host, while this step has to be performed on the remote host.
There are various ways to solve this. One of them is mounting the remote host with sshfs, so the local host can do the backup as if everything were local.
A better solution (IMHO) is creating a separate "queue" share on the rsync server (besides the backup and the history shares), to which the file with the items to be saved from the backup is synced. The rsync daemon can run scripts before and after a transfer (the pre-xfer exec and post-xfer exec parameters in rsyncd.conf). When the local host starts the real backup, a pre script checks whether there is a list in the queue that has to be processed first. If so, it performs this step first. The rsync command on the local host simply waits until the pre phase has finished.
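A sketch of the server-side half of that idea, using the daemon's pre-xfer exec parameter from rsyncd.conf: the command runs before the transfer starts, and if it exits with an error the transfer is aborted. rsyncd also passes the module's path in RSYNC_MODULE_PATH and its name in RSYNC_MODULE_NAME. Everything else is an assumption: the queue and history paths, the naming of queued lists (<timestamp>.list) and the script itself. It could be hooked into the backup module with a line such as pre-xfer exec = /usr/local/sbin/process-queue (a hypothetical path).

```shell
#!/bin/sh
# Hypothetical pre-xfer exec script for the [backup] module of rsyncd.
# Before a client updates the mirror, every list waiting in the queue share is
# processed: the listed items are copied from the mirror into the history tree.
set -u

QUEUE=${QUEUE:-/srv/backupsimple/queue/localhost}        # assumed queue share path
HISTORY=${HISTORY:-/srv/backupsimple/history/localhost}  # assumed history share path
BACKUP=${RSYNC_MODULE_PATH:-/srv/backupsimple/backup/localhost}  # set by rsyncd

process_queue() {
    for list in "$QUEUE"/*.list; do
        [ -e "$list" ] || continue              # no lists waiting
        ts=$(basename "$list" .list)            # assumed: file name is the timestamp
        mkdir -p "$HISTORY/$ts" || return 1
        rsync --relative --update --perms --owner --group --times --links \
              --super --files-from="$list" "$BACKUP/" "$HISTORY/$ts" || return 1
        rm -f "$list"                           # done; remove it from the queue
    done
}

# Only act when started by rsyncd; a non-zero exit aborts the transfer.
if [ -n "${RSYNC_MODULE_NAME:-}" ]; then
    process_queue || exit 1
fi
```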
Comments
I have a directory called /home/dinkar which I have to back up every day. I back up this directory to /mnt/dinkar. When I run rsync, it removes or overwrites deleted or modified files, so I created another directory called /mnt/dinkar_old, in which I store all removed or updated files. I use the following command to achieve this:
rsync -vax --perms --progress --numeric-ids --delete --delete-excluded --exclude '*~' --backup --backup-dir=/mnt/dinkar_old/ --suffix=.`date +%y%m%d_%H%M` /home/dinkar/ /mnt/dinkar
This command moves deleted or updated files from /mnt/dinkar to /mnt/dinkar_old and appends the date and time to the file name as a suffix.
I use Ubuntu. I'm not sure whether this works on other distros.
Hope this information helps somebody.
Dinkar
Your script is working fine for me. The only problem I am facing is that the timestamp is added after the file extension (e.g. test.docx.16_01_16_1923), so I am unable to open the file without changing the extension manually. Can you help me out?
My code is: rsync -vax --rsh=ssh --perms --progress --numeric-ids --exclude '*~' --backup --backup-dir='/mnt/Mirror/Rsync_Deleted_files/' --suffix=.`date +%y_%m_%d_%H%M` /media/bhavamayananda/GENSEC\ Backup/ [email protected]:'/mnt/Mirror/GENSEC Backup'
First, rsync rocks! It is fantastic when you want to make mirror copies of directory systems. Combined with file system hard linking methods, it is possible to create differential backups.
Or you could just use rdiff-backup. http://rdiff-backup.nongnu.org/ The main options are very similar to rsync. It handles differential backups just fine and stores the latest backup as a mirror and all the prior backups in gzip'd differential files. It also supports remote backups over ssh. There is a win32 version, but I've never been too impressed. I think it works best with samba mounts due to the extra dependencies for ssh on Windows.
For example, to back up a $HOME directory to another box:
EXCLUDE="--exclude-symbolic-links --exclude /.gnupg"
TARGET="romulus::media/Lap-Backup/xubuntu/${LOGNAME}"
/usr/bin/rdiff-backup $EXCLUDE ${HOME} "$TARGET"
/usr/bin/rdiff-backup --remove-older-than 90D --force "$TARGET"
I prefer to keep only 90 days' worth of backups, so any older than 90 days is removed. Perhaps I'm missing something, but this seems much easier to me. In the real script, I have about 20 more "excludes." File permissions are retained in metadata which is also differentially maintained.
The only caveat I have is with very large files (like virtual machine .IMG files). rdiff-backup isn't very good at handling those, IMHO. But I do use it to back up about 10 VMs nightly. Each entire VM takes about 2 minutes to back up all files differentially. Using rsync, it was taking over 45 minutes, but honestly it wasn't with the differentials.
To restore
/usr/bin/rdiff-backup -r now "$TARGET" ${HOME}
After we do rdiff-backups, we rsync the resulting backup areas off-site as part of our DR plan. We've had to test the restores a few times - ooops. Everything worked perfectly, with no surprises.
I know rdiff-backup. It's also very good, but I missed the
ability to configure the way it maintains the history. That's why I decided to write this very simple tool.
With it I can do anything, such as backing up previous versions the way I want.
Stef
rsnapshot could be worth looking at … http://rsnapshot.org/
It is unclear to me whether you get more from your system than rsnapshot provides, but it does versioned snapshot backups.
Managing a specific number of backups is easy in rdiff-backup:
/usr/bin/rdiff-backup --remove-older-than 90D --force "$TARGET"
Just to be clear, rdiff and rdiff-backup ARE different tools.
From what I have used, all rdiff options are also rsync options. Is that right?
Good info,
I'm doing something similar to back up my Linux and Windows machines, however I'm probably doing it the worst possible way. I'm in the process of reviewing it at the moment.
But basically I rsync to a "full" backup folder where changes are overwritten, then I create hardlinks of those files in another "snapshot" location, using the command below:
cp -al source destination
It works well, but I have to manage my own backup intervals etc., and the disk IO for the hardlink snapshot is quite high, especially for the 15 servers I'm backing up with it. But it does mean I get nice incremental backups.
Just gave rdiff a go, but it seems it doesn't talk to the rsync servers already set up on my Windows machines, so I may have to re-evaluate it.
Another option besides rdiff is RIBS. It uses PHP and rsync to create incremental backups at the intervals the user configures.
http://www.rustyparts.com/ribs.php
You have some cool ideas here. But I wonder why you elected not to use hard links with the --link-dest option?
e.g., http://www.backupcentral.com/components/com_mambowiki/index.php/Rsync_snapshots
That method seems to better leverage functionality already built into rsync for incremental backups. Just asking ... I like your ideas too.
Thanks for the awk and sed extraction commandlines for itemized info.
I use rsync to a Solaris ZFS filesystem, then take a ZFS snapshot. All rsync has to do is get the changed files over. The rsyncd.conf option "post-xfer exec" is perfect for taking the snapshot when rsync is done moving the changes.
PEACE
Have you tried rsnapshot - pretty much does this with a simple configuration file.
I believe you have a small typo -- a double quote instead of a single quote:
grep --invert-match "^.f+++++++++" "/srv/backupsimple/log/localhost/$timestamp.onlyfiles" |
awk -F '|' '{print $2 }' | sed 's@^/@@" >>
"/srv/backupsimple/log/localhost/$timestamp.changed"
There is a double-quote at the end of the sed command, but I believe it should be a single quote, to match the single after the word sed