Backing Up With rsync And Managing Previous Versions/History

Submitted by stefbon (Contact Author) (Forums) on Mon, 2010-04-19 20:21. :: Backup

Most backup software relies on the versatile tool rsync. With rsync it is very easy to sync files and directories to the local or a remote host, and thus create a copy. But most of these programs do not manage the history of changed and deleted data: files deleted at the source are also deleted in the backup copy, and changes simply overwrite the previous version. This howto describes how to keep track of these changed and deleted files.

A good rsync command is:

rsync --relative --recursive --update --delete --perms --owner --group --times --links --safe-links --super --one-file-system --devices %DirToBackup% %BackupTargetDir%

where %DirToBackup% is the directory to back up, for example a home directory, /home/joe,
and %BackupTargetDir% is the directory this directory is copied to, for example /srv/backupsimple/backup/localhost.

Note that this command will create the directories home/joe in the target (because of the --relative option).

This command is fine for making a copy, but a real backup is something else. To analyse what a backup run would do, rsync has a very handy option: --dry-run. With it, rsync goes through the whole run but does not perform any real action. In combination with the options --itemize-changes and --out-format, this gives you a detailed log of the actions that would be taken (deleting, overwriting or creating).

For example, if there is no backup yet of the example directory above, /home/joe, in /srv/backupsimple/backup/localhost, and the contents of /home/joe look like:

/home/joe/DocumentA
          DocumentB
          DocumentC
          DocumentD

then the output of the command

rsync --dry-run --itemize-changes --out-format="%i|%n|" --relative --recursive --update --delete --perms --owner --group --times --links --safe-links --super --one-file-system --devices /home/joe /srv/backupsimple/backup/localhost

is:

.d..t......|/home/|
cd+++++++++|/home/joe/|
>f+++++++++|/home/joe/DocumentA|
>f+++++++++|/home/joe/DocumentB|
>f+++++++++|/home/joe/DocumentC|
>f+++++++++|/home/joe/DocumentD|

Analyzing this:
- the directory /home is changed (only its time), because the home directory already exists in the backup directory (I did a backup of my own home, /home/sbon, earlier) and the directory joe is created in it.
- the directory /home/joe is created: hence the leading c. The d that follows means it is a directory.
- the following files are created: note the leading >f followed by plus signs.
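
The reading of the itemize column described above can be sketched as a small shell function. This is only an illustration, assuming the "%i|%n|" output format used in this howto; classify_item is a hypothetical helper name, not something rsync provides.

```shell
#!/bin/sh
# classify_item: hypothetical helper that names the action behind one line
# of rsync output produced with --out-format="%i|%n|".
classify_item() {
    line=$1
    flags=${line%%|*}              # itemize column, e.g. ">f.st......"
    case $flags in
        '*deleting'*)  echo "deleted" ;;
        ?f+++++++++)   echo "created file" ;;
        ?d+++++++++)   echo "created directory" ;;
        ?f*)           echo "changed file" ;;
        ?d*)           echo "changed directory" ;;
        *)             echo "other" ;;
    esac
}

classify_item '>f+++++++++|/home/joe/DocumentA|'    # prints: created file
```

Only the first two characters and the run of plus signs are inspected here; the remaining flags (s, t, p and so on) say what exactly differs.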

Doing the real backup:

rsync --relative --recursive --update --delete --perms --owner --group --times --links --safe-links --super --one-file-system --devices /home/joe /srv/backupsimple/backup/localhost

So now there is a copy (or snapshot, if you like) in /srv/backupsimple/backup/localhost.

Adding new files is not the issue; changing and removing existing files is. Let's start with changing a file:

echo "new contents" >> /home/joe/DocumentA

The dry run rsync command gives:

rsync --dry-run --itemize-changes --out-format="%i|%n|" --relative --recursive --update --delete --perms --owner --group --times --links --safe-links --super --one-file-system --devices /home/joe /srv/backupsimple/backup/localhost

>f.st......|/home/joe/DocumentA|

Analyzing this:
- as you can see, and probably expected, DocumentA is the only file that will be transferred.
Note the s and the t: the size and the modification time differ.

So, before doing the real backup, the version of DocumentA that is currently in the backup should be saved first.
To do so, create a timestamp:

timestamp=$(date "+%Y-%m-%d %H:%M:%S")

This looks like:

2010-04-18 20:55:08

Now create the "history" tree:

install --directory "/srv/backupsimple/history/localhost/$timestamp"
install --directory /srv/backupsimple/log/localhost/

Note the quotes: they are necessary because of the space in the timestamp. Write the list of files to copy to the date-based history tree:

echo "/home/joe/DocumentA" > "/srv/backupsimple/log/localhost/$timestamp.changed"

The rsync command:

rsync --relative --update --perms --owner --group --times --links --super --files-from="/srv/backupsimple/log/localhost/$timestamp.changed" /srv/backupsimple/backup/localhost "/srv/backupsimple/history/localhost/$timestamp"

This saves the old version of DocumentA, so now it's safe to run the original rsync command. The file that will be overwritten has been copied to a safe place, where it can be looked up later.

rsync --relative --recursive --update --delete --perms --owner --group --times --links --safe-links --super --one-file-system --devices /home/joe /srv/backupsimple/backup/localhost

So now we have a snapshot of /home/joe, updated on 18 April 2010 at 20:55:08, and an earlier version of /home/joe/DocumentA.
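
Looking up the earlier versions of a file then comes down to checking each dated directory in the history tree. A minimal sketch, assuming the layout used in this howto (one directory per timestamp, with the relative path below it); list_versions is a hypothetical helper name.

```shell
#!/bin/sh
# list_versions: hypothetical helper that prints the timestamps under which
# an earlier version of a file was saved.
#   $1  root of the history tree, e.g. /srv/backupsimple/history/localhost
#   $2  path of the file, e.g. /home/joe/DocumentA
list_versions() {
    history_root=$1
    rel_path=${2#/}                         # drop the leading slash
    for dir in "$history_root"/*/; do
        # If there are no dated directories the glob stays unexpanded
        # and the test below simply fails.
        [ -e "$dir$rel_path" ] && basename "$dir"
    done
    return 0
}

# Possible usage:
#   list_versions /srv/backupsimple/history/localhost /home/joe/DocumentA
```

Because the timestamp format starts with the year, the shell's sorted glob expansion lists the versions in chronological order.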

With deleted files this is similar:

rm /home/joe/DocumentD

rsync --dry-run --itemize-changes --out-format="%i|%n|" --relative --recursive --update --delete --perms --owner --group --times --links --safe-links --super --one-file-system --devices /home/joe /srv/backupsimple/backup/localhost

.d..t......|/home/joe/|
*deleting |home/joe/DocumentD|

Analyzing this output:
- the directory times of /home/joe are changed, which is always the case when a file is removed.
- and of course the file DocumentD is reported as deleted.

First create a new timestamp:

timestamp=$(date "+%Y-%m-%d %H:%M:%S")
echo "$timestamp"

2010-04-18 20:56:30

Create the history dir:

install --directory "/srv/backupsimple/history/localhost/$timestamp"

echo "/home/joe/DocumentD" > "/srv/backupsimple/log/localhost/$timestamp.deleted"

The rsync command to back up the backup is:

rsync --relative --update --perms --owner --group --times --links --super --files-from="/srv/backupsimple/log/localhost/$timestamp.deleted" /srv/backupsimple/backup/localhost "/srv/backupsimple/history/localhost/$timestamp"

And again after this command the real rsync command:

rsync --relative --recursive --update --delete --perms --owner --group --times --links --safe-links --super --one-file-system --devices /home/joe /srv/backupsimple/backup/localhost

 

Generalized approach

When writing a script which does the things described above, things have to be generalized.

First set some variables:

DirToBackup=/home/joe
timestamp=$(date "+%Y-%m-%d %H:%M:%S")

install --directory "/srv/backupsimple/history/localhost/$timestamp"
install --directory /srv/backupsimple/backup/localhost/
install --directory /srv/backupsimple/log/localhost/

Do the dry run and write the output to a file:

rsync --dry-run --itemize-changes --out-format="%i|%n|" --relative --recursive --update --delete --perms --owner --group --times --links --safe-links --super --one-file-system --devices $DirToBackup /srv/backupsimple/backup/localhost | sed '/^ *$/d' > "/srv/backupsimple/log/localhost/$timestamp.dryrun"

Note: the sed command deletes empty lines.

Looking at the format of the dryrun file, the created, changed and deleted items can be extracted as follows:

Created and changed files:

grep "^.f" "/srv/backupsimple/log/localhost/$timestamp.dryrun" >> "/srv/backupsimple/log/localhost/$timestamp.onlyfiles"

grep "^.f+++++++++" "/srv/backupsimple/log/localhost/$timestamp.onlyfiles" | awk -F '|' '{print $2 }' | sed 's@^/@@' >> "/srv/backupsimple/log/localhost/$timestamp.created"

grep --invert-match "^.f+++++++++" "/srv/backupsimple/log/localhost/$timestamp.onlyfiles" | awk -F '|' '{print $2 }' | sed 's@^/@@' >> "/srv/backupsimple/log/localhost/$timestamp.changed"

Some notes:
- the various sed commands remove the leading slash, to make the paths relative instead of absolute
- the dot in the grep pattern (^.f) is a regular expression metacharacter matching any character and should not be taken literally

Created and changed directories:

grep "^\.d" "/srv/backupsimple/log/localhost/$timestamp.dryrun" | awk -F '|' '{print $2 }' | sed -e 's@^/@@' -e 's@/$@@' >> "/srv/backupsimple/log/localhost/$timestamp.changed"

grep "^cd" "/srv/backupsimple/log/localhost/$timestamp.dryrun" | awk -F '|' '{print $2 }' | sed -e 's@^/@@' -e 's@/$@@' >> "/srv/backupsimple/log/localhost/$timestamp.created"

Some notes:
- the various sed commands remove the leading slash and the trailing slash of the path, again to make the paths relative and to prevent "recursive" behaviour; rsync is sensitive to that
- the dot in the grep pattern (^\.d) should be taken literally

Deleted files and directories:

grep "^*deleting" "/srv/backupsimple/log/localhost/$timestamp.dryrun" | awk -F '|' '{print $2 }' >> "/srv/backupsimple/log/localhost/$timestamp.deleted"

Notes:
- the paths do not start with a slash, so there is no need to remove one
- a trailing slash is harmless here: deleting a directory is always recursive

So now there are the files $timestamp.created, $timestamp.changed and $timestamp.deleted.
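
The grep, awk and sed pipelines above can also be sketched as a single awk pass over the dryrun file. This is an alternative illustration, not the script these commands were copied from; split_dryrun is a hypothetical helper name, and it writes the items in the order they appear in the log, which differs from the pipelines but does not matter once the lists are sorted.

```shell
#!/bin/sh
# split_dryrun: hypothetical helper that splits a dry-run log written with
# --out-format="%i|%n|" into three lists.
#   $1  the dryrun file
#   $2  path prefix; the lists become $2.created, $2.changed, $2.deleted
split_dryrun() {
    dryrun=$1
    prefix=$2
    # Make sure all three lists exist, even when one stays empty.
    for ext in created changed deleted; do
        : > "$prefix.$ext"
    done
    awk -F '|' -v prefix="$prefix" '
        { flags = $1; path = $2 }
        flags ~ /^\*deleting/ {            # deleted items: path kept as is
            print path > (prefix ".deleted"); next
        }
        { sub(/^\//, "", path) }           # make the path relative
        flags ~ /^.f/ {                    # files
            ext = (substr(flags, 3) == "+++++++++") ? ".created" : ".changed"
            print path > (prefix ext); next
        }
        flags ~ /^[.c]d/ {                 # directories
            sub(/\/$/, "", path)           # drop the trailing slash
            ext = (substr(flags, 1, 1) == "c") ? ".created" : ".changed"
            print path > (prefix ext)
        }
    ' "$dryrun"
}

# Possible usage:
#   split_dryrun "/srv/backupsimple/log/localhost/$timestamp.dryrun" \
#                "/srv/backupsimple/log/localhost/$timestamp"
```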

The file with created items is here only for logging. You cannot, and do not have to, back up files that are not in the backup yet!

Concatenate the changed and the deleted items:

cat "/srv/backupsimple/log/localhost/$timestamp.deleted" > /tmp/tmp.rsync.list
cat "/srv/backupsimple/log/localhost/$timestamp.changed" >> /tmp/tmp.rsync.list
sort --output=/tmp/rsync.list --unique /tmp/tmp.rsync.list
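
As a variation on these three commands, sort can read both lists directly, and mktemp avoids fixed file names in /tmp. A minimal sketch; merge_lists and the $log variable in the usage comment are hypothetical names.

```shell
#!/bin/sh
# merge_lists: hypothetical helper that prints the sorted, de-duplicated
# union of all list files passed to it.
merge_lists() {
    sort -u -- "$@"
}

# Possible usage, with a unique temporary file from mktemp
# ($log stands for /srv/backupsimple/log/localhost):
#   list=$(mktemp)
#   merge_lists "$log/$timestamp.deleted" "$log/$timestamp.changed" > "$list"
```

The resulting file can be passed to --files-from in the same way as /tmp/rsync.list.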

Now do the backup of the backup:

rsync --relative --update --perms --owner --group --times --links --super --files-from=/tmp/rsync.list /srv/backupsimple/backup/localhost/ "/srv/backupsimple/history/localhost/$timestamp"

Finally do the real backup:

rsync --relative --recursive --update --delete --perms --owner --group --times --links --safe-links --super --one-file-system --devices $DirToBackup /srv/backupsimple/backup/localhost

One note: I've copied these commands from a script. There might be some errors, but I hope the idea is clear.

Local and remote

The above describes how to do a backup locally, but it is also very possible to back up to a remote host running an rsync daemon. This requires a more complicated configuration. The dry run and the real backup are simple; the hard part is the step of backing up the backup. The files listing the created, changed and deleted items are on the localhost, while this step has to be performed on the remote host.

There are various ways to solve this. One of them is mounting the remote host with sshfs, so that the localhost can do the backup as if it were local.

A better (imho) solution is to create a separate "queue" share on the rsync server (besides the backup and the history shares), to which the file listing the items to be saved from the backup is synced. The rsync server has the ability to run pre and post scripts. When the localhost tries to do the real backup, a pre script should check whether there is a list in the queue that has to be processed first. If so, it does this step first. The rsync command on the localhost simply waits until the pre phase is finished.
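
What such a pre script might do can be sketched as follows. This is an assumption-laden illustration, not a tested setup: the function name process_queue, the three directory arguments and the convention that each queued list is named "<timestamp>.list" are all hypothetical. On the rsync daemon a script like this would be hooked in with the rsyncd.conf option "pre-xfer exec".

```shell
#!/bin/sh
# process_queue: hypothetical pre script body for the rsync server.
#   $1  queue directory where clients upload their lists (assumed *.list)
#   $2  the current backup tree
#   $3  root of the dated history trees
process_queue() {
    queue_dir=$1
    backup_dir=$2
    history_root=$3
    for list in "$queue_dir"/*.list; do
        [ -f "$list" ] || continue           # nothing queued
        stamp=$(basename "$list" .list)      # assumed: list name is the timestamp
        mkdir -p "$history_root/$stamp"
        # Same "backup of the backup" command as used locally.
        rsync --relative --update --perms --owner --group --times --links \
              --super --files-from="$list" "$backup_dir/" "$history_root/$stamp" \
            || return 1                      # keep the list if copying failed
        rm -- "$list"
    done
}

# Possible usage inside the pre script:
#   process_queue /srv/backupsimple/queue/localhost \
#                 /srv/backupsimple/backup/localhost \
#                 /srv/backupsimple/history/localhost || exit 1
```

Returning a non-zero status when the copy fails is deliberate: a failing pre script makes the daemon refuse the transfer, so the real backup cannot overwrite files that have not been saved yet.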

 


Submitted by ian fleming (not registered) on Tue, 2014-03-18 22:44.

I believe you have a small typo -- a double quote instead of a single quote:

 grep --invert-match "^.f+++++++++" "/srv/backupsimple/log/localhost/$timestamp.onlyfiles" | 
     awk -F '|' '{print $2 }' | sed 's@^/@@" >>
     "/srv/backupsimple/log/localhost/$timestamp.changed"

There is a double-quote at the end of the sed command, but I believe it should be a single quote, to match the single after the word sed

 

Submitted by Anonymous (not registered) on Fri, 2011-01-28 09:17.
Have you tried rsnapshot - pretty much does this with a simple configuration file.
Submitted by Anonymous (not registered) on Tue, 2010-04-27 17:55.

Thanks for the awk and sed extraction commandlines for itemized info.

I use rsync to a Solaris zfs filesystem, then do a zfs snapshot. All rsync has to do is get the changed files over. The rsyncd.conf option "post-xfer exec" is perfect for taking the snapshot when rsync is all done moving the changes.

 PEACE

Submitted by Anonymous (not registered) on Sat, 2010-04-24 03:35.

You have some cool ideas here. But I wonder why you elected not to use hard links with the --link-dest option?
e.g., http://www.backupcentral.com/components/com_mambowiki/index.php/Rsync_snapshots
That method seems to better leverage functionality already built into rsync for incremental backups. Just asking ... I like your ideas too.

Submitted by Rhett (not registered) on Thu, 2010-04-22 22:11.

Another option to rdiff is RIBS. It uses php and rsync to create incremental backups at the intervals the user configures.

 http://www.rustyparts.com/ribs.php

Submitted by falconz (not registered) on Tue, 2010-04-20 22:37.

Good info,

I'm doing something similar to back up my Linux and Windows machines; however, I'm probably doing it the worst possible way. I'm in the process of reviewing it atm.

But basically I rsync to a "full" backup folder where changes are overwritten, then I create hardlinks of those files in another "snapshot" location, using the command below:

cp -al source destination

It works well, but I have to manage my own backup intervals etc., and the disk IO of the hardlink snapshot is quite high, especially for the 15 servers I'm backing up with it. But it does mean I get nice incremental backups.

Just gave rdiff a go, but it seems it doesn't talk to the rsync servers already set up on my Windows machines, so I may have to re-evaluate it.

Submitted by Dinkar (not registered) on Tue, 2010-04-20 01:47.

I have a directory called /home/dinkar which I have to back up every day. I back up this directory to /mnt/dinkar. When I run rsync, it removes deleted files and overwrites modified files. I created another directory called /mnt/dinkar_old. In this directory I store all removed or updated files. I use the following command to achieve this:

 rsync -vax --perms --progress --numeric-ids --delete --delete-excluded --exclude '*~' --backup  --backup-dir=/mnt/dinkar_old/ --suffix=.`date +%y%m%d_%H%M` /home/dinkar/ /mnt/dinkar

 This command will move deleted or updated files from /mnt/dinkar and put them in /mnt/dinkar_old and add suffix to file name as date and time.

 I use ubuntu. Not sure if this will work in other distros.

 Hope this information helps somebody.

 Dinkar

Submitted by TheFu (not registered) on Mon, 2010-04-19 23:14.
First, rsync rocks! It is fantastic when you want to make mirror copies of directory systems. Combined with file system hard linking methods, it is possible to create differential backups.

Or you could just use rdiff-backup. http://rdiff-backup.nongnu.org/ The main options are very similar to rsync. It handles differential backups just fine and stores the latest backup as a mirror and all the prior backups in gzip'd differential files. It also supports remote backups over ssh. There is a win32 version, but I've never been too impressed. I think it works best with samba mounts due to the extra dependences for ssh on Windows.

For example, to backup a $HOME directory to another box,
EXCLUDE="--exclude-symbolic-links --exclude /.gnupg"
TARGET="romulus::media/Lap-Backup/xubuntu/${LOGNAME}"
/usr/bin/rdiff-backup $EXCLUDE ${HOME}  "$TARGET"
/usr/bin/rdiff-backup --remove-older-than 90D --force "$TARGET"


I prefer to keep only 90 days worth of backups, so any older than 90 days is removed. Perhaps I'm missing something, but this seems much easier to me. In the real script, I have about 20 more "excludes." File permissions are retained in metadata which is also differentially maintained.

The only caveat I have is with very large files (like virtual machine .IMG files). rdiff-backup isn't very good at handling those, IMHO. But, I do use it to backup about 10 VMs nightly. Each entire VM takes about 2 minutes to backup all files differentially. Using rsync, it was taking over 45 minutes, but honestly it wasn't with the differentials.

To restore
/usr/bin/rdiff-backup -r now "$TARGET"   ${HOME}

After we do rdiff-backups, we rsync the resulting backup areas off-site as part of our DR plan. We've had to test the restores a few times - ooops. Everything worked perfectly, with no surprises.
Submitted by stefbon (registered user) on Tue, 2010-04-20 10:53.

I know rdiff-backup. It is also very good, but I somehow missed the ability to configure the way it maintains the history. That's why I decided to write this very simple tool.

I can make this tool do everything, and back up previous versions the way I want.

Stef

 

 

Submitted by TheFu (not registered) on Thu, 2010-05-06 14:00.

Managing a specific number of backups is easy in rdiff-backup:

/usr/bin/rdiff-backup --remove-older-than 90D --force "$TARGET"

 Just to be clear, rdiff and rdiff-backup ARE different tools.

Submitted by Anonymous (not registered) on Wed, 2010-07-21 13:56.
As far as I have used it, every rdiff option is also an rsync option, is that right?
Submitted by Drew (not registered) on Fri, 2010-04-23 18:55.

rsnapshot could be worth looking at … http://rsnapshot.org/

It is unclear to me if you are getting more from your system than rsnapshot provides, but it does versioned snaphot backups.