Backing Up With rsync And Managing Previous Versions/History

Most backup software is built on the versatile tool rsync, which makes it very easy to sync files and directories to the local or a remote host and so create a copy. But most of these tools do not manage the history of changed and deleted data: files deleted at the source are also deleted in the backup copy, and changes simply overwrite the previous version. This howto describes how to keep track of these changed and deleted files.

A good rsync command is:

rsync --relative --recursive --update --delete --perms --owner --group --times --links --safe-links --super --one-file-system --devices %DirToBackup% %BackupTargetDir%

where %DirToBackup% is the directory to back up, for example a home directory, /home/joe,
and %BackupTargetDir% is the directory it is copied to, for example /srv/backupsimple/backup/localhost

Note that this command will create the directories home/joe in the target (because of the --relative option).

This command is fine for making a copy, but a real backup is something else. To analyse what a backup run would do, rsync has a very handy option: --dry-run. With it rsync goes through the whole run, but does not perform any real action. In combination with the options --itemize-changes and --out-format this gives you a detailed log of the actions that would be taken (deleting, overwriting or creating).

For example, if there is no backup yet of the example directory above, /home/joe, in /srv/backupsimple/backup/localhost, and the contents of /home/joe look like:

/home/joe/DocumentA
          DocumentB
          DocumentC
          DocumentD

then the output of the command

rsync --dry-run --itemize-changes --out-format="%i|%n|" --relative --recursive --update --delete --perms --owner --group --times --links --safe-links --super --one-file-system --devices /home/joe /srv/backupsimple/backup/localhost

is:

.d..t......|/home/|
cd+++++++++|/home/joe/|
>f+++++++++|/home/joe/DocumentA|
>f+++++++++|/home/joe/DocumentB|
>f+++++++++|/home/joe/DocumentC|
>f+++++++++|/home/joe/DocumentD|

Analyzing this:
- the directory /home is changed, but only its time (the t), because /home already exists in the backup directory (I backed up my own home directory, /home/sbon, earlier) and the directory joe is now created inside it.
- the directory /home/joe is created: hence the leading c. The second character, d, says it's a directory.
- the files that follow are created: note the leading >f and the row of + signs.
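The first two characters of each itemized string can be decoded mechanically. The function below is only a sketch: describe_item is a made-up name, and it handles just the update and file type letters documented in the rsync man page, not the whole string.

```shell
# Hypothetical helper: decode the first two characters of an
# rsync --itemize-changes string (the %i part of the log line).
describe_item() {
    item=$1
    case $item in
        '*deleting'*) echo "delete"; return 0 ;;   # "*" introduces a message
    esac
    update=$(printf '%s' "$item" | cut -c1)
    kind=$(printf '%s' "$item" | cut -c2)
    case $update in
        '<') action="send" ;;              # transferred to the remote host
        '>') action="transfer" ;;          # transferred to the local host
        c)   action="create" ;;            # local creation or change
        h)   action="hardlink" ;;
        .)   action="attributes-only" ;;   # content is not updated
        *)   action="unknown" ;;
    esac
    case $kind in
        f) kind_name="file" ;;
        d) kind_name="directory" ;;
        L) kind_name="symlink" ;;
        D) kind_name="device" ;;
        S) kind_name="special" ;;
        *) kind_name="unknown" ;;
    esac
    echo "$action $kind_name"
}
```

For example, describe_item 'cd+++++++++' prints "create directory" and describe_item '>f+++++++++' prints "transfer file".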

Doing the real backup:

rsync --relative --recursive --update --delete --perms --owner --group --times --links --safe-links --super --one-file-system --devices /home/joe /srv/backupsimple/backup/localhost

So now there is a copy (or snapshot, if you like) in /srv/backupsimple/backup/localhost.

Adding new files is not the issue here; changing existing files and/or removing them is. Let's start with changing files. Change one of them:

echo "new contents" >> /home/joe/DocumentA

The dry run rsync command gives:

rsync --dry-run --itemize-changes --out-format="%i|%n|" --relative --recursive --update --delete --perms --owner --group --times --links --safe-links --super --one-file-system --devices /home/joe /srv/backupsimple/backup/localhost

>f.st......|/home/joe/DocumentA|

Analyzing this:
- as you can see, and probably expected, DocumentA is the only file that will be transferred.
Note the s and the t: the size and the modification time have changed.
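The letters after the first two characters each have a fixed position. As a rough sketch (changed_attrs is a made-up name, and only positions 3 to 8 of the string are looked at), this lists which attributes rsync reports as different:

```shell
# Hypothetical helper: name the attributes flagged in positions 3-8
# of an itemized string (checksum, size, time, perms, owner, group).
changed_attrs() {
    flags=$(printf '%s' "$1" | cut -c3-8)
    names="checksum size mtime permissions owner group"
    out=""
    i=1
    for name in $names; do
        ch=$(printf '%s' "$flags" | cut -c"$i")
        case $ch in
            # a letter means "differs"; "." means unchanged, "+" means new item
            [cstpogT]) out="${out:+$out }$name" ;;
        esac
        i=$((i + 1))
    done
    echo "$out"
}
```

Called with '>f.st......' it prints "size mtime"; for a newly created item ('>f+++++++++') it prints an empty line, because there is nothing to compare against.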

So, before doing the real backup, the copy of DocumentA that is in the backup now should be saved first.
To do so, create a timestamp:

timestamp=$(date "+%Y-%m-%d %H:%M:%S")

This looks like:

2010-04-18 20:55:08

Now create the "history" tree:

install --directory "/srv/backupsimple/history/localhost/$timestamp"
install --directory /srv/backupsimple/log/localhost/

Note the quotes; they are necessary because of the space in the timestamp. Write the list of files to copy to the date-based history tree:

echo "/home/joe/DocumentA" > "/srv/backupsimple/log/localhost/$timestamp.changed"
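Why the quotes matter is easy to see with a tiny experiment. This is only an illustration; count_args is a made-up helper that reports how many words the shell passed to it.

```shell
# Hypothetical helper: print the number of arguments received.
count_args() { echo "$#"; }

timestamp="2010-04-18 20:55:08"

# Unquoted: the shell splits the path at the space into two words.
count_args /srv/backupsimple/history/localhost/$timestamp      # prints 2
# Quoted: the path stays one word.
count_args "/srv/backupsimple/history/localhost/$timestamp"    # prints 1
```

The same splitting hits redirections: in bash an unquoted target containing a space is refused as an ambiguous redirect.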

The rsync command:

rsync --relative --update --perms --owner --group --times --links --super --files-from="/srv/backupsimple/log/localhost/$timestamp.changed" /srv/backupsimple/backup/localhost "/srv/backupsimple/history/localhost/$timestamp"

This will make a backup of the DocumentA file as it is in the backup now, so it's safe to run the original rsync command. The file which will be overwritten has been copied to a safe place, where it can be looked up later.

rsync --relative --recursive --update --delete --perms --owner --group --times --links --safe-links --super --one-file-system --devices /home/joe /srv/backupsimple/backup/localhost

So now we have a snapshot of /home/joe, updated on 18 April 2010 at 20:55:08, and an earlier version of /home/joe/DocumentA.
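The "save the old copy first" step can be wrapped in a small function. This is a sketch, not the script the howto was taken from: save_to_history and the RSYNC override variable are made-up names, and the rsync options are simply the ones used above.

```shell
# Hypothetical helper: copy the paths listed in a file from the backup
# tree into a timestamped history directory before they are overwritten.
# RSYNC can be set to another command for testing; default is rsync.
save_to_history() {
    list=$1     # file with one path per line, relative to the backup root
    backup=$2   # e.g. /srv/backupsimple/backup/localhost
    hist=$3     # e.g. /srv/backupsimple/history/localhost
    stamp=$4    # e.g. "2010-04-18 20:55:08"

    [ -s "$list" ] || return 0             # empty list: nothing to save
    mkdir -p "$hist/$stamp" || return 1
    "${RSYNC:-rsync}" --relative --update --perms --owner --group --times \
        --links --super --files-from="$list" "$backup" "$hist/$stamp"
}
```

A call would look like save_to_history "/srv/backupsimple/log/localhost/$timestamp.changed" /srv/backupsimple/backup/localhost /srv/backupsimple/history/localhost "$timestamp", followed by the real backup command.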

With deleted files this is similar:

rm /home/joe/DocumentD

rsync --dry-run --itemize-changes --out-format="%i|%n|" --relative --recursive --update --delete --perms --owner --group --times --links --safe-links --super --one-file-system --devices /home/joe /srv/backupsimple/backup/localhost

.d..t......|/home/joe/|
*deleting |home/joe/DocumentD|

Analyzing this output:
- the directory times of /home/joe are changed, which is always the case when a file is removed.
- and of course the file DocumentD is reported as deleted.

First create a new timestamp:

timestamp=$(date "+%Y-%m-%d %H:%M:%S")
echo "$timestamp"

2010-04-18 20:56:30

Create the history dir:

install --directory "/srv/backupsimple/history/localhost/$timestamp"

echo "/home/joe/DocumentD" > "/srv/backupsimple/log/localhost/$timestamp.deleted"

The rsync command to back up the backup is:

rsync --relative --update --perms --owner --group --times --links --super --files-from="/srv/backupsimple/log/localhost/$timestamp.deleted" /srv/backupsimple/backup/localhost "/srv/backupsimple/history/localhost/$timestamp"

And again, after this command, run the real rsync command:

rsync --relative --recursive --update --delete --perms --owner --group --times --links --safe-links --super --one-file-system --devices /home/joe /srv/backupsimple/backup/localhost

 

Generalized approach

When writing a script which does the things described above, things have to be generalized.

First set some variables:

DirToBackup=/home/joe
timestamp=$(date "+%Y-%m-%d %H:%M:%S")

install --directory "/srv/backupsimple/history/localhost/$timestamp"
install --directory /srv/backupsimple/backup/localhost/
install --directory /srv/backupsimple/log/localhost/

Do the dry run and write the output to a file:

rsync --dry-run --itemize-changes --out-format="%i|%n|" --relative --recursive --update --delete --perms --owner --group --times --links --safe-links --super --one-file-system --devices "$DirToBackup" /srv/backupsimple/backup/localhost | sed '/^ *$/d' > "/srv/backupsimple/log/localhost/$timestamp.dryrun"

Note: the sed command deletes empty lines.

Looking at the format of the dryrun file, the created, changed and deleted items can be extracted as follows.

Created and changed files:

grep "^.f" "/srv/backupsimple/log/localhost/$timestamp.dryrun" >> "/srv/backupsimple/log/localhost/$timestamp.onlyfiles"

grep "^.f+++++++++" "/srv/backupsimple/log/localhost/$timestamp.onlyfiles" | awk -F '|' '{print $2 }' | sed 's@^/@@' >> "/srv/backupsimple/log/localhost/$timestamp.created"

grep --invert-match "^.f+++++++++" "/srv/backupsimple/log/localhost/$timestamp.onlyfiles" | awk -F '|' '{print $2 }' | sed 's@^/@@' >> "/srv/backupsimple/log/localhost/$timestamp.changed"

Some notes:
- the sed commands are necessary to remove the leading slash, to make the paths relative and not absolute
- the dot in the grep command (^.f) is a regular expression here (it matches any character) and should not be taken literally

Created and changed directories:

grep "^\.d" "/srv/backupsimple/log/localhost/$timestamp.dryrun" | awk -F '|' '{print $2 }' | sed -e 's@^/@@' -e 's@/$@@' >> "/srv/backupsimple/log/localhost/$timestamp.changed"

grep "^cd" "/srv/backupsimple/log/localhost/$timestamp.dryrun" | awk -F '|' '{print $2 }' | sed -e 's@^/@@' -e 's@/$@@' >> "/srv/backupsimple/log/localhost/$timestamp.created"

Some notes:
- the sed commands are necessary to remove the leading slash and the slash at the end of the path, again to make the paths relative and to prevent "recursive" behaviour; rsync is sensitive to a trailing slash
- the dot in the grep command (^\.d) should be taken literally

Deleted files and directories:

grep "^*deleting" "/srv/backupsimple/log/localhost/$timestamp.dryrun" | awk -F '|' '{print $2 }' >> "/srv/backupsimple/log/localhost/$timestamp.deleted"

Notes:
- the paths do not start with a slash, so there is nothing to remove
- a trailing slash is harmless here: deleting a directory is always recursive

So now there are the files $timestamp.created, $timestamp.changed and $timestamp.deleted.
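The grep, awk and sed pipelines above can also be collapsed into one awk pass over the dry-run output. The following is a sketch under a few assumptions: classify_dryrun is a made-up name, the lines have the "%i|%n|" format used in this howto, file names contain no | character, and (as above) only files and directories are handled.

```shell
# Hypothetical helper: read "%i|%n|" dry-run lines on stdin and write
# <prefix>.created, <prefix>.changed and <prefix>.deleted.
classify_dryrun() {
    prefix=$1
    : > "$prefix.created"; : > "$prefix.changed"; : > "$prefix.deleted"
    awk -F '|' -v prefix="$prefix" '
        NF < 2 { next }                          # skip empty lines
        {
            item = $1
            path = $2
            sub(/^\//, "", path)                 # make the path relative
            if (item ~ /^\*deleting/) {
                out = "deleted"
            } else {
                kind = substr(item, 2, 1)
                if (kind != "f" && kind != "d") { next }
                if (kind == "d") { sub(/\/$/, "", path) }   # strip trailing slash
                if (index(item, "+++++++++") > 0 || substr(item, 1, 2) == "cd") {
                    out = "created"
                } else {
                    out = "changed"
                }
            }
            print path > (prefix "." out)
        }
    '
}
```

Usage would be something like classify_dryrun "/srv/backupsimple/log/localhost/$timestamp" < "/srv/backupsimple/log/localhost/$timestamp.dryrun". All three files always exist afterwards, even when empty, which keeps the later steps simple.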

The file with created items is only here for logging. You cannot, and do not have to, back up files which are not in the backup yet!

Concatenate the changed and the deleted items:

cat "/srv/backupsimple/log/localhost/$timestamp.deleted" > /tmp/tmp.rsync.list
cat "/srv/backupsimple/log/localhost/$timestamp.changed" >> /tmp/tmp.rsync.list
sort --output=/tmp/rsync.list --unique /tmp/tmp.rsync.list
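The fixed names in /tmp are fine for a demonstration, but two runs at the same time would overwrite each other's list. A possible variation, shown here as a sketch with a made-up function name, lets mktemp pick a private file:

```shell
# Hypothetical helper: merge list files into one sorted, de-duplicated
# list in a private temp file and print that file's name.
merge_lists() {
    merged=$(mktemp) || return 1
    # missing input files are tolerated: cat errors are silenced
    cat "$@" 2>/dev/null | sort -u > "$merged"
    printf '%s\n' "$merged"
}
```

The caller keeps the printed name, for example list=$(merge_lists "$log.deleted" "$log.changed") where $log stands for the log path prefix, passes it to --files-from and removes the file afterwards.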

Now do the backup of the backup:

rsync --relative --update --perms --owner --group --times --links --super --files-from=/tmp/rsync.list /srv/backupsimple/backup/localhost/ "/srv/backupsimple/history/localhost/$timestamp"

Finally do the real backup:

rsync --relative --recursive --update --delete --perms --owner --group --times --links --safe-links --super --one-file-system --devices $DirToBackup /srv/backupsimple/backup/localhost

One note: I've copied these commands from a script. There might be some errors, but I hope the idea is clear.

Local and remote

The above describes how to do a backup locally, but it's also very possible to back up to a remote host running an rsync daemon. That requires a more complicated configuration. The dry run and the real backup are simple; the difficult step is backing up the backup. The files listing created, changed and deleted items are on the localhost, while this step has to be performed on the remote host.

There are various ways to solve this. One of them is mounting the remote host with sshfs, so the localhost can do the backup as if it were acting locally.

A better (imho) solution is creating a separate "queue" share on the rsync server (besides the backup and the history shares), to which the file listing the items to be saved from the backup is synced. The rsync server has the ability to run pre and post scripts. When the localhost tries to do the real backup, a pre script checks whether there is a list in the queue which should be processed first. If so, it does this step first. The rsync command on the localhost simply waits till the pre phase is finished.
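On the server such a pre script could look roughly like this. It is a sketch under assumptions: process_queue, the .list file suffix and the directory layout are made up for illustration. The rsync daemon's "pre-xfer exec" setting in rsyncd.conf is what would start it for the backup share, and the daemon aborts the transfer when that command exits non-zero.

```shell
# Hypothetical pre-xfer helper: for every list in the queue directory,
# copy the listed paths from the backup tree into history/<list name>
# and then remove the list. RSYNC can be overridden for testing.
process_queue() {
    queue=$1; backup=$2; hist=$3
    for list in "$queue"/*.list; do
        [ -f "$list" ] || continue        # empty queue: the glob stayed literal
        stamp=$(basename "$list" .list)   # list name doubles as the timestamp
        mkdir -p "$hist/$stamp" || return 1
        "${RSYNC:-rsync}" --relative --update --perms --owner --group --times \
            --links --super --files-from="$list" "$backup" "$hist/$stamp" || return 1
        rm -f "$list"                     # processed: take it off the queue
    done
}
```

The list is only removed after rsync succeeded, so a failed run leaves it in the queue and blocks the real backup instead of silently losing the old versions.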

 


Comments

By: Dinkar

I have a directory called /home/dinkar which I have to back up every day. I back up this directory to /mnt/dinkar. When I run rsync, it removes deleted files and overwrites modified ones. I created another directory called /mnt/dinkar_old, in which I store all removed or updated files. I use the following command to achieve this:

 rsync -vax --perms --progress --numeric-ids --delete --delete-excluded --exclude '*~' --backup  --backup-dir=/mnt/dinkar_old/ --suffix=.`date +%y%m%d_%H%M` /home/dinkar/ /mnt/dinkar

 This command will move deleted or updated files from /mnt/dinkar and put them in /mnt/dinkar_old and add suffix to file name as date and time.

 I use ubuntu. Not sure if this will work in other distros.

 Hope this information helps somebody.

 Dinkar

By: vedviveka

Your script is working fine for me. The only problem I am facing is that the time stamp is added after the file extension (i.e. test.docx.16_01_16_1923), so I am unable to open the file without changing the extension manually. Can you help me out?

My Code is: rsync -vax -rsa=ssh --perms --progress --numeric-ids --exclude '*~' --backup  --backup-dir='/mnt/Mirror/Rsync_Deleted_files/' --suffix=.`date +%y_%m_%d_%H%M` /media/bhavamayananda/GENSEC\ Backup/ [email protected]:'/mnt/Mirror/GENSEC Backup'

 

By: TheFu

First, rsync rocks! It is fantastic when you want to make mirror copies of directory systems. Combined with file system hard linking methods, it is possible to create differential backups.

Or you could just use rdiff-backup. http://rdiff-backup.nongnu.org/ The main options are very similar to rsync. It handles differential backups just fine and stores the latest backup as a mirror and all the prior backups in gzip'd differential files. It also supports remote backups over ssh. There is a win32 version, but I've never been too impressed. I think it works best with samba mounts due to the extra dependencies for ssh on Windows.

For example, to back up a $HOME directory to another box,

EXCLUDE="--exclude-symbolic-links --exclude /.gnupg"
TARGET="romulus::media/Lap-Backup/xubuntu/${LOGNAME}"
/usr/bin/rdiff-backup $EXCLUDE ${HOME}  "$TARGET"
/usr/bin/rdiff-backup --remove-older-than 90D --force "$TARGET"


I prefer to keep only 90 days worth of backups, so any older than 90 days is removed. Perhaps I'm missing something, but this seems much easier to me. In the real script, I have about 20 more "excludes." File permissions are retained in metadata which is also differentially maintained.

The only caveat I have is with very large files (like virtual machine .IMG files). rdiff-backup isn't very good at handling those, IMHO. But, I do use it to backup about 10 VMs nightly. Each entire VM takes about 2 minutes to backup all files differentially. Using rsync, it was taking over 45 minutes, but honestly it wasn't with the differentials.

To restore
/usr/bin/rdiff-backup -r now "$TARGET"   ${HOME}

After we do rdiff-backups, we rsync the resulting backup areas off-site as part of our DR plan. We've had to test the restores a few times - ooops. Everything worked perfectly, with no surprises.

By:

I know rdiff-backup. It's also very good, but I somehow missed the ability to configure the way it maintains the history. That's why I've decided to write this very simple tool.

I can make this tool do everything, and back up previous versions the way I want.

Stef

 

 

By: Drew

rsnapshot could be worth looking at … http://rsnapshot.org/

It is unclear to me if you are getting more from your system than rsnapshot provides, but it does versioned snapshot backups.

By: TheFu

Managing a specific number of backups is easy in rdiff-backup:

/usr/bin/rdiff-backup --remove-older-than 90D --force "$TARGET"

 Just to be clear, rdiff and rdiff-backup ARE different tools.

By: Anonymous

From what I have used, every rdiff option is also an rsync option. Is that right?

By: falconz

Good info,

 I'm doing something similar to back up my Linux and Windows machines, however I'm probably doing it the worst possible way. I'm in the process of reviewing it atm.

 

But basically I rsync to a "full" backup folder where changes are overwritten, then I create hardlinks of those files in another "snapshot" location, using the command below:

cp -al source destination

 

It works well but I have to manage my own backup intervals etc., and the disk IO of the hardlink snapshot is quite high, especially for the 15 servers I back up with it. But it does mean I get nice incremental backups.

 

Just gave rdiff a go but it seems it doesn't talk to the rsync servers already set up on my Windows machines, so I may have to re-evaluate it.

By: Rhett

Another option to rdiff is RIBS.  It uses php and rsync to create incremental backups at the intervals the user configures.

 http://www.rustyparts.com/ribs.php

By: Anonymous

You have some cool ideas here. But I wonder why you elected not to use hard links with the --link-dest option?

e.g., http://www.backupcentral.com/components/com_mambowiki/index.php/Rsync_snapshots

That method seems to better leverage functionality already built into rsync for incremental backups. Just asking ... I like your ideas too.

By: Anonymous

Thanks for the awk and sed extraction commandlines for itemized info.

I use rsync to Solaris zfs filesystem.  Then do a zfs snapshot.  All rsync has to do is get the changed files over.  The rsyncd.conf option "post-xfer exec" is perfect for taking the snapshot when rsync is all done movin the changes.

 PEACE

By: Anonymous

Have you tried rsnapshot - pretty much does this with a simple configuration file.

By: ian fleming

I believe you have a small typo -- a double quote instead of a single quote:

 grep --invert-match "^.f+++++++++" "/srv/backupsimple/log/localhost/$timestamp.onlyfiles" | 
     awk -F '|' '{print $2 }' | sed 's@^/@@" >>
     "/srv/backupsimple/log/localhost/$timestamp.changed"

There is a double quote at the end of the sed command, but I believe it should be a single quote, to match the single quote after the word sed.

 

By: Erwin

Hello, I am impressed by this post, but due to my poor English I am a little lost.

My plan is to mirror a folder to a backup folder.

Additionally, the script should copy changed and/or deleted files to a versions folder.

But I want to keep only 5 (or n) versions of a changed file. Deleted ones can be kept.

Which steps of your writeup do I have to combine? Maybe you can help me a little, please.