Backing Up With rsync And Managing Previous Versions/History
Most backup software uses the versatile tool rsync. With rsync it is very easy to sync files and directories to the local or a remote host, and thus create a copy. But most of these programs do not manage the history of changed and deleted data: deleted files are also deleted in the backup copy, and changes are simply overwritten. This howto describes how to keep track of these changed and deleted files.

A good rsync command is:

    rsync --relative --recursive --update --delete \
          --perms --owner --group --times --links --safe-links \
          --super --one-file-system --devices \
          %DirToBackup% %BackupTargetDir%

where %DirToBackup% is the directory to back up, for example a home directory such as /home/joe, and %BackupTargetDir% is the directory that receives the copy.
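Because the same long option list comes back in every step of this howto, it can help to wrap it once. The following is only a sketch: the function name sync_tree is made up for illustration and does not come from the original script.

```shell
# Hypothetical helper (the name sync_tree is an assumption, not part of
# the original script): run rsync with the option set used in this howto.
# Extra options, the source and the target are passed through unchanged.
sync_tree() {
    rsync --relative --recursive --update --delete \
          --perms --owner --group --times --links --safe-links \
          --super --one-file-system --devices \
          "$@"
}

# Example calls (commented out so that nothing is copied by accident):
# sync_tree --dry-run --itemize-changes --out-format="%i|%n|" /home/joe /srv/backupsimple/backup/localhost
# sync_tree /home/joe /srv/backupsimple/backup/localhost
```

With such a wrapper, the dry run and the real run differ only in the extra options given before the source directory.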
Note that, because of the --relative option, rsync creates the directories home/joe inside the target.

This command is fine for making a copy, but a real backup is something else. To analyse the backup, rsync has a very handy option: --dry-run. With it rsync goes through the whole run without actually changing anything. In combination with the options --itemize-changes and --out-format this gives a detailed report of the actions that would be taken (deleting, overwriting or creating).

For example, suppose there is no backup yet of the example directory /home/joe in /srv/backupsimple/backup/localhost, and the contents of /home/joe look like:

    /home/joe/DocumentA
    /home/joe/DocumentB
    /home/joe/DocumentC
    /home/joe/DocumentD

then the output of the command

    rsync --dry-run --itemize-changes --out-format="%i|%n|" \
          --relative --recursive --update --delete \
          --perms --owner --group --times --links --safe-links \
          --super --one-file-system --devices \
          /home/joe /srv/backupsimple/backup/localhost

is:

    .d..t......|/home/|
    cd+++++++++|/home/joe/|
    >f+++++++++|/home/joe/DocumentA|
    >f+++++++++|/home/joe/DocumentB|
    >f+++++++++|/home/joe/DocumentC|
    >f+++++++++|/home/joe/DocumentD|
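Analyzing this:

    .d..t......   a directory (d) that is not transferred itself (.), but whose modification time (t) will be updated
    cd+++++++++   a directory that will be created (c); the plus signs mean the item is new in the target
    >f+++++++++   a file (f) that will be transferred (>) and is new in the target

Doing the real backup:

    rsync --relative --recursive --update --delete \
          --perms --owner --group --times --links --safe-links \
          --super --one-file-system --devices \
          /home/joe /srv/backupsimple/backup/localhost

So now there is a copy (or snapshot, if you like) in /srv/backupsimple/backup/localhost. Adding new files is not the issue here; changing and removing existing files is. Starting with changed files, change one of them:

    echo "new contents" >> /home/joe/DocumentA

The dry run rsync command

    rsync --dry-run --itemize-changes --out-format="%i|%n|" \
          --relative --recursive --update --delete \
          --perms --owner --group --times --links --safe-links \
          --super --one-file-system --devices \
          /home/joe /srv/backupsimple/backup/localhost

gives:

    >f.st......|/home/joe/DocumentA|

Analyzing this: the file DocumentA will be transferred (>f) because its size (s) and modification time (t) differ from the copy in the backup. A real run would overwrite that copy.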
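The itemized strings in the dry run output can also be decoded mechanically. The sketch below covers only the flags that appear in this howto (item type, new item, size and time); the function name describe_item is made up for illustration, and the rsync man page remains the reference for the full format.

```shell
# Hypothetical helper: describe the %i field of an rsync dry run line.
# Only the flags used in this howto are handled.
describe_item() {
    flags=$1
    detail=""
    # Deletions are reported as a message instead of a flag string.
    case $flags in
        '*deleting'*) echo "delete from backup"; return 0 ;;
    esac
    # Second character: the type of the item.
    case $flags in
        ?f*) kind="file" ;;
        ?d*) kind="directory" ;;
        ?L*) kind="symlink" ;;
        *)   kind="other item" ;;
    esac
    # First character and the plus signs: what happens to the item.
    case $flags in
        ??+++++++++*) what="new" ;;
        '<'*|'>'*)    what="content updated" ;;
        .*)           what="attributes only" ;;
        *)            what="other change" ;;
    esac
    # Fourth and fifth character: size and modification time differ.
    case $flags in ???s*)  detail="size" ;; esac
    case $flags in ????t*) detail="${detail:+$detail, }time" ;; esac
    echo "$kind: $what${detail:+ ($detail)}"
}

describe_item '>f.st......'   # prints: file: content updated (size, time)
describe_item 'cd+++++++++'   # prints: directory: new
describe_item '.d..t......'   # prints: directory: attributes only (time)
```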
So, before doing a real backup, the copy of DocumentA that is already in the backup should be saved first. Create a timestamp:

    timestamp=$(date "+%Y-%m-%d %H:%M:%S")

This looks like:

    2010-04-18 20:55:08

Now create the "history" tree:

    install --directory "/srv/backupsimple/history/localhost/$timestamp"

Note the quotes: they are necessary because of the space in the timestamp.

Write the list of files to copy into the date based history tree (the log directory /srv/backupsimple/log/localhost must already exist):

    echo "/home/joe/DocumentA" > "/srv/backupsimple/log/localhost/$timestamp.changed"

The rsync command:

    rsync --relative --update --perms --owner --group --times --links --super \
          --files-from="/srv/backupsimple/log/localhost/$timestamp.changed" \
          /srv/backupsimple/backup/localhost "/srv/backupsimple/history/localhost/$timestamp"

This saves the old DocumentA, so now it is safe to run the original rsync command. The file which will be overwritten has been copied to a safe place, where it can be looked up later.

    rsync --relative --recursive --update --delete \
          --perms --owner --group --times --links --safe-links \
          --super --one-file-system --devices \
          /home/joe /srv/backupsimple/backup/localhost

So now we have a snapshot of /home/joe, updated on 18 April 2010 at 20:55:08, and an earlier version of /home/joe/DocumentA.

With deleted files this is similar:

    rm /home/joe/DocumentD

    rsync --dry-run --itemize-changes --out-format="%i|%n|" \
          --relative --recursive --update --delete \
          --perms --owner --group --times --links --safe-links \
          --super --one-file-system --devices \
          /home/joe /srv/backupsimple/backup/localhost

    .d..t......|/home/joe/|
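Analyzing this output: the directory /home/joe/ gets a new modification time because a file was removed from it. rsync reports the deletion itself on a separate line starting with *deleting; that line identifies DocumentD as the file to save before it is removed from the backup.

Create first a new timestamp:

    timestamp=$(date "+%Y-%m-%d %H:%M:%S")

which gives:

    2010-04-18 20:56:30

Create the history dir:

    install --directory "/srv/backupsimple/history/localhost/$timestamp"

List the deleted file, just as was done for the changed file:

    echo "/home/joe/DocumentD" > "/srv/backupsimple/log/localhost/$timestamp.deleted"

The rsync command to back up the backup is:

    rsync --relative --update --perms --owner --group --times --links --super \
          --files-from="/srv/backupsimple/log/localhost/$timestamp.deleted" \
          /srv/backupsimple/backup/localhost "/srv/backupsimple/history/localhost/$timestamp"

And again, after this command, the real rsync command:

    rsync --relative --recursive --update --delete \
          --perms --owner --group --times --links --safe-links \
          --super --one-file-system --devices \
          /home/joe /srv/backupsimple/backup/localhost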
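Both walkthroughs above have to quote every path that contains $timestamp, because the date format used there includes a space. As an alternative (an assumption of this sketch, not what the original script does), a format without spaces or colons gives directory names that are harder to get wrong:

```shell
# Assumed alternative format; the howto itself uses "+%Y-%m-%d %H:%M:%S".
# An underscore replaces the space and the colons are dropped.
timestamp=$(date "+%Y-%m-%d_%H%M%S")

# The resulting path contains no whitespace. Quoting it is still good
# practice, but a forgotten quote no longer splits the path in two.
history_dir="/srv/backupsimple/history/localhost/$timestamp"
echo "$history_dir"
```

Names in this format still sort chronologically, so a plain ls of the history directory lists the saved versions in order.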
Generalized approach

When writing a script which does the things described above, things have to be generalized. First set some variables and create the history directory:

    DirToBackup=/home/joe
    timestamp=$(date "+%Y-%m-%d %H:%M:%S")
    install --directory "/srv/backupsimple/history/localhost/$timestamp"

Do the dry run and write the output to a file:

    rsync --dry-run --itemize-changes --out-format="%i|%n|" \
          --relative --recursive --update --delete \
          --perms --owner --group --times --links --safe-links \
          --super --one-file-system --devices \
          "$DirToBackup" /srv/backupsimple/backup/localhost |
        sed '/^ *$/d' > "/srv/backupsimple/log/localhost/$timestamp.dryrun"

Note: the sed command deletes empty lines.

Now, looking at the format of the dryrun file, the created, changed and deleted items can be selected as follows.

Created and changed files:

    grep "^.f" "/srv/backupsimple/log/localhost/$timestamp.dryrun" >> "/srv/backupsimple/log/localhost/$timestamp.onlyfiles"

    grep "^.f+++++++++" "/srv/backupsimple/log/localhost/$timestamp.onlyfiles" |
        awk -F '|' '{print $2 }' | sed 's@^/@@' >> "/srv/backupsimple/log/localhost/$timestamp.created"

    grep --invert-match "^.f+++++++++" "/srv/backupsimple/log/localhost/$timestamp.onlyfiles" |
        awk -F '|' '{print $2 }' | sed 's@^/@@' >> "/srv/backupsimple/log/localhost/$timestamp.changed"

Some notes: the first grep selects every line that describes a regular file. Lines whose flags are all plus signs are new files; all other lines are changed files. awk prints the second field (the file name) and sed strips the leading slash.
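Created and changed directories:

    grep "^\.d" "/srv/backupsimple/log/localhost/$timestamp.dryrun" |
        awk -F '|' '{print $2 }' | sed -e 's@^/@@' -e 's@/$@@' >> "/srv/backupsimple/log/localhost/$timestamp.changed"

    grep "^cd" "/srv/backupsimple/log/localhost/$timestamp.dryrun" |
        awk -F '|' '{print $2 }' | sed -e 's@^/@@' -e 's@/$@@' >> "/srv/backupsimple/log/localhost/$timestamp.created"

Some notes: a line starting with .d is an existing directory whose attributes change, a line starting with cd is a directory that will be created. The second sed expression removes the trailing slash from the directory name.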
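Deleted files and directories:

    grep "^\*deleting" "/srv/backupsimple/log/localhost/$timestamp.dryrun" |
        awk -F '|' '{print $2 }' >> "/srv/backupsimple/log/localhost/$timestamp.deleted"

Notes: rsync marks every item it is going to delete with *deleting. The backslash makes sure grep takes the asterisk literally.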
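The three selections above can also be done in one pass over the dryrun file. The following sketch swaps the grep, awk and sed chains for a single awk program; it is not how the original script does it, and the function name classify_dryrun is made up for illustration. It assumes the "%i|%n|" output format used throughout this howto. Unlike the commands above, it treats every item whose flags are all plus signs as created and strips leading and trailing slashes from deleted names as well.

```shell
# Hypothetical helper: split a dry run report into three lists.
#   $1  dry run file written with --out-format="%i|%n|"
#   $2  prefix for the output files (.created, .changed, .deleted)
classify_dryrun() {
    dryrun=$1
    prefix=$2
    # Start with empty lists so that every file exists afterwards.
    : > "$prefix.created"
    : > "$prefix.changed"
    : > "$prefix.deleted"
    awk -F '|' -v prefix="$prefix" '
        NF < 2 { next }                   # skip empty or malformed lines
        {
            name = $2
            sub(/^\//, "", name)          # strip the leading slash
            sub(/\/$/, "", name)          # strip the trailing slash of directories
        }
        $1 ~ /^\*deleting/              { print name >> (prefix ".deleted"); next }
        $1 ~ /^.[fd]\+\+\+\+\+\+\+\+\+/ { print name >> (prefix ".created"); next }
        $1 ~ /^.[fd]/                   { print name >> (prefix ".changed") }
    ' "$dryrun"
}

# Example call (commented out; it needs an existing dryrun file):
# classify_dryrun "/srv/backupsimple/log/localhost/$timestamp.dryrun" \
#                 "/srv/backupsimple/log/localhost/$timestamp"
```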
So now there are the files $timestamp.created, $timestamp.changed and $timestamp.deleted. The file with created items is only there for logging: new files have no older copy in the backup, so there is nothing to save for them.

Cat the changed and the deleted items together:

    cat "/srv/backupsimple/log/localhost/$timestamp.changed" \
        "/srv/backupsimple/log/localhost/$timestamp.deleted" > /tmp/tmp.rsync.list

Now do the backup of the backup:

    rsync --relative --update --perms --owner --group --times --links --super \
          --files-from=/tmp/tmp.rsync.list \
          /srv/backupsimple/backup/localhost/ "/srv/backupsimple/history/localhost/$timestamp"

Finally do the real backup:

    rsync --relative --recursive --update --delete \
          --perms --owner --group --times --links --safe-links \
          --super --one-file-system --devices \
          "$DirToBackup" /srv/backupsimple/backup/localhost

One note: I've copied these commands from a script. There might be some errors, but I hope the idea is clear.

Local and remote

The above describes how to do a backup locally, but it is also very possible to back up to a remote host running an rsync daemon. That requires a more complicated configuration. The dry run and the real backup are simple; the difficult part is the step of backing up the backup. The files listing the created, changed and deleted items are on the localhost, while this step has to be performed on the remote host.

There are various ways to solve this. One of them is mounting the remote host with sshfs, so the localhost can do the backup as if it were acting locally. A better (imho) solution is creating a separate "queue" share on the rsync server (besides the backup and the history shares), to which the file with the items to be saved from the backup is synced. The rsync server has the ability to run pre and post scripts (the "pre-xfer exec" and "post-xfer exec" parameters in rsyncd.conf). When the localhost tries to do the real backup, a pre script should check whether there is a list in the queue which has to be processed first. If so, it does this step first. The rsync command on the localhost simply waits until the pre phase is finished.
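A pre script for the queue approach could look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the queue path, the .list suffix, the convention that the name of each list is the timestamp, and the function name process_queue. It would be hooked into the backup module with the "pre-xfer exec" parameter of rsyncd.conf.

```shell
#!/bin/sh
# Hypothetical pre-xfer script for the backup share of an rsync daemon.
# The three paths are assumed locations; adjust them to the real shares.
QUEUE_DIR=${QUEUE_DIR:-/srv/backupsimple/queue/localhost}
BACKUP_DIR=${BACKUP_DIR:-/srv/backupsimple/backup/localhost}
HISTORY_DIR=${HISTORY_DIR:-/srv/backupsimple/history/localhost}

# Save every item named in a queued list from the backup to the history
# tree, then remove the list so that it is processed only once.
process_queue() {
    for list in "$QUEUE_DIR"/*.list; do
        # Without matching files the pattern stays unexpanded; skip it.
        [ -f "$list" ] || continue
        # Assumed convention: the list is called "<timestamp>.list".
        stamp=$(basename "$list" .list)
        mkdir -p "$HISTORY_DIR/$stamp" || return 1
        rsync --relative --update --perms --owner --group --times \
              --links --super --files-from="$list" \
              "$BACKUP_DIR/" "$HISTORY_DIR/$stamp" || return 1
        rm -- "$list"
    done
}

# A real pre script would end with a call such as:
# process_queue || exit 1
```

Returning a failure when the history copy fails is deliberate in this sketch: the real backup should not overwrite or delete anything in the backup before the old versions have been saved.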