How to use snapshots, clones and replication in ZFS on Linux

ZFS Snapshots - an overview

Snapshots are one of the most powerful features of ZFS. A snapshot is a read-only, point-in-time copy of a file system or volume. Creating one consumes no extra space in the pool at first, because the snapshot shares all of its blocks with the live dataset. It starts to use space only as the dataset changes: blocks that are overwritten or deleted in the live dataset stay allocated for as long as a snapshot still references them. In effect, a snapshot records only the differences between the current dataset and the earlier version.
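The space accounting described above can be checked from the command line. The following is a minimal sketch rather than a fixed recipe: the function name snapshot_space is made up for this example, and the argument is whichever dataset you want to inspect (for instance the datapool/docs file system created later in this article).

```shell
# Hypothetical helper: show how much space the snapshots of a dataset
# (and of its descendants, because of -r) are holding.
# USED is the space that would be freed if only that snapshot were destroyed;
# REFER is the amount of data the snapshot references, shared or not.
snapshot_space() {
    zfs list -r -t snapshot -o name,used,referenced "$1"
    # usedbysnapshots is the total space consumed by all snapshots of the dataset.
    zfs get -H -o value usedbysnapshots "$1"
}
```

Calling snapshot_space datapool/docs right after taking a snapshot should report little or no used space. The figure grows only as files in the live file system are changed or deleted.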

A typical use for a snapshot is to save the current state of the file system quickly before a risky action such as a software installation or a system upgrade.

Creating and Destroying a ZFS Snapshot

Snapshots of volumes cannot be accessed directly, but they can be cloned, backed up, and rolled back to. Creating and destroying a snapshot is easy with the zfs snapshot and zfs destroy commands.

Create a pool called datapool.

# zpool create datapool mirror /dev/sdb /dev/sdc
# zpool list
NAME       SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
datapool  1.98G    65K  1.98G         -     0%     0%  1.00x  ONLINE  -

Now that we have a pool called datapool, we create a ZFS file system on it to demonstrate the snapshot feature.

# zfs create -o mountpoint=/docs datapool/docs
# zfs list -r datapool
NAME            USED  AVAIL  REFER  MOUNTPOINT
datapool       93.5K  1.92G    19K  /datapool
datapool/docs    19K  1.92G    19K  /docs

To create a snapshot of the file system, use the zfs snapshot command and specify the dataset and the snapshot name, separated by @. Add the -r option to snapshot a dataset and all of its descendants recursively. The full snapshot name takes one of the following forms:

filesystem@snapname
volume@snapname

# zfs snapshot datapool/docs@version1
# zfs list -t snapshot
NAME                     USED  AVAIL  REFER  MOUNTPOINT
datapool/docs@version1      0      -  19.5K  -

A snapshot for datapool/docs is created.
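Hand-picked names such as version1 become hard to track once snapshots are taken regularly. The sketch below shows one possible way to generate timestamped names and to use the -r option mentioned above. The function names, the default "auto" prefix, and the timestamp layout are assumptions made for this example, not ZFS conventions.

```shell
# Hypothetical helper: build a name of the form dataset@prefix-YYYYMMDD-HHMMSS.
# The prefix defaults to "auto" when the second argument is missing or empty.
snapshot_name() {
    dataset=$1
    prefix=${2:-auto}
    printf '%s@%s-%s\n' "$dataset" "$prefix" "$(date +%Y%m%d-%H%M%S)"
}

# Take a recursive snapshot (-r) of a dataset and all of its descendants,
# using the same generated snapshot name at every level.
snapshot_recursive() {
    zfs snapshot -r "$(snapshot_name "$1" "$2")"
}
```

For example, snapshot_recursive datapool nightly would snapshot datapool and datapool/docs in one step, giving both the same nightly-<timestamp> snapshot name.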

To destroy the snapshot, use the zfs destroy command with the full snapshot name.

# zfs destroy datapool/docs@version1
# zfs list -t snapshot
no datasets available

Rolling back a snapshot

For this demonstration, we create a test file in the /docs directory and then take a snapshot.

# echo "version 1" > /docs/data.txt
# cat /docs/data.txt
version 1

# zfs snapshot datapool/docs@version1
# zfs list -t snapshot
NAME                     USED  AVAIL  REFER  MOUNTPOINT
datapool/docs@version1     9K      -  19.5K  -

Now we change the content of /docs/data.txt.

# echo "version 2" > /docs/data.txt
# cat /docs/data.txt
version 2

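We can roll the file system back to the snapshot, which restores it to the state it was in when the snapshot was taken and discards every change made since then. Rolling back to a snapshot other than the most recent one requires the -r option, which also destroys the snapshots taken after it.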

# zfs list -t snapshot
NAME                     USED  AVAIL  REFER  MOUNTPOINT
datapool/docs@version1  9.50K      -  19.5K  -
# zfs rollback datapool/docs@version1
# cat /docs/data.txt
version 1

As we can see, the content of data.txt is back to the previous content.
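A rollback reverts the whole file system. When only one file needs to come back, it can be copied out of the snapshot instead: every mounted ZFS file system exposes its snapshots read-only under the .zfs/snapshot directory at its mountpoint (the directory is hidden by default but can still be entered by path). The following is a minimal sketch; the function name restore_from_snapshot and its argument order are made up for this example.

```shell
# Hypothetical helper: copy one file out of a snapshot instead of rolling back.
# Arguments: mountpoint, snapshot name (the part after "@"), path relative
# to the mountpoint.
restore_from_snapshot() {
    mountpoint=$1
    snapname=$2
    relpath=$3
    src="$mountpoint/.zfs/snapshot/$snapname/$relpath"
    if [ ! -e "$src" ]; then
        echo "not found in snapshot: $src" >&2
        return 1
    fi
    # -p preserves the mode and timestamps of the snapshotted file.
    cp -p "$src" "$mountpoint/$relpath"
}
```

With the file system from this section, restore_from_snapshot /docs version1 data.txt would bring back the old data.txt and leave every other file in /docs untouched.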

If we want to rename the snapshot, we can use the zfs rename command.

# zfs rename datapool/docs@version1 datapool/docs@version2
# zfs list -t snapshot
NAME                     USED  AVAIL  REFER  MOUNTPOINT
datapool/docs@version2  9.50K      -  19.5K  -

Note: a dataset cannot be destroyed while snapshots of it exist, unless we add the -r option, which destroys the dataset together with all of its snapshots.

# zfs destroy datapool/docs
cannot destroy 'datapool/docs': filesystem has children
use '-r' to destroy the following datasets:
datapool/docs@version2

# zfs destroy -r datapool/docs
# zfs list -t snapshot
no datasets available
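Because a recursive destroy cannot be undone, it can be worth previewing it first. The zfs destroy command accepts -n for a dry run and -v for verbose output. The sketch below simply wraps that combination; the function name preview_destroy is an assumption of this example.

```shell
# Hypothetical helper: list what a recursive destroy would remove,
# without changing anything.
# -r recurse into snapshots and child datasets
# -n dry run, nothing is destroyed
# -v print each dataset and snapshot that would be destroyed
preview_destroy() {
    zfs destroy -rnv "$1"
}
```

Running preview_destroy datapool/docs before the real zfs destroy -r should list the snapshots that would disappear along with the file system.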

Overview of ZFS Clones

A clone is a writable volume or file system whose initial contents are the same as the dataset from which it was created.

Creating and Destroying a ZFS Clone

Clones can only be created from a snapshot, and a snapshot cannot be destroyed until every clone based on it has been destroyed. To create a clone, use the zfs clone command. Because datapool/docs was destroyed at the end of the previous section, we create it again first.

# zfs create -o mountpoint=/docs datapool/docs
# zfs list -r datapool
NAME            USED  AVAIL  REFER  MOUNTPOINT
datapool       93.5K  1.92G    19K  /datapool
datapool/docs    19K  1.92G    19K  /docs

# mkdir /docs/folder{1..5}
# ls /docs/
folder1  folder2  folder3  folder4  folder5

# zfs snapshot datapool/docs@today
# zfs list -t snapshot
NAME                  USED  AVAIL  REFER  MOUNTPOINT
datapool/docs@today      0      -    19K  -

Now we create a clone from the snapshot datapool/docs@today.

# zfs clone datapool/docs@today datapool/pict
# zfs list
NAME            USED  AVAIL  REFER  MOUNTPOINT
datapool        166K  1.92G    19K  /datapool
datapool/docs    19K  1.92G    19K  /docs
datapool/pict     1K  1.92G    19K  /datapool/pict

The cloning process is finished: the snapshot datapool/docs@today has been cloned to the new dataset datapool/pict, which is mounted at /datapool/pict. When we check the content of the /datapool/pict directory, it should be the same as the content of /docs.

# ls /datapool/pict
folder1  folder2  folder3  folder4  folder5

Once a snapshot has been cloned, it cannot be destroyed until the clone has been destroyed.

# zfs destroy datapool/docs@today
cannot destroy 'datapool/docs@today': snapshot has dependent clones
use '-R' to destroy the following datasets:
datapool/pict

# zfs destroy datapool/pict

Finally we can destroy the snapshot.

# zfs destroy datapool/docs@today
# zfs list -t snapshot
no datasets available
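The dependency between a clone and its snapshot can also be inspected before attempting a destroy. ZFS records it in two read-only properties: origin on the clone and clones on the snapshot. The following is a minimal sketch; the function names are made up for this example, and the clones property may not be available on very old ZFS releases.

```shell
# Hypothetical helper: print the snapshot a clone was created from.
# Prints "-" for a dataset that is not a clone.
clone_origin() {
    zfs get -H -o value origin "$1"
}

# Hypothetical helper: print the clones that depend on a snapshot,
# as a comma-separated list taken from the "clones" property.
snapshot_clones() {
    zfs get -H -o value clones "$1"
}
```

While the clone from this section still exists, clone_origin datapool/pict should print datapool/docs@today, and snapshot_clones datapool/docs@today should print datapool/pict.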

Overview of ZFS Replication

Replication in ZFS is based on snapshots. We can create a snapshot at any time, and we can create as many snapshots as we like. By continually creating, transferring, and restoring snapshots, we can keep one or more machines synchronized. ZFS has a built-in serialization feature that writes a stream representation of a snapshot to standard output.
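Because the stream goes to standard output, it can be redirected like any other output, for instance into a compressed file that is restored later. The sketch below illustrates the idea. The function names are made up for this example, and the choice of gzip is an assumption; any compressor that reads from standard input would do.

```shell
# Hypothetical helper: write a snapshot's send stream to a gzip-compressed file.
# Arguments: snapshot (dataset@snapname), output file.
send_to_file() {
    zfs send "$1" | gzip > "$2"
}

# Hypothetical helper: recreate a dataset from a file written by send_to_file.
# Arguments: stream file, name of the dataset to create.
receive_from_file() {
    gunzip -c "$1" | zfs receive "$2"
}
```

A stream file is only useful if it can be received in full, so this approach suits temporary transfers better than long-term archives.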

Configure ZFS Replication

This section shows how to replicate a dataset from datapool to backuppool. The data does not have to stay on a pool attached to the local system; it can also be sent over a network to another system. The commands used for replication are zfs send and zfs receive.

Create another pool called backuppool.

# zpool create backuppool mirror sde sdf
# zpool list
NAME         SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
backuppool  1.98G    50K  1.98G         -     0%     0%  1.00x  ONLINE  -
datapool    1.98G   568K  1.98G         -     0%     0%  1.00x  ONLINE  -

Check the pool status:

# zpool status
  pool: datapool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        datapool    ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0     0

errors: No known data errors

  pool: backuppool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        backuppool  ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     0

errors: No known data errors

Create a snapshot of the dataset that we'll replicate.

# zfs snapshot datapool/docs@today
# zfs list -t snapshot
NAME                  USED  AVAIL  REFER  MOUNTPOINT
datapool/docs@today      0      -    19K  -
# ls /docs/
folder1  folder2  folder3  folder4  folder5

It's time to do the replication.

# zfs send datapool/docs@today | zfs receive backuppool/backup
# zfs list
NAME                USED  AVAIL  REFER  MOUNTPOINT
backuppool           83K  1.92G    19K  /backuppool
backuppool/backup    19K  1.92G    19K  /backuppool/backup
datapool            527K  1.92G    19K  /datapool
datapool/docs        19K  1.92G    19K  /docs

# ls /backuppool/backup
folder1  folder2  folder3  folder4  folder5

The snapshot datapool/docs@today has been successfully replicated to backuppool/backup.

To replicate a dataset to another machine, we can use the command below:

# zfs send datapool/docs@today | ssh otherserver zfs recv backuppool/backup
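Sending a full stream every time becomes expensive as a dataset grows. Once a first snapshot has been replicated, later runs can send only the changes between two snapshots with zfs send -i. The following is a minimal sketch of that idea for two local pools. The function name replicate_incremental and its argument order are assumptions of this example.

```shell
# Hypothetical helper: send only the blocks that changed between two
# snapshots of the same dataset.
# Arguments: source dataset, older snapshot name, newer snapshot name,
# target dataset.
# The older snapshot must already exist on the target from an earlier send.
replicate_incremental() {
    src=$1
    old=$2
    new=$3
    dst=$4
    zfs send -i "$src@$old" "$src@$new" | zfs receive "$dst"
}
```

For example, after taking a later snapshot with a hypothetical name such as datapool/docs@tomorrow, replicate_incremental datapool/docs today tomorrow backuppool/backup would bring the copy up to date. The receive fails if the target has been modified since the last snapshot it received; zfs receive -F rolls the target back to that snapshot first.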

Done.

Conclusion

Snapshots, clones, and replication are among the most powerful features of ZFS. Snapshots create point-in-time copies of file systems or volumes, clones create writable datasets based on a snapshot, and replication copies a dataset from one pool to another, either on the same machine or between different machines.