Openfiler 2.3 Active/Passive Cluster (heartbeat, DRBD) With Offsite Replication Node - Page 3
10. Test Recovery Of filer01 And filer02
Now we are going to see what happens if filer01 and filer02 are destroyed for whatever reason and we have to rebuild them from our replication node.
First shutdown filer01 and filer02:
root@filer01 ~# shutdown -h now
root@filer02 ~# shutdown -h now
Now set up two completely new machines, filer01 and filer02, following steps 1 to 3. From there on, the recovery differs slightly from the original installation.
10.1 DRBD Configuration
Copy the drbd.conf and lvm.conf files from filer03 to filer01 and filer02:
root@filer03 ~# scp /etc/drbd.conf root@filer01:/etc/drbd.conf
root@filer03 ~# scp /etc/drbd.conf root@filer02:/etc/drbd.conf
root@filer03 ~# scp /etc/lvm/lvm.conf root@filer01:/etc/lvm/lvm.conf
root@filer03 ~# scp /etc/lvm/lvm.conf root@filer02:/etc/lvm/lvm.conf
Create the metadata for the lower resources:
root@filer01 ~# drbdadm create-md meta
root@filer01 ~# drbdadm create-md data
root@filer02 ~# drbdadm create-md meta
root@filer02 ~# drbdadm create-md data
Start DRBD on filer01 and filer02:
root@filer01 ~# service drbd start
root@filer02 ~# service drbd start
Set the lower DRBD resources primary on filer01:
root@filer01 ~# drbdsetup /dev/drbd0 primary -o
root@filer01 ~# drbdsetup /dev/drbd1 primary -o
Create the DRBD Metadata on the stacked resource:
root@filer01 ~# drbdadm --stacked create-md meta-U
root@filer01 ~# drbdadm --stacked create-md data-U
Enable the stacked resource:
root@filer01 ~# drbdadm --stacked up meta-U
root@filer01 ~# drbdadm --stacked up data-U
At this point DRBD will recognize the inconsistent data and start to sync from filer03.
root@filer01 ~# service drbd status
drbd driver loaded OK; device status:
version: 8.3.7 (api:88/proto:86-91)
GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by phil@fat-tyre, 2010-01-13 17:17:27
m:res cs ro ds p mounted fstype
... sync'ed: 0.2% (11740/11756)M
... sync'ed: 1.8% (11560/11756)M
... sync'ed: 35.7% (351792/538088)K
... sync'ed: 6.1% (509624/538032)K
0:meta SyncSource Primary/Secondary UpToDate/Inconsistent C
1:data SyncSource Primary/Secondary UpToDate/Inconsistent C
10:meta-U^^0 SyncTarget Secondary/Secondary Inconsistent/UpToDate C
11:data-U^^1 SyncTarget Secondary/Secondary Inconsistent/UpToDate C
For the lower resources meta and data, filer01 is the SyncSource, while for the upper resources meta-U and data-U it is the SyncTarget. This shows us that the rebuild process has started.
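If you want to follow the resynchronisation continuously instead of re-running the status command, a simple watch on /proc/drbd works as well (purely a convenience, not part of the required procedure):
root@filer01 ~# watch -n 10 cat /proc/drbd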
While the synchronisation is still running you can already prepare the configuration for Openfiler and its storage services.
10.2 filer01 And filer02 Redo Configuration
As we have a fresh installation on filer01 and filer02 again, we need to redo the Openfiler configuration on these nodes, just as we did for filer02 and filer03 during the original installation.
Openfiler Configuration:
mkdir /meta
mv /opt/openfiler/ /opt/openfiler.local
ln -s /meta/opt/openfiler /opt/openfiler
Samba/NFS/ISCSI/PROFTPD Configuration Files to Meta Partition:
service nfslock stop
service nfs stop
service rpcidmapd stop
umount -a -t rpc-pipefs
rm -rf /etc/samba/
ln -s /meta/etc/samba/ /etc/samba
rm -rf /var/spool/samba/
ln -s /meta/var/spool/samba/ /var/spool/samba
rm -rf /var/lib/nfs/
ln -s /meta/var/lib/nfs/ /var/lib/nfs
rm -rf /etc/exports
ln -s /meta/etc/exports /etc/exports
rm /etc/ietd.conf
ln -s /meta/etc/ietd.conf /etc/ietd.conf
rm /etc/initiators.allow
ln -s /meta/etc/initiators.allow /etc/initiators.allow
rm /etc/initiators.deny
ln -s /meta/etc/initiators.deny /etc/initiators.deny
rm -rf /etc/proftpd
ln -s /meta/etc/proftpd/ /etc/proftpd
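A quick sanity check that the symlinks point where we expect them to (optional, shown here only as an illustration):
ls -ld /opt/openfiler /etc/samba /var/spool/samba /var/lib/nfs /etc/exports /etc/ietd.conf /etc/initiators.allow /etc/initiators.deny /etc/proftpd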
We need to enable heartbeat and DRBD at boot again and disable the services that are handled by heartbeat:
root@filer01 ~# chkconfig --level 2345 heartbeat on
root@filer01 ~# chkconfig --level 2345 drbd on
root@filer01 ~# chkconfig --level 2345 openfiler off
root@filer01 ~# chkconfig --level 2345 open-iscsi off
root@filer02 ~# chkconfig --level 2345 heartbeat on
root@filer02 ~# chkconfig --level 2345 drbd on
root@filer02 ~# chkconfig --level 2345 openfiler off
root@filer02 ~# chkconfig --level 2345 open-iscsi off
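You can verify the runlevel settings afterwards (repeat on filer02); the grep pattern below is just an example:
root@filer01 ~# chkconfig --list | grep -E 'heartbeat|drbd|openfiler|iscsi'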
10.3 Retake Resources And Run Cluster Again
NOTE: ALL DISKS SHOULD BE IN SYNC AND FILER03 IN STANDBY BEFORE DOING THIS!
When the synchronisation process has finished, we can prepare the cluster to run the services on filer01 again. If you are currently running the cluster services on filer03 (Step 11), you have to stop them as described in Step 11.1 before you can continue.
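A quick way to confirm that all resources are fully synchronised before you proceed is to look at the ds: fields in /proc/drbd; they should all read UpToDate/UpToDate:
root@filer01 ~# cat /proc/drbd | grep ds: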
Set the stacked resource primary on filer01:
root@filer01 ~# drbdadm --stacked primary meta-U
root@filer01 ~# drbdadm --stacked primary data-U
Mount the meta partition and generate a new haresources file with Openfiler:
root@filer01 ~# mount -t ext3 /dev/drbd10 /meta
root@filer01 ~# service openfiler restart
Now log in to https://10.10.11.101:446/ and start/stop a service you don't use in order to regenerate the /etc/ha.d/haresources file.
Then we can copy this file to filer02, start the heartbeat services on both machines and do a takeover.
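Copying the file over could look like this (assuming the default heartbeat location /etc/ha.d/, as used earlier in this howto):
root@filer01 ~# scp /etc/ha.d/haresources root@filer02:/etc/ha.d/haresources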
root@filer01 ~# service openfiler stop
root@filer01 ~# service heartbeat start
root@filer02 ~# service heartbeat start
root@filer01 ~# /usr/lib/heartbeat/hb_takeover
After the network and filesystem mounts have come up you should see everything running fine again under the cluster IP 10.10.11.100.
You can check this by logging in to https://10.10.11.100:446. Try a manual failover to filer02 now, too.
root@filer02 ~# /usr/lib/heartbeat/hb_takeover
11. Use Replication Node As Main Node
There are scenarios where you may want to use the replication node itself to deliver the storage, so you can keep running services until you have recovered the hardware for filer01 and filer02. This can even be done while filer01 and filer02 are resyncing from filer03.
Make the DRBD resources primary and activate the partitions:
root@filer03 ~# drbdadm primary meta-U
root@filer03 ~# drbdadm primary data-U
root@filer03 ~# mount -t ext3 /dev/drbd10 /meta
root@filer03 ~# /etc/ha.d/resource.d/LVM data start
At this point we are able to start Openfiler and the services we need, but first we need the virtual IP that the cluster used to deliver its services. We use heartbeat's resource.d scripts to do this.
root@filer03 ~# /etc/ha.d/resource.d/IPaddr 10.10.11.100/24/eth0 start
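Purely as a sanity check, you can verify that the cluster IP is now active on filer03:
root@filer03 ~# ip addr show eth0 | grep 10.10.11.100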
Then start all the services you need on filer03:
root@filer03 ~# service openfiler start
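If you also serve iSCSI targets, start the target service as well; on a stock Openfiler 2.3 install this is the iscsi-target service, but check the exact name with chkconfig --list if you are unsure:
root@filer03 ~# service iscsi-target start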
11.1 Replication Finished: How To Put The Replication Node Into Standby Again
First stop the services that you started on the machine (Openfiler, iSCSI, etc.):
root@filer03 ~# service openfiler stop
Give up the cluster IP by using the resource.d scripts from heartbeat again.
root@filer03 ~# /etc/ha.d/resource.d/IPaddr 10.10.11.100/24/eth0 stop
Unmount the partitions and put DRBD back into secondary mode.
root@filer03 ~# umount /dev/drbd10
root@filer03 ~# /etc/ha.d/resource.d/LVM data stop
root@filer03 ~# drbdadm secondary meta-U
root@filer03 ~# drbdadm secondary data-U
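To double-check that filer03 is back in a clean standby state, run the status command again; both stacked resources should now report a Secondary role on filer03:
root@filer03 ~# service drbd status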
After this you can take all services back onto filer01 as described in Step 10.3.
12. Add Another Storage Partition
12 GB isn't that much, so you might want to add more storage to your cluster at a later point.
This is a very easy process: first shut down the passive nodes and install the additional storage, then create an LVM partition on it with fdisk as described in Step 2. Note: you don't need to add another Linux type partition for configuration files, only another LVM partition.
After this you add your new partition to the drbd.conf file on each node.
Add the following to the drbd.conf file on filer01 and copy it to filer02 and filer03 afterwards (see the copy commands below the configuration block).
resource data2 {
  on filer01 {
    device    /dev/drbd2;
    disk      /dev/sdc1;
    address   10.10.50.101:7790;
    meta-disk internal;
  }
  on filer02 {
    device    /dev/drbd2;
    disk      /dev/sdc1;
    address   10.10.50.102:7790;
    meta-disk internal;
  }
}

resource data2-U {
  stacked-on-top-of data2 {
    device    /dev/drbd12;
    address   10.10.50.100:7790;
  }
  on filer03 {
    device    /dev/drbd12;
    disk      /dev/sdc1;
    address   10.10.50.103:7790;
    meta-disk internal;
  }
}
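Copying the updated configuration to the other nodes can be done the same way as in Step 10.1, for example:
root@filer01 ~# scp /etc/drbd.conf root@filer02:/etc/drbd.conf
root@filer01 ~# scp /etc/drbd.conf root@filer03:/etc/drbd.conf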
Note: filer01 must be the active node for this to work!
Create the metadata on the lower resource before starting the upper resource:
root@filer01 ~# drbdadm create-md data2
root@filer02 ~# drbdadm create-md data2
Start the lower resource:
root@filer01 ~# drbdadm up data2
root@filer02 ~# drbdadm up data2
Make it primary:
root@filer01 ~# drbdsetup /dev/drbd2 primary -o
Create the upper resource and make it primary, too.
root@filer01 ~# drbdadm --stacked create-md data2-U
root@filer01 ~# drbdadm --stacked up data2-U
root@filer01 ~# drbdsetup /dev/drbd12 primary -o
Create the meta-data on filer03 and start the resource:
root@filer03 ~# drbdadm create-md data2-U
root@filer03 ~# drbdadm up data2-U
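You can watch the initial synchronisation of data2 and data2-U the same way as in Step 10.1:
root@filer01 ~# service drbd status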
After this we are ready to add the new device to our existing volume group and increase our storage. Note: resizing the volumes you actually use on it is outside the scope of this howto.
Now we create a PV on the new stacked resource device and add it to the existing VolumeGroup:
root@filer01 ~# pvcreate /dev/drbd12
root@filer01 ~# vgextend data /dev/drbd12
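To confirm that the volume group has grown, check it with the standard LVM tools, for example:
root@filer01 ~# pvs
root@filer01 ~# vgdisplay data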
Don't forget to add your new device to your heartbeat configuration:
<?xml version="1.0" ?> <cluster> <clustering state="on" /> <nodename value="filer01" /> <resource value="MailTo::[email protected]::ClusterFailover"/> <resource value="IPaddr::10.10.11.100/24/eth0" /> <resource value="IPaddr::10.10.50.100/24/eth1" /> <resource value="drbdupper::meta-U"> <resource value="drbdupper::data-U"> <resource value="drbdupper::data2-U"> <resource value="LVM::data"> <resource value="Filesystem::/dev/drbd10::/meta::ext3::defaults,noatime"> <resource value="MakeMounts"/> </cluster>
Recreate /etc/ha.d/haresources as we've done before by restarting an unused service via the Openfiler GUI, then copy this new haresources file to filer02.
After this you can log in to your Openfiler cluster IP and use the extended data storage. Instead of extending the existing volume group you could also just create another one; refer to Step 6 for this.
Misc: Openfiler iSCSI Citrix Xen Modifications
Openfiler has some problems with storage created by Citrix Xen, so after a reboot you may have trouble adding and finding your LUNs. The main cause seems to be the AoE (ATA over Ethernet) service, which can be disabled with the following command. Do this on all three nodes.
chkconfig --level 2345 aoe off
Another problem seems to lie in the way Openfiler discovers LVM devices. The lvm.conf I posted is fine for a system with stacked resources, but probably not right for a DRBD-only system. The DRBD documentation mentions the following LVM filter configurations, which expose only the DRBD or stacked DRBD devices to LVM:
filter = [ "a|drbd.*|", "r|.*|" ]
and
filter = [ "a|drbd1[0-9]|", "r|.*|" ]
The second filter is the one used in this howto; it exposes only the devices /dev/drbd10 - /dev/drbd19 to LVM. If you need more devices you have to adjust your LVM configuration accordingly. You can find the example configurations in the DRBD documentation.
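For reference, the filter line lives in the devices { } section of /etc/lvm/lvm.conf; with the stacked setup from this howto the relevant part would look roughly like this (sketch only, the rest of the file stays untouched):
devices {
    ...
    filter = [ "a|drbd1[0-9]|", "r|.*|" ]
    ...
}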
Edit the /etc/rc.sysinit file at lines 333-337 and comment out the following lines:
From:
if [ -x /sbin/lvm.static ]; then
    if /sbin/lvm.static vgscan --mknodes --ignorelockingfailure > /dev/null 2>&1 ; then
        action $"Setting up Logical Volume Management:" /sbin/lvm.static vgchange -a y --ignorelockingfailure
    fi
fi
to:
# if [ -x /sbin/lvm.static ]; then
#     if /sbin/lvm.static vgscan --mknodes --ignorelockingfailure > /dev/null 2>&1 ; then
#         action $"Setting up Logical Volume Management:" /sbin/lvm.static vgchange -a y --ignorelockingfailure
#     fi
# fi
Restart your filers now to apply the changes. You should now be able to discover the iSCSI LUNs from your Citrix Xen systems without problems.
Misc: Notes About Openfiler Clusters
Not all services are highly available with this setup; some configuration files that can be modified by Openfiler remain on the individual nodes' partitions. During the setup process you can move these files to the meta partition as well, as sketched after the list below. The affected files are:
- /etc/ldap.conf
- /etc/openldap/ldap.conf
- /etc/ldap.secret
- /etc/nsswitch.conf
- /etc/krb5.conf
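Following the same pattern as in Step 10.2, moving one of these files onto the meta partition could look like this (shown for /etc/ldap.conf only, as an illustration; do this while the meta partition is mounted on the active node):
mkdir -p /meta/etc
mv /etc/ldap.conf /meta/etc/ldap.conf
ln -s /meta/etc/ldap.conf /etc/ldap.conf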
At the time of writing this howto, rPath Linux (which Openfiler is based on) ships heartbeat version 2.1.3, which in theory should be able to create n+1 clusters, but I haven't found any report of even basic CRM cluster configurations running successfully. I tried to create cib.xml files with the bundled script /usr/lib/heartbeat/haresources2cib.py, but the cluster did not start with them.
If you have finished all steps of this howto successfully, it's time to grab one of your favourite drinks. You've earned it.