Openfiler 2.99 Active/Passive With Corosync, Pacemaker And DRBD - Page 2

Submitted by wayner on Fri, 2011-04-22 17:37.

4. Prepare everything for the first corosync start

First we prepare our nodes for a restart. For this we disable some services which will be handled by corosync later on.

root@filer01~# chkconfig --level 2345 openfiler off
root@filer01~# chkconfig --level 2345 nfslock off
root@filer01~# chkconfig --level 2345 corosync on

Do the same on the other node:

root@filer02~# chkconfig --level 2345 openfiler off
root@filer02~# chkconfig --level 2345 nfslock off
root@filer02~# chkconfig --level 2345 corosync on

Now restart both nodes and check in the next part whether corosync comes up properly. Do not enable drbd at boot; it will be handled by corosync.
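If you want to double-check the boot settings before rebooting, a quick look at chkconfig confirms that corosync comes up at boot while openfiler and nfslock stay off (just a sanity check; adjust the service names if yours differ). If the drbd init script is still enabled, switch it off the same way:

root@filer01~# chkconfig --list | egrep 'corosync|openfiler|nfslock|drbd'
root@filer01~# chkconfig --level 2345 drbd off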

 

4.1 Check if corosync started properly

root@filer01~# ps auxf
root      3480  0.0  0.8 534456  4112 ?        Ssl  19:15   0:00 corosync
root      3486  0.0  0.5  68172  2776 ?        S    19:15   0:00  \_ /usr/lib64/heartbeat/stonith
106       3487  0.0  1.0  67684  4956 ?        S    19:15   0:00  \_ /usr/lib64/heartbeat/cib
root      3488  0.0  0.4  70828  2196 ?        S    19:15   0:00  \_ /usr/lib64/heartbeat/lrmd
106       3489  0.0  0.6  68536  3096 ?        S    19:15   0:00  \_ /usr/lib64/heartbeat/attrd
106       3490  0.0  0.6  69064  3420 ?        S    19:15   0:00  \_ /usr/lib64/heartbeat/pengine
106       3491  0.0  0.7  76764  3488 ?        S    19:15   0:00  \_ /usr/lib64/heartbeat/crmd

root@filer02~# crm_mon --one-shot -V

crm_mon[3602]: 2011/03/24_19:32:07 ERROR: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
crm_mon[3602]: 2011/03/24_19:32:07 ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
crm_mon[3602]: 2011/03/24_19:32:07 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
============
Last updated: Thu Mar 24 19:32:07 2011
Stack: openais
Current DC: filer01 - partition with quorum
Version: 1.1.2-c6b59218ee949eebff30e837ff6f3824ed0ab86b
2 Nodes configured, 2 expected votes
0 Resources configured.
============

Online: [ filer01 filer02 ]
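If one of the nodes does not show up as online, it can also help to look at the corosync rings themselves; each configured ring should report that it is active with no faults (an optional extra check, run on either node):

root@filer01~# corosync-cfgtool -s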

 

4.2 Configure Corosync as follows

Before you start configuring, monitor the cluster status on filer02 so you can watch the resources come up:

root@filer02~# crm_mon

 

4.2.1 How to configure corosync step by step

root@filer01~# crm configure
crm(live)configure# property stonith-enabled="false"
crm(live)configure# property no-quorum-policy="ignore"

crm(live)configure# rsc_defaults $id="rsc-options" \
> resource-stickiness="100"

crm(live)configure# primitive ClusterIP ocf:heartbeat:IPaddr2 \
> params ip="10.10.11.105" cidr_netmask="32" \
> op monitor interval="30s"

crm(live)configure# primitive MetaFS ocf:heartbeat:Filesystem \
> params device="/dev/drbd0" directory="/meta" fstype="ext3"

crm(live)configure# primitive lvmdata ocf:heartbeat:LVM \
> params volgrpname="data"

crm(live)configure# primitive drbd_meta ocf:linbit:drbd \
> params drbd_resource="meta" \
> op monitor interval="15s"

crm(live)configure# primitive drbd_data ocf:linbit:drbd \
> params drbd_resource="data" \
> op monitor interval="15s"

crm(live)configure# primitive openfiler lsb:openfiler

crm(live)configure# primitive iscsi lsb:iscsi-target

crm(live)configure# primitive samba lsb:smb

crm(live)configure# primitive nfs lsb:nfs
crm(live)configure# primitive nfs-lock lsb:nfslock

crm(live)configure# group g_drbd drbd_meta drbd_data
crm(live)configure# group g_services MetaFS lvmdata openfiler ClusterIP iscsi samba nfs nfs-lock

crm(live)configure# ms ms_g_drbd g_drbd \
> meta master-max="1" master-node-max="1" \
> clone-max="2" clone-node-max="1" \
> notify="true"

crm(live)configure# colocation c_g_services_on_g_drbd inf: g_services ms_g_drbd:Master
crm(live)configure# order o_g_services_after_g_drbd inf: ms_g_drbd:promote g_services:start
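Before you commit, you can let crm check the pending configuration for problems (optional; this is also where the timeout warnings mentioned in the comments further down would show up):

crm(live)configure# verify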

crm(live)configure# commit

Now watch in the monitoring process how the resources start up one after another.

root@filer01 ~# crm_mon

 

4.2.2 Troubleshooting

If you get errors because you ran commit before finishing the configuration, you need to do a resource cleanup, as in this example:

root@filer01~# crm
crm(live)# resource cleanup MetaFS
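To see which resource actually failed and why before cleaning it up, a one-shot monitor run together with a check of the live configuration is usually enough (the same tools used above, here just for troubleshooting):

root@filer01~# crm_mon --one-shot -V
root@filer01~# crm_verify -L -V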

 

4.2.3 Verify the config

To verify the config:

root@filer01~# crm configure show

node filer01 \
attributes standby="off"
node filer02 \
attributes standby="off"

primitive ClusterIP ocf:heartbeat:IPaddr2 \
params ip="10.10.11.105" cidr_netmask="32" \
op monitor interval="30s"

primitive MetaFS ocf:heartbeat:Filesystem \
params device="/dev/drbd0" directory="/meta" fstype="ext3"

primitive drbd_data ocf:linbit:drbd \
params drbd_resource="data" \
op monitor interval="15s"

primitive drbd_meta ocf:linbit:drbd \
params drbd_resource="meta" \
op monitor interval="15s"

primitive lvmdata ocf:heartbeat:LVM \
params volgrpname="data"

primitive openfiler lsb:openfiler

primitive iscsi lsb:iscsi-target

primitive samba lsb:smb

primitive nfs lsb:nfs
primitive nfs-lock lsb:nfslock

group g_drbd drbd_meta drbd_data
group g_services MetaFS lvmdata openfiler ClusterIP iscsi samba nfs nfs-lock

ms ms_g_drbd g_drbd \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
colocation c_g_services_on_g_drbd inf: g_services ms_g_drbd:Master
order o_g_services_after_g_drbd inf: ms_g_drbd:promote g_services:start
property $id="cib-bootstrap-options" \
dc-version="1.1.2-c6b59218ee949eebff30e837ff6f3824ed0ab86b" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore" \
last-lrm-refresh="1301801257"
rsc_defaults $id="rsc-options" \
resource-stickiness="100"

 

5. Specify your Setup

Contrary to Openfiler 2.3, where you had to exchange the haresources file manually after every change to the services, here the configuration is replicated to the other node automatically, no matter on which node you change it. You can also adapt the setup to your needs: the configuration above starts every service Openfiler can offer, but you can simply leave out the services you do not use, as sketched below.
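For example, if you only use Openfiler for iSCSI and do not need Samba or NFS, a reduced service group could look like this (only a sketch; keep whichever services you actually use):

crm(live)configure# group g_services MetaFS lvmdata openfiler ClusterIP iscsi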


Submitted by drinkYourOJ (registered user) on Tue, 2012-09-25 21:33.

Hello,

When I am trying to configure corosync, I get to the following step in the guide and then run into an error:

crm(live)configure# group g_services MetaFS lvmdata openfiler ClusterIP iscsi samba nfs nfslock
ERROR: object lvmdata does not exist

Up until this point, I have had absolutely no problems with anything. DRBD is running great, but I can not put these systems in production without the clustering/HA.

The main difference between my setup and this guide is that I have two separate data stores: one will be an iSCSI target ("vm_store"), the other will be an NFS share ("nfs_data"). So I've simply run every command relating to "data" twice - once for each of my data stores (with my DRBD resource names, etc., in place of the guide's, of course). I don't think that should have any effect that would produce this error.

I am running Openfiler 2.99.2 with all the latest packages (`conary updateall'). Please let me know if there is any more information I should provide, and thank you in advance for your help!

Submitted by Jera (not registered) on Tue, 2012-05-15 11:05.
Hello,

The tutorial is great and I tried to follow it step by step, but I didn't get it to work.

Running crm_mon I get the following output:

Attempting connection to the cluster...

============
Last updated: Tue May 15 10:18:32 2012
Stack: openais
Current DC: cluster1 - partition with quorum
Version: 1.1.2-c6b59218ee949eebff30e837ff6f3824ed0ab86b
2 Nodes configured, 2 expected votes
4 Resources configured.
============

Online: [ cluster2 cluster1 ]

 Resource Group: g_services
     lvmdata    (ocf::heartbeat:LVM):   Started cluster2
     openfiler  (lsb:openfiler):        Stopped
     ClusterIP  (ocf::heartbeat:IPaddr2):       Stopped
     iscsi      (lsb:iscsi-target):     Stopped
     samba      (lsb:smb):      Stopped
     nfs        (lsb:nfs):      Stopped
     nfslock    (lsb:nfslock):  Stopped
 Master/Slave Set: ms_g_drbd
     Masters: [ cluster2 ]
     Slaves: [ cluster1 ]

Failed actions:
    nfs-lock_start_0 (node=cluster1, call=16, rc=1, status=complete): unknown e

 

 I misspelled the corosync command:

 order o_g_servicesafter_g_drbd inf: ms_g_drbd:promote g_services:start

 I've read in one of the comments that it was supposed to be:

 order o_g_services_after_g_drbd inf: ms_g_drbd:promote g_services:start

How do I correct it?

Here is the output of the crm configure show command:

node cluster1
node cluster2
primitive ClusterIP ocf:heartbeat:IPaddr2 \
params ip="128.1.8.101" cidr_netmask="32" \
op monitor interval="30s"
primitive MetaFS ocf:heartbeat:Filesystem \
params device="/dev/drbd0" directory="/meta" fstype="ext3"
primitive drbd_data ocf:linbit:drbd \
params drbd_resource="data" \
op monitor interval="15s"
primitive drbd_meta ocf:linbit:drbd \
params drbd_resource="meta" \
op monitor interval="15s"
primitive iscsi lsb:iscsi-target
primitive lvmdata ocf:heartbeat:LVM \
params volgrpname="data"
primitive nfs lsb:nfs
primitive nfs-lock lsb:nfslock
primitive nfslock lsb:nfslock
primitive openfiler lsb:openfiler
primitive samba lsb:smb
group g_drbd drbd_meta drbd_data
group g_services lvmdata openfiler ClusterIP iscsi samba nfs nfslock
ms ms_g_drbd g_drbd \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
colocation c_g_services_on_g_drbd inf: g_services ms_g_drbd:Master
order o_g_servicesafter_g_drbd inf: ms_g_drbd:promote g_services:start
property $id="cib-bootstrap-options" \
dc-version="1.1.2-c6b59218ee949eebff30e837ff6f3824ed0ab86b" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
resource-stickiness="100"

 

Also notice that the first two lines of the crm configure show output are different from the expected output. I get:

node cluster1
node cluster2

instead of:

node filer01 \
attributes standby="off"
node filer02 \
attributes standby="off"

Also I can't access the openfiler web interface.  

Thank you in advance

 Jera

Submitted by bpatel (not registered) on Fri, 2014-01-03 18:11.
I'm also having the same issue. Were you able to resolve it?
Submitted by Elias Chatzigeorgiou (not registered) on Thu, 2012-01-05 02:46.
Great tutorial, thanks! A few questions below:

------------------------------------------------------------------------------------
a) How do I know, which node of the OF cluster is currently active?
For example, I use openfiler to provide iSCSI targets to clients.
How can I check if a node is currently in use by the iSCSI-target daemon?

I can try to deactivate a volume group using:
[root@openfiler1 ~]# vgchange -an data
  Can't deactivate volume group "data" with 3 open logical volume(s)		

In which case, if I get a message like the above then I know that
openfiler1 is the active node, but is there a better (non-intrusive)
way to check?

A better option seems to be 'pvs -v'. If the node is active then it shows the volume names:
[root@openfiler1 ~]# pvs -v
    Scanning for physical volume names
  PV         VG      Fmt  Attr PSize   PFree DevSize PV UUID
  /dev/drbd1 data    lvm2 a-   109.99g    0  110.00g c40m9K-tNk8-vTVz-tKix-UGyu-gYXa-gnKYoJ
  /dev/drbd2 tempdb  lvm2 a-    58.00g    0   58.00g 4CTq7I-yxAy-TZbY-TFxa-3alW-f97X-UDlGNP
  /dev/drbd3 distrib lvm2 a-    99.99g    0  100.00g l0DqWG-dR7s-XD2M-3Oek-bAft-d981-UuLReC

whereas on the inactive node it gives errors:
[root@openfiler2 ~]# pvs -v
    Scanning for physical volume names
  /dev/drbd0: open failed: Wrong medium type
  /dev/drbd1: open failed: Wrong medium type

Any further ideas/comments/suggestions?

------------------------------------------------------------------------------------

b) how can I gracefully failover to the other node ? Up to now, the only way I
know is forcing the active node to reboot (by entering two subsequent 'reboot'
commands). This however breaks the DRBD synchronization, and I need to
use a fix-split-brain procedure to bring back the DRBD in sync.

On the other hand, if I try to stop the corosync service on the active node,
the command takes forever! I understand that the suggested procedure should be
to disconnect all clients from the active node and then stop services,
is it a better approach to shut down the public network interface before
stopping the corosync service (in order to forcibly close client connections)?

Thanks
Submitted by Anonymous (not registered) on Wed, 2012-01-18 11:13.

It should be vgchange -a n data to deactivate a volume.

You can use the following to quickly switch:
crm node standby ; stops the services on FILER01 to test failover to FILER02
crm node online ; to bring the services back online on FILER01
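For reference, a graceful failover test based on these commands could look like this (a sketch, using the node names from this tutorial; crm_mon --one-shot is also a non-intrusive way to see which node currently runs g_services):

root@filer01~# crm node standby filer01    # g_services and the DRBD master move to filer02
root@filer01~# crm_mon --one-shot          # wait until everything shows up as started on filer02
root@filer01~# crm node online filer01     # allow filer01 to host resources again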

Submitted by neofire (registered user) on Sat, 2011-08-27 14:35.

Great Tutorial

 

I just finished building this in a test environment and I can see it working. A couple of questions:

 

On the main openfiler server, can you still use the web interface, and can openfiler be used on the second server?

Submitted by hale (not registered) on Sun, 2011-08-14 16:31.

Great tutorial. I am in the process of configuring it for a production server.

Have you used the DRBD management console? It looks very easy to use, but I have no idea how to set up replication.

Submitted by webcycler (registered user) on Thu, 2011-07-21 23:46.

I think there is a typo in the following line:

crm(live)configure# order o_g_servicesafter_g_drbd inf: ms_g_drbd:promote g_services:start

should be:

crm(live)configure# order o_g_services_after_g_drbd inf: ms_g_drbd:promote g_services:start

(note the underscore between services_after)
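If the constraint was already committed under the misspelled name, one way to replace it is to delete the old object and recreate it under the corrected id (a sketch in the same interactive style as the tutorial):

root@filer01~# crm configure
crm(live)configure# delete o_g_servicesafter_g_drbd
crm(live)configure# order o_g_services_after_g_drbd inf: ms_g_drbd:promote g_services:start
crm(live)configure# commit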

Submitted by Anonymous (not registered) on Tue, 2011-06-28 01:16.

Some warning messages I received; anyone have the same issue?


WARNING: MetaFS: default timeout 20s for start is smaller than the advised 60
WARNING: MetaFS: default timeout 20s for stop is smaller than the advised 60
WARNING: lvmdata: default timeout 20s for start is smaller than the advised 30
WARNING: lvmdata: default timeout 20s for stop is smaller than the advised 30
WARNING: drbd_meta: default timeout 20s for start is smaller than the advised 240
WARNING: drbd_meta: default timeout 20s for stop is smaller than the advised 100
WARNING: drbd_data: default timeout 20s for start is smaller than the advised 240
WARNING: drbd_data: default timeout 20s for stop is smaller than the advised 100

root@filer01 ~# crm configure verify

Submitted by Anonymous (not registered) on Wed, 2012-04-25 19:20.
I have the same problem. I'm not sure where we set the default timeout of 20 seconds though.
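The 20 seconds come from the cluster-wide default action timeout; the warnings only tell you that the resource agents advise longer values. One way to deal with them is to set explicit operation timeouts when defining the primitives, for example (a sketch using the advised values from the warnings; the same pattern applies to lvmdata and the two drbd resources):

crm(live)configure# primitive MetaFS ocf:heartbeat:Filesystem \
>   params device="/dev/drbd0" directory="/meta" fstype="ext3" \
>   op start timeout="60s" op stop timeout="60s"
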
Submitted by Anonymous (not registered) on Mon, 2011-06-27 08:52.

fail to get "meta" back after take primary node down and then restart.

 [root@filer01 ~]# service drbd status

Every 2.0s: service drbd status                                                        Wed Jun 22 12:29:37 2011

drbd driver loaded OK; device status:
version: 8.3.10 (api:88/proto:86-96)
GIT-hash: 5c0b0469666682443d4785d90a2c603378f9017b build by phil@fat-tyre, 2011-01-28 12:17:35
m:res   cs            ro                 ds                 p       mounted  fstype
0:meta  Unconfigured
1:data  StandAlone    Secondary/Unknown  UpToDate/DUnknown  r-----



[root@filer01 ~]# service drbd restart

Restarting all DRBD resources: 0: Failure: (104) Can not open backing device.
Command '/sbin/drbdsetup 0 disk /dev/sda3 /dev/sda3 internal --set-defaults --create-device --on-io-error=detach' terminated with exit code 10


[root@filer01 ~]# fdisk -l


Disk /dev/sda: 8589 MB, 8589934592 bytes
255 heads, 63 sectors/track, 1044 cylinders, total 16777216 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000ba6a0

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *          63      257039      128488+  83  Linux
/dev/sda2          257040     4450004     2096482+  82  Linux swap / Solaris
/dev/sda3         4450005     5494229      522112+  83  Linux
/dev/sda4         5494230    16771859     5638815   83  Linux

Disk /dev/sdb: 107.4 GB, 107374182400 bytes
255 heads, 63 sectors/track, 13054 cylinders, total 209715200 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000e074f

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1              63   209712509   104856223+  8e  Linux LVM

[root@filer01 ~]# crm_mon --one-shot -V
crm_mon[6797]: 2011/06/22_12:34:07 ERROR: native_add_running: Resource ocf::Filesystem:MetaFS appears to be active on 2 nodes.
crm_mon[6797]: 2011/06/22_12:34:07 WARN: See http://clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information.
============
Last updated: Wed Jun 22 12:34:07 2011
Stack: openais
Current DC: filer02 - partition with quorum
Version: 1.1.2-c6b59218ee949eebff30e837ff6f3824ed0ab86b
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Online: [ filer01 filer02 ]

 Resource Group: g_services
     MetaFS     (ocf::heartbeat:Filesystem) Started [   filer02 filer01 ]
     lvmdata    (ocf::heartbeat:LVM):   Started filer02 (unmanaged) FAILED
     openfiler  (lsb:openfiler):        Stopped
     ClusterIP  (ocf::heartbeat:IPaddr2):       Stopped
     iscsi      (lsb:iscsi-target):     Stopped
     samba      (lsb:smb):      Stopped
     nfs        (lsb:nfs):      Stopped
 Master/Slave Set: ms_g_drbd
     Masters: [ filer02 ]
     Stopped: [ g_drbd:0 ]

Failed actions:
    lvmdata_stop_0 (node=filer02, call=46, rc=1, status=complete): unknown error
    drbd_meta:0_start_0 (node=filer01, call=12, rc=-2, status=Timed Out): unknown exec error
 

Submitted by yonysoft (registered user) on Fri, 2012-09-14 21:13.

I'm from Argentina.

What you have to do is unmount /meta (in my case /dev/sda3), run the command to create the metadata, and restart drbd.

Commands:

# umount /dev/sda3
# drbdadm create-md meta
# service drbd restart

With that it should work.

 

Submitted by webcycler (registered user) on Wed, 2011-07-20 22:12.
Did you perhaps forget to create the /meta directory on one of the filers?
Submitted by ellisgl (registered user) on Fri, 2011-06-24 11:49.
nfs-lock should be nfslock
Submitted by Anonymous (not registered) on Thu, 2013-02-28 03:05.

I am going to start building this in my lab for HA storage on Hyper-V hosts. What IP do the clients use to connect to the iSCSI target?