Xen Cluster Management With Ganeti On Debian Etch

15 Further Ganeti Commands

To learn more about what you can do with Ganeti, take a look at the following man pages:

man gnt-instance
man gnt-cluster
man gnt-node
man gnt-os
man gnt-backup
man 7 ganeti
man 7 ganeti-os-interface

Also have a look at the Ganeti administrator's guide that comes with the Ganeti package (in /docs/admin.html); the Ganeti installation tutorial contains some useful hints as well.
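
If you are not sure where the package installed the HTML documentation on your system, you can ask dpkg (this assumes the package is simply named ganeti):

dpkg -L ganeti | grep html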

These are probably the most interesting commands:

Start an instance:

gnt-instance startup inst1.example.com
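
If you have several instances, you can of course drive gnt-instance from the shell; a small sketch (the second instance name is hypothetical):

for i in inst1.example.com inst2.example.com; do
  gnt-instance startup $i
done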

Stop an instance:

gnt-instance shutdown inst1.example.com

Go to an instance's console:

gnt-instance console inst1.example.com

Fail over an instance to its secondary node:

gnt-instance failover inst1.example.com

Delete an instance:

gnt-instance remove inst1.example.com

Get a list of instances:

gnt-instance list
node1:~# gnt-instance list
Instance          OS          Primary_node      Autostart Status  Memory
inst1.example.com debian-etch node2.example.com yes       running     64
node1:~#
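
If you want to process this list in scripts, plain shell tools are enough. For example, to extract just the instance names from the column layout shown above (skipping the header line):

gnt-instance list | tail -n +2 | awk '{print $1}'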

Get more details about instances:

gnt-instance info
node1:~# gnt-instance info
Instance name: inst1.example.com
State: configured to be up, actual state is up
  Nodes:
    - primary: node2.example.com
    - secondaries: node1.example.com
  Operating system: debian-etch
  Hardware:
    - memory: 64MiB
    - NICs: {MAC: aa:00:00:ac:67:3a, IP: None, bridge: xen-br0}
  Block devices:
    - sda, type: md_raid1, physical_id: a8984725:92a66329:e9453b29:5f438b80
      primary:   /dev/md0 (9:0) in sync, status ok
      - type: drbd, logical_id: ('node2.example.com', 'node1.example.com', 11000)
        primary:   /dev/drbd0 (147:0) in sync, status ok
        secondary: /dev/drbd0 (147:0) in sync, status ok
        - type: lvm, logical_id: ('xenvg', '577164fd-b0cb-4043-9d57-aa59f41fddf1.sda_data')
          primary:   /dev/xenvg/577164fd-b0cb-4043-9d57-aa59f41fddf1.sda_data (253:0)
          secondary: /dev/xenvg/577164fd-b0cb-4043-9d57-aa59f41fddf1.sda_data (253:0)
        - type: lvm, logical_id: ('xenvg', '22071c7b-37e7-4aa1-be4a-74021599c1a7.sda_meta')
          primary:   /dev/xenvg/22071c7b-37e7-4aa1-be4a-74021599c1a7.sda_meta (253:1)
          secondary: /dev/xenvg/22071c7b-37e7-4aa1-be4a-74021599c1a7.sda_meta (253:1)
    - sdb, type: md_raid1, physical_id: 1e974569:29fa6cab:e9453b29:5f438b80
      primary:   /dev/md1 (9:1) in sync, status ok
      - type: drbd, logical_id: ('node2.example.com', 'node1.example.com', 11001)
        primary:   /dev/drbd1 (147:1) in sync, status ok
        secondary: /dev/drbd1 (147:1) in sync, status ok
        - type: lvm, logical_id: ('xenvg', 'd89067b9-cae6-4b15-ba3b-76f17f70553e.sdb_data')
          primary:   /dev/xenvg/d89067b9-cae6-4b15-ba3b-76f17f70553e.sdb_data (253:2)
          secondary: /dev/xenvg/d89067b9-cae6-4b15-ba3b-76f17f70553e.sdb_data (253:2)
        - type: lvm, logical_id: ('xenvg', 'c17a8468-b3f5-4aa3-8644-0a2c890d68be.sdb_meta')
          primary:   /dev/xenvg/c17a8468-b3f5-4aa3-8644-0a2c890d68be.sdb_meta (253:3)
          secondary: /dev/xenvg/c17a8468-b3f5-4aa3-8644-0a2c890d68be.sdb_meta (253:3)
node1:~#

Get info about the cluster:

gnt-cluster info
node1:~# gnt-cluster info
Cluster name: node1.example.com
Master node: node1.example.com
Architecture (this node): 32bit (i686)
node1:~#

Check whether everything is all right with the cluster:

gnt-cluster verify
node1:~# gnt-cluster verify
* Verifying global settings
* Gathering data (2 nodes)
* Verifying node node1.example.com
* Verifying node node2.example.com
* Verifying instance inst1.example.com
* Verifying orphan volumes
* Verifying remaining instances
node1:~#
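
Because gnt-cluster verify writes its report to standard output, it lends itself to periodic checks. A minimal sketch for root's crontab on the master node, assuming gnt-cluster is reachable through cron's PATH (otherwise use the full path, which dpkg -L ganeti will tell you); cron mails the output to root by default:

0 6 * * * gnt-cluster verify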

Find out which node is the cluster master:

gnt-cluster getmaster

Fail over the master role if the master node has gone down (this makes the node on which the command is run the new master):

gnt-cluster masterfailover
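
Afterwards, you can verify that the master role has moved (run this on the new master):

gnt-cluster getmaster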

Find out about instance volumes on the cluster nodes (sizes are shown in MiB):

gnt-node volumes
node1:~# gnt-node volumes
Node              PhysDev   VG    Name                                           Size Instance
node1.example.com /dev/sda3 xenvg 22071c7b-37e7-4aa1-be4a-74021599c1a7.sda_meta   128 inst1.example.com
node1.example.com /dev/sda3 xenvg 577164fd-b0cb-4043-9d57-aa59f41fddf1.sda_data 10240 inst1.example.com
node1.example.com /dev/sda3 xenvg c17a8468-b3f5-4aa3-8644-0a2c890d68be.sdb_meta   128 inst1.example.com
node1.example.com /dev/sda3 xenvg d89067b9-cae6-4b15-ba3b-76f17f70553e.sdb_data  4096 inst1.example.com
node2.example.com /dev/sda3 xenvg 22071c7b-37e7-4aa1-be4a-74021599c1a7.sda_meta   128 inst1.example.com
node2.example.com /dev/sda3 xenvg 577164fd-b0cb-4043-9d57-aa59f41fddf1.sda_data 10240 inst1.example.com
node2.example.com /dev/sda3 xenvg c17a8468-b3f5-4aa3-8644-0a2c890d68be.sdb_meta   128 inst1.example.com
node2.example.com /dev/sda3 xenvg d89067b9-cae6-4b15-ba3b-76f17f70553e.sdb_data  4096 inst1.example.com
node1:~#
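
You can cross-check this output against LVM directly on each node; the logical volumes listed above live in the xenvg volume group and can be shown with the standard LVM2 tools:

lvs xenvg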

Remove a node from the cluster:

gnt-node remove node2.example.com
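
If you want the node back in the cluster later (e.g., after maintenance or a reinstall), you add it again from the master node:

gnt-node add node2.example.com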

Find out about the operating systems supported by the cluster (currently only Debian Etch):

gnt-os list
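
The names shown here correspond to the OS definitions installed on the nodes. With Ganeti 1.2 these normally live under /srv/ganeti/os (assuming you installed the OS definition to the default location), so you can inspect what is available with:

ls -l /srv/ganeti/os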


16 A Failover Example

Now let's assume you want to take down node2.example.com for maintenance, but you don't want inst1.example.com to go down.

First, let's find out about our instances:

node1:

gnt-instance list

As you see, node2 is the primary node:

node1:~# gnt-instance list
Instance          OS          Primary_node      Autostart Status  Memory
inst1.example.com debian-etch node2.example.com yes       running     64
node1:~#

To fail over inst1.example.com to node1, we run the following command (again on node1):

gnt-instance failover inst1.example.com

Afterwards, we run

gnt-instance list

again. node1 should now be the primary node:

node1:~# gnt-instance list
Instance          OS          Primary_node      Autostart Status  Memory
inst1.example.com debian-etch node1.example.com yes       running     64
node1:~#

Now you can take down node2:

node2:

shutdown -h now

After node2 has gone down, you can try to connect to inst1.example.com - it should still be running.
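
For example (assuming the instance answers to ping and runs an SSH daemon):

ping -c 3 inst1.example.com
ssh root@inst1.example.com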

Once the maintenance on node2 is finished and the node is up again, we'd like to make it the primary node again.

So we try another failover from node1:

node1:

gnt-instance failover inst1.example.com

This time we get the following:

node1:~# gnt-instance failover inst1.example.com
Failover will happen to image inst1.example.com. This requires a
shutdown of the instance. Continue?
y/[n]:
<-- y
* checking disk consistency between source and target
Can't get any data from node node2.example.com
Failure: command execution error:
Disk sda is degraded on target node, aborting failover.
node1:~#

The failover doesn't work because inst1.example.com's hard drive on node2 is degraded (i.e., not in sync).

To fix this, we can replace inst1.example.com's disks on node2 by mirroring the disks from the current primary node, node1, to node2:

node1:

gnt-instance replace-disks -n node2.example.com inst1.example.com

During this process (which can take some time), inst1.example.com can stay up.

node1:~# gnt-instance replace-disks -n node2.example.com inst1.example.com
Waiting for instance inst1.example.com to sync disks.
- device sda: 0.47% done, 474386 estimated seconds remaining
- device sdb: 22.51% done, 593 estimated seconds remaining
- device sda: 0.68% done, 157798 estimated seconds remaining
- device sdb: 70.50% done, 242 estimated seconds remaining
- device sda: 0.87% done, 288736 estimated seconds remaining
- device sda: 0.98% done, 225709 estimated seconds remaining
- device sda: 1.10% done, 576135 estimated seconds remaining
- device sda: 1.22% done, 161835 estimated seconds remaining
- device sda: 1.32% done, 739075 estimated seconds remaining
- device sda: 1.53% done, 120064 estimated seconds remaining
- device sda: 1.71% done, 257668 estimated seconds remaining
- device sda: 1.84% done, 257310 estimated seconds remaining
- device sda: 3.43% done, 4831 estimated seconds remaining
- device sda: 6.56% done, 4774 estimated seconds remaining
- device sda: 8.74% done, 4700 estimated seconds remaining
- device sda: 11.20% done, 4595 estimated seconds remaining
- device sda: 13.49% done, 4554 estimated seconds remaining
- device sda: 15.57% done, 4087 estimated seconds remaining
- device sda: 17.49% done, 3758 estimated seconds remaining
- device sda: 19.82% done, 4166 estimated seconds remaining
- device sda: 22.11% done, 4075 estimated seconds remaining
- device sda: 23.94% done, 3651 estimated seconds remaining
- device sda: 26.69% done, 3945 estimated seconds remaining
- device sda: 29.06% done, 3745 estimated seconds remaining
- device sda: 31.07% done, 3567 estimated seconds remaining
- device sda: 33.41% done, 3498 estimated seconds remaining
- device sda: 35.77% done, 3364 estimated seconds remaining
- device sda: 38.05% done, 3274 estimated seconds remaining
- device sda: 41.17% done, 3109 estimated seconds remaining
- device sda: 44.11% done, 2974 estimated seconds remaining
- device sda: 46.21% done, 2655 estimated seconds remaining
- device sda: 48.40% done, 2696 estimated seconds remaining
- device sda: 50.84% done, 2635 estimated seconds remaining
- device sda: 53.33% done, 2449 estimated seconds remaining
- device sda: 55.75% done, 2362 estimated seconds remaining
- device sda: 58.73% done, 2172 estimated seconds remaining
- device sda: 60.91% done, 2015 estimated seconds remaining
- device sda: 63.16% done, 1914 estimated seconds remaining
- device sda: 65.41% done, 1760 estimated seconds remaining
- device sda: 68.15% done, 1681 estimated seconds remaining
- device sda: 70.61% done, 1562 estimated seconds remaining
- device sda: 73.55% done, 1370 estimated seconds remaining
- device sda: 76.01% done, 1269 estimated seconds remaining
- device sda: 78.14% done, 1108 estimated seconds remaining
- device sda: 80.59% done, 1011 estimated seconds remaining
- device sda: 82.86% done, 858 estimated seconds remaining
- device sda: 85.25% done, 674 estimated seconds remaining
- device sda: 87.74% done, 638 estimated seconds remaining
- device sda: 90.01% done, 518 estimated seconds remaining
- device sda: 92.40% done, 392 estimated seconds remaining
- device sda: 94.87% done, 265 estimated seconds remaining
- device sda: 97.10% done, 147 estimated seconds remaining
- device sda: 99.38% done, 30 estimated seconds remaining
Instance inst1.example.com's disks are in sync.
node1:~#
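
If you are curious, you can also follow such a resynchronization from DRBD's point of view on either node while it runs (/proc/drbd is provided by the DRBD kernel module):

watch -n 5 cat /proc/drbd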

Afterwards, we can fail over inst1.example.com back to node2:

gnt-instance failover inst1.example.com

node2 should now be the primary again:

gnt-instance list
node1:~# gnt-instance list
Instance          OS          Primary_node      Autostart Status  Memory
inst1.example.com debian-etch node2.example.com yes       running     64
node1:~#
