Xen Cluster Management With Ganeti On Debian Etch - Page 7
15 Further Ganeti Commands
To learn more about what you can do with Ganeti, take a look at the following man pages:
man gnt-instance
man gnt-cluster
man gnt-node
man gnt-os
man gnt-backup
man 7 ganeti
man 7 ganeti-os-interface
and also at the Ganeti administrator's guide that comes with the Ganeti package (in /docs/admin.html). The Ganeti installation tutorial also has some hints.
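If you are not sure where the administrator's guide ended up on your system and you installed Ganeti as a Debian package, dpkg can list the package's files (the package name ganeti is an assumption here; adjust it if yours differs):
dpkg -L ganeti | grep admin.html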
These are the commands you are most likely to need:
Start an instance:
gnt-instance startup inst1.example.com
Stop an instance:
gnt-instance shutdown inst1.example.com
Go to an instance's console:
gnt-instance console inst1.example.com
Fail over an instance to its secondary node:
gnt-instance failover inst1.example.com
Delete an instance:
gnt-instance remove inst1.example.com
Get a list of instances:
gnt-instance list
node1:~# gnt-instance list
Instance OS Primary_node Autostart Status Memory
inst1.example.com debian-etch node2.example.com yes running 64
node1:~#
Get more details about instances:
gnt-instance info
node1:~# gnt-instance info
Instance name: inst1.example.com
State: configured to be up, actual state is up
Nodes:
- primary: node2.example.com
- secondaries: node1.example.com
Operating system: debian-etch
Hardware:
- memory: 64MiB
- NICs: {MAC: aa:00:00:ac:67:3a, IP: None, bridge: xen-br0}
Block devices:
- sda, type: md_raid1, physical_id: a8984725:92a66329:e9453b29:5f438b80
primary: /dev/md0 (9:0) in sync, status ok
- type: drbd, logical_id: ('node2.example.com', 'node1.example.com', 11000)
primary: /dev/drbd0 (147:0) in sync, status ok
secondary: /dev/drbd0 (147:0) in sync, status ok
- type: lvm, logical_id: ('xenvg', '577164fd-b0cb-4043-9d57-aa59f41fddf1.sda_data')
primary: /dev/xenvg/577164fd-b0cb-4043-9d57-aa59f41fddf1.sda_data (253:0)
secondary: /dev/xenvg/577164fd-b0cb-4043-9d57-aa59f41fddf1.sda_data (253:0)
- type: lvm, logical_id: ('xenvg', '22071c7b-37e7-4aa1-be4a-74021599c1a7.sda_meta')
primary: /dev/xenvg/22071c7b-37e7-4aa1-be4a-74021599c1a7.sda_meta (253:1)
secondary: /dev/xenvg/22071c7b-37e7-4aa1-be4a-74021599c1a7.sda_meta (253:1)
- sdb, type: md_raid1, physical_id: 1e974569:29fa6cab:e9453b29:5f438b80
primary: /dev/md1 (9:1) in sync, status ok
- type: drbd, logical_id: ('node2.example.com', 'node1.example.com', 11001)
primary: /dev/drbd1 (147:1) in sync, status ok
secondary: /dev/drbd1 (147:1) in sync, status ok
- type: lvm, logical_id: ('xenvg', 'd89067b9-cae6-4b15-ba3b-76f17f70553e.sdb_data')
primary: /dev/xenvg/d89067b9-cae6-4b15-ba3b-76f17f70553e.sdb_data (253:2)
secondary: /dev/xenvg/d89067b9-cae6-4b15-ba3b-76f17f70553e.sdb_data (253:2)
- type: lvm, logical_id: ('xenvg', 'c17a8468-b3f5-4aa3-8644-0a2c890d68be.sdb_meta')
primary: /dev/xenvg/c17a8468-b3f5-4aa3-8644-0a2c890d68be.sdb_meta (253:3)
secondary: /dev/xenvg/c17a8468-b3f5-4aa3-8644-0a2c890d68be.sdb_meta (253:3)
node1:~#
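If you only need a single detail from that output, for example the primary node, you can filter it with standard tools; a quick sketch using the label text from the output above:
gnt-instance info | grep primary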
Get info about a cluster:
gnt-cluster info
node1:~# gnt-cluster info
Cluster name: node1.example.com
Master node: node1.example.com
Architecture (this node): 32bit (i686)
node1:~#
Check that everything is all right with the cluster:
gnt-cluster verify
node1:~# gnt-cluster verify
* Verifying global settings
* Gathering data (2 nodes)
* Verifying node node1.example.com
* Verifying node node2.example.com
* Verifying instance inst1.example.com
* Verifying orphan volumes
* Verifying remaining instances
node1:~#
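Since the cluster state can drift over time (for example after a node outage), it can be useful to run this check regularly. Here is a minimal sketch of a nightly cron job on the master node; the schedule, PATH, and mail recipient are only examples:
# /etc/cron.d/ganeti-verify - nightly cluster check on the master node
# (example schedule and recipient - adjust to taste; cron mails the output).
PATH=/usr/sbin:/usr/bin:/sbin:/bin
MAILTO=root
30 3 * * * root gnt-cluster verify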
Find out which node is the cluster master:
gnt-cluster getmaster
Fail over the master role if the master node has gone down (this makes the node on which the command is run the new master):
gnt-cluster masterfailover
List the instance volumes on the cluster nodes:
gnt-node volumes
node1:~# gnt-node volumes
Node PhysDev VG Name Size Instance
node1.example.com /dev/sda3 xenvg 22071c7b-37e7-4aa1-be4a-74021599c1a7.sda_meta 128 inst1.example.com
node1.example.com /dev/sda3 xenvg 577164fd-b0cb-4043-9d57-aa59f41fddf1.sda_data 10240 inst1.example.com
node1.example.com /dev/sda3 xenvg c17a8468-b3f5-4aa3-8644-0a2c890d68be.sdb_meta 128 inst1.example.com
node1.example.com /dev/sda3 xenvg d89067b9-cae6-4b15-ba3b-76f17f70553e.sdb_data 4096 inst1.example.com
node2.example.com /dev/sda3 xenvg 22071c7b-37e7-4aa1-be4a-74021599c1a7.sda_meta 128 inst1.example.com
node2.example.com /dev/sda3 xenvg 577164fd-b0cb-4043-9d57-aa59f41fddf1.sda_data 10240 inst1.example.com
node2.example.com /dev/sda3 xenvg c17a8468-b3f5-4aa3-8644-0a2c890d68be.sdb_meta 128 inst1.example.com
node2.example.com /dev/sda3 xenvg d89067b9-cae6-4b15-ba3b-76f17f70553e.sdb_data 4096 inst1.example.com
node1:~#
Remove a node from the cluster:
gnt-node remove node2.example.com
Find out which operating system definitions the cluster supports (currently only debian-etch):
gnt-os list
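If you find yourself running several of these commands in a row, a tiny wrapper script can give you a quick overview of the cluster. This is nothing Ganeti-specific, just the commands from this chapter strung together (the script name is made up):
#!/bin/sh
# ganeti-overview.sh - print a quick summary of the cluster state,
# using only the Ganeti commands shown above.
echo "=== Cluster master ==="
gnt-cluster getmaster
echo "=== Instances ==="
gnt-instance list
echo "=== Instance volumes ==="
gnt-node volumes
echo "=== Cluster verify ==="
gnt-cluster verify
You could save this on the master node and run it from there whenever you want a quick status check.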
16 A Failover Example
Now let's assume you want to take down node2.example.com for maintenance, but you don't want inst1.example.com to go down.
First, let's find out about our instances:
node1:
gnt-instance list
As you can see, node2 is currently the primary node:
node1:~# gnt-instance list
Instance OS Primary_node Autostart Status Memory
inst1.example.com debian-etch node2.example.com yes running 64
node1:~#
To fail over inst1.example.com to node1, we run the following command (again on node1):
gnt-instance failover inst1.example.com
Afterwards, we run
gnt-instance list
again. node1 should now be the primary node:
node1:~# gnt-instance list
Instance OS Primary_node Autostart Status Memory
inst1.example.com debian-etch node1.example.com yes running 64
node1:~#
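With a single instance this is quick to do by hand. If a node is the primary for several instances, you could script the same step; here is a rough sketch that parses the gnt-instance list output shown above (it assumes the primary node is the third column, and each failover will still ask for confirmation):
#!/bin/sh
# failover-from-node.sh - fail over every instance whose primary node
# is the node given as the first argument. Only a sketch; it relies on
# the column layout of "gnt-instance list" shown above.
NODE="$1"
for INST in $(gnt-instance list | awk -v node="$NODE" 'NR > 1 && $3 == node { print $1 }')
do
  echo "Failing over $INST ..."
  gnt-instance failover "$INST"
done
You would then call it like this on node1:
./failover-from-node.sh node2.example.com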
Now you can take down node2:
node2:
shutdown -h now
After node2 has gone down, you can try to connect to inst1.example.com - it should still be running.
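For example, from node1 or from your workstation (assuming the instance answers pings; an SSH login works just as well):
ping -c 3 inst1.example.com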
After the maintenance on node2 is finished and we have booted it up again, we'd like to make it the primary node again.
Therefore we try another failover, again on node1:
node1:
gnt-instance failover inst1.example.com
This time, however, we get an error:
node1:~# gnt-instance failover inst1.example.com
Failover will happen to image inst1.example.com. This requires a
shutdown of the instance. Continue?
y/[n]: <-- y
* checking disk consistency between source and target
Can't get any data from node node2.example.com
Failure: command execution error:
Disk sda is degraded on target node, aborting failover.
node1:~#
The failover doesn't work because inst1.example.com's disks on node2 are degraded (i.e., not in sync); node2 missed all writes to the instance while it was down.
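You can also see this at the DRBD level: /proc/drbd on node2 shows the connection and sync state of each device once the node is back up (this is a DRBD detail rather than a Ganeti command, and the exact output format depends on your DRBD version):
node2:
cat /proc/drbd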
To fix this, we replace inst1.example.com's disks on node2 by mirroring them again from the current primary node, node1, to node2:
node1:
gnt-instance replace-disks -n node2.example.com inst1.example.com
During this process (which can take some time), inst1.example.com can stay up.
node1:~# gnt-instance replace-disks -n node2.example.com inst1.example.com
Waiting for instance inst1.example.com to sync disks.
- device sda: 0.47% done, 474386 estimated seconds remaining
- device sdb: 22.51% done, 593 estimated seconds remaining
- device sda: 0.68% done, 157798 estimated seconds remaining
- device sdb: 70.50% done, 242 estimated seconds remaining
- device sda: 0.87% done, 288736 estimated seconds remaining
- device sda: 0.98% done, 225709 estimated seconds remaining
- device sda: 1.10% done, 576135 estimated seconds remaining
- device sda: 1.22% done, 161835 estimated seconds remaining
- device sda: 1.32% done, 739075 estimated seconds remaining
- device sda: 1.53% done, 120064 estimated seconds remaining
- device sda: 1.71% done, 257668 estimated seconds remaining
- device sda: 1.84% done, 257310 estimated seconds remaining
- device sda: 3.43% done, 4831 estimated seconds remaining
- device sda: 6.56% done, 4774 estimated seconds remaining
- device sda: 8.74% done, 4700 estimated seconds remaining
- device sda: 11.20% done, 4595 estimated seconds remaining
- device sda: 13.49% done, 4554 estimated seconds remaining
- device sda: 15.57% done, 4087 estimated seconds remaining
- device sda: 17.49% done, 3758 estimated seconds remaining
- device sda: 19.82% done, 4166 estimated seconds remaining
- device sda: 22.11% done, 4075 estimated seconds remaining
- device sda: 23.94% done, 3651 estimated seconds remaining
- device sda: 26.69% done, 3945 estimated seconds remaining
- device sda: 29.06% done, 3745 estimated seconds remaining
- device sda: 31.07% done, 3567 estimated seconds remaining
- device sda: 33.41% done, 3498 estimated seconds remaining
- device sda: 35.77% done, 3364 estimated seconds remaining
- device sda: 38.05% done, 3274 estimated seconds remaining
- device sda: 41.17% done, 3109 estimated seconds remaining
- device sda: 44.11% done, 2974 estimated seconds remaining
- device sda: 46.21% done, 2655 estimated seconds remaining
- device sda: 48.40% done, 2696 estimated seconds remaining
- device sda: 50.84% done, 2635 estimated seconds remaining
- device sda: 53.33% done, 2449 estimated seconds remaining
- device sda: 55.75% done, 2362 estimated seconds remaining
- device sda: 58.73% done, 2172 estimated seconds remaining
- device sda: 60.91% done, 2015 estimated seconds remaining
- device sda: 63.16% done, 1914 estimated seconds remaining
- device sda: 65.41% done, 1760 estimated seconds remaining
- device sda: 68.15% done, 1681 estimated seconds remaining
- device sda: 70.61% done, 1562 estimated seconds remaining
- device sda: 73.55% done, 1370 estimated seconds remaining
- device sda: 76.01% done, 1269 estimated seconds remaining
- device sda: 78.14% done, 1108 estimated seconds remaining
- device sda: 80.59% done, 1011 estimated seconds remaining
- device sda: 82.86% done, 858 estimated seconds remaining
- device sda: 85.25% done, 674 estimated seconds remaining
- device sda: 87.74% done, 638 estimated seconds remaining
- device sda: 90.01% done, 518 estimated seconds remaining
- device sda: 92.40% done, 392 estimated seconds remaining
- device sda: 94.87% done, 265 estimated seconds remaining
- device sda: 97.10% done, 147 estimated seconds remaining
- device sda: 99.38% done, 30 estimated seconds remaining
Instance inst1.example.com's disks are in sync.
node1:~#
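If you want to double-check before failing back, the commands from chapter 15 should now show all block devices in sync again and a clean cluster:
gnt-instance info
gnt-cluster verify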
Afterwards, we can fail over inst1.example.com back to node2:
gnt-instance failover inst1.example.com
node2 should now be the primary again:
gnt-instance list
node1:~# gnt-instance list
Instance OS Primary_node Autostart Status Memory
inst1.example.com debian-etch node2.example.com yes running 64
node1:~#
17 Links
- Ganeti: http://code.google.com/p/ganeti
- Xen: http://xen.xensource.com
- DRBD: http://www.drbd.org
- LVM: http://sourceware.org/lvm2
- Debian: http://www.debian.org