Xen Cluster Management With Ganeti On Debian Lenny

Submitted by falko on Sun, 2009-03-01 19:27 :: Debian | Xen | High-Availability | Virtualization


Version 1.0
Author: Falko Timme <ft [at] falkotimme [dot] com>
Last edited 02/26/2009

Ganeti is a cluster virtualization management system based on Xen. In this tutorial I will explain how to create one virtual Xen machine (called an instance) on a cluster of two physical nodes, and how to manage and fail over this instance between the two physical nodes.

This document comes without warranty of any kind! I do not issue any guarantee that this will work for you!

 

1 Preliminary Note

In this tutorial I will use the physical nodes node1.example.com and node2.example.com:

  • node1.example.com: IP address 192.168.0.100; will be the master of the cluster.
  • node2.example.com: IP address 192.168.0.101; will be the primary node of the virtual machine (aka instance).

Both have a 500GB hard drive of which I use 20GB for the / partition, 1GB for swap, and leave the rest unpartitioned so that it can be used by Ganeti (the minimum is 20GB!). Of course, you can change the partitioning to your liking, but make sure at least 20GB remain unpartitioned for Ganeti.

The cluster I'm going to create will be named cluster1.example.com, and it will have the IP address 192.168.0.102. The cluster IP 192.168.0.102 will always be bound to the cluster master, so even if you don't know which node is the master, you can use the cluster IP (or the hostname cluster1.example.com) to connect to the master using SSH.

The Xen virtual machine (called an instance in Ganeti speak) will be named inst1.example.com with the IP address 192.168.0.105. inst1.example.com will be mirrored between the two physical nodes using DRBD - you can see this as a kind of network RAID1.

As you see, node1.example.com will be the cluster master, i.e. the machine from which you can control and manage the cluster, and node2.example.com will be the primary node of inst1.example.com, i.e. inst1.example.com will run on node2.example.com (with all changes on inst1.example.com mirrored back to node1.example.com with DRBD) until you fail it over to node1.example.com (if you want to take down node2.example.com for maintenance, for example). This is an active-passive configuration.

I think it's good practice to split up the roles between the two nodes, so that you don't lose the cluster master and the primary node at once should one node go down.

All hostnames mentioned here must be resolvable from all hosts, which means that they must either exist in DNS, or you must list all of them in /etc/hosts on every host (which is what I will do here).
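Once /etc/hosts is in place (or DNS is set up), a quick check like the following (my own sketch, not part of Ganeti) can confirm that all names resolve on a given host:

```shell
# Check that every hostname used in this tutorial resolves on this host.
# getent consults /etc/hosts as well as DNS, just like Ganeti will.
for h in node1.example.com node2.example.com cluster1.example.com inst1.example.com; do
  if getent hosts "$h" > /dev/null; then
    echo "$h: OK"
  else
    echo "$h: NOT RESOLVABLE"
  fi
done
```

Run it on both nodes; every line should say OK before you continue.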

All cluster nodes must use the same network interface name (e.g. eth0). If one node uses eth0 and the other one eth1, Ganeti won't work correctly.

Ok, let's start...

 

2 Preparing The Physical Nodes

node1:

I want node1 to have the static IP address 192.168.0.100, therefore my /etc/network/interfaces file looks as follows (please note that I replace allow-hotplug eth0 with auto eth0; otherwise restarting the network doesn't work, and we'd have to reboot the whole system):

vi /etc/network/interfaces

# The loopback network interface
auto lo
iface lo inet loopback
# The primary network interface
#allow-hotplug eth0
#iface eth0 inet dhcp
auto eth0
iface eth0 inet static
        address 192.168.0.100
        netmask 255.255.255.0
        network 192.168.0.0
        broadcast 192.168.0.255
        gateway 192.168.0.1

If you've modified the file, restart your network:

/etc/init.d/networking restart

Then edit /etc/hosts. Make it look like this:

vi /etc/hosts

127.0.0.1       localhost.localdomain   localhost
192.168.0.100   node1.example.com       node1
192.168.0.101   node2.example.com       node2
192.168.0.102   cluster1.example.com    cluster1
192.168.0.105   inst1.example.com       inst1
# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

Next we must make sure that the commands

hostname

and

hostname -f

print out the full hostname (node1.example.com). If you get something different (e.g. just node1), do this:

echo node1.example.com > /etc/hostname
/etc/init.d/hostname.sh start

Afterwards, the hostname commands should show the full hostname.
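As a double-check (a small sketch of my own, not required by the tutorial), you can compare the running hostname against what is configured in /etc/hostname:

```shell
# Compare the running FQDN with the configured one; after the hostname.sh
# script has been run with the full name in /etc/hostname, they should agree.
if [ "$(hostname -f)" = "$(cat /etc/hostname 2> /dev/null)" ]; then
  echo "hostname: OK"
else
  echo "hostname: MISMATCH"
fi
```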

Then update the system:

aptitude update

aptitude safe-upgrade

node2:

Now we do the same again on node2.example.com (please keep in mind that node2 has a different IP!):

vi /etc/network/interfaces

# The loopback network interface
auto lo
iface lo inet loopback
# The primary network interface
#allow-hotplug eth0
#iface eth0 inet dhcp
auto eth0
iface eth0 inet static
        address 192.168.0.101
        netmask 255.255.255.0
        network 192.168.0.0
        broadcast 192.168.0.255
        gateway 192.168.0.1

/etc/init.d/networking restart

vi /etc/hosts

127.0.0.1       localhost.localdomain   localhost
192.168.0.100   node1.example.com       node1
192.168.0.101   node2.example.com       node2
192.168.0.102   cluster1.example.com    cluster1
192.168.0.105   inst1.example.com       inst1
# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

echo node2.example.com > /etc/hostname
/etc/init.d/hostname.sh start

aptitude update

aptitude safe-upgrade

 

3 Setting Up LVM On The Free HDD Space

node1/node2:

Let's find out about our hard drive:

fdisk -l

node1:~# fdisk -l

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00023cd1

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          62      497983+  83  Linux
/dev/sda2              63        6141    48829567+  8e  Linux LVM
node1:~#

We will now create the partition /dev/sda3 (on both physical nodes) using the rest of the hard drive and prepare it for LVM:

fdisk /dev/sda

node1:~# fdisk /dev/sda

The number of cylinders for this disk is set to 60801.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help):
 <-- n
Command action
   e   extended
   p   primary partition (1-4)

<-- p
Partition number (1-4): <-- 3
First cylinder (6142-60801, default 6142): <-- ENTER
Using default value 6142
Last cylinder or +size or +sizeM or +sizeK (6142-60801, default 60801):
 <-- ENTER
Using default value 60801

Command (m for help):
 <-- t
Partition number (1-4): <-- 3
Hex code (type L to list codes): <-- L

 0  Empty           1e  Hidden W95 FAT1 80  Old Minix       be  Solaris boot
 1  FAT12           24  NEC DOS         81  Minix / old Lin bf  Solaris
 2  XENIX root      39  Plan 9          82  Linux swap / So c1  DRDOS/sec (FAT-
 3  XENIX usr       3c  PartitionMagic  83  Linux           c4  DRDOS/sec (FAT-
 4  FAT16 <32M      40  Venix 80286     84  OS/2 hidden C:  c6  DRDOS/sec (FAT-
 5  Extended        41  PPC PReP Boot   85  Linux extended  c7  Syrinx
 6  FAT16           42  SFS             86  NTFS volume set da  Non-FS data
 7  HPFS/NTFS       4d  QNX4.x          87  NTFS volume set db  CP/M / CTOS / .
 8  AIX             4e  QNX4.x 2nd part 88  Linux plaintext de  Dell Utility
 9  AIX bootable    4f  QNX4.x 3rd part 8e  Linux LVM       df  BootIt
 a  OS/2 Boot Manag 50  OnTrack DM      93  Amoeba          e1  DOS access
 b  W95 FAT32       51  OnTrack DM6 Aux 94  Amoeba BBT      e3  DOS R/O
 c  W95 FAT32 (LBA) 52  CP/M            9f  BSD/OS          e4  SpeedStor
 e  W95 FAT16 (LBA) 53  OnTrack DM6 Aux a0  IBM Thinkpad hi eb  BeOS fs
 f  W95 Ext'd (LBA) 54  OnTrackDM6      a5  FreeBSD         ee  EFI GPT
10  OPUS            55  EZ-Drive        a6  OpenBSD         ef  EFI (FAT-12/16/
11  Hidden FAT12    56  Golden Bow      a7  NeXTSTEP        f0  Linux/PA-RISC b
12  Compaq diagnost 5c  Priam Edisk     a8  Darwin UFS      f1  SpeedStor
14  Hidden FAT16 <3 61  SpeedStor       a9  NetBSD          f4  SpeedStor
16  Hidden FAT16    63  GNU HURD or Sys ab  Darwin boot     f2  DOS secondary
17  Hidden HPFS/NTF 64  Novell Netware  b7  BSDI fs         fd  Linux raid auto
18  AST SmartSleep  65  Novell Netware  b8  BSDI swap       fe  LANstep
1b  Hidden W95 FAT3 70  DiskSecure Mult bb  Boot Wizard hid ff  BBT
1c  Hidden W95 FAT3 75  PC/IX
Hex code (type L to list codes):
 <-- 8e
Changed system type of partition 3 to 8e (Linux LVM)

Command (m for help):
 <-- w
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table.
The new table will be used at the next reboot.
Syncing disks.
node1:~#

Now let's take a look at our hard drive again:

fdisk -l

node1:~# fdisk -l

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00023cd1

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          62      497983+  83  Linux
/dev/sda2              63        6141    48829567+  8e  Linux LVM
/dev/sda3            6142       60801   439056450   8e  Linux LVM
node1:~#

Looks good. Now we must reboot both physical nodes so that the kernel can read in the new partition table:

reboot

After the reboot, we install LVM (it is probably already installed, but it's better to be sure):

aptitude install lvm2

Then we prepare /dev/sda3 for LVM on both nodes and add it to the volume group xenvg:

pvcreate /dev/sda3
vgcreate xenvg /dev/sda3

(Ganeti wants a volume group of its own, which is why we create xenvg; theoretically we could use an existing volume group with enough unallocated space, but the gnt-cluster verify command will complain about it.)
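A quick way to confirm the setup on each node (my own check, not a Ganeti command) is to ask LVM whether xenvg exists; vgs can also show how much space the volume group offers to Ganeti:

```shell
# Report whether the Ganeti volume group exists on this node.
# 'vgs xenvg' succeeds only if the volume group was created; on a machine
# without LVM (or without xenvg) the check falls through to the else branch.
if vgs xenvg > /dev/null 2>&1; then
  echo "xenvg: present"
  vgs -o vg_name,vg_size,vg_free xenvg
else
  echo "xenvg: missing"
fi
```

Both nodes should report xenvg as present before you initialize the cluster.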

