HowtoForge Forums | HowtoForge - Linux Howtos and Tutorials

sebastienp 5th August 2008 14:39

VMWare replication and failover
OK, a clarification:

I have no problem with vm1 when it is started on srv1: it gets its IP (statically configured) and I can access it.
But when I disconnect srv1, even though the instance comes online on srv2, vm1 on srv2 doesn't get any IP, because eth0 no longer exists there.

Is this normal?
Does anyone have a clue?

Thank you in advance,


Hi there,

Once again, many thanks for the time you spent writing these howtos. They help a lot!

Sorry to bother you, but I have questions regarding the "Virtual Machine Replication & Failover with VMWare Server & Debian Etch (4.0)" howto.

It looks like I missed something...

OK, I have 2 physical nodes:
eth0 : - eth1 : (heartbeat)
eth0 : - eth1 : (heartbeat)

DRBD and Heartbeat are working well.
#srv1:~# cat /proc/drbd
#version: 0.7.21 (api:79/proto:74)
#SVN Revision: 2326 build by, 2008-07-22 22:14:19
# 0: cs:Connected st:Primary/Secondary ld:Consistent
# ns:2236 nr:0 dw:100 dr:2237 al:0 bm:27 lo:0 pe:0 ua:0 ap:0
#srv1:~# /etc/init.d/heartbeat status
#heartbeat OK [pid 2645 et al] is running on #[]...

Here are the config files:

*drbd.conf :
resource vm1 {
  protocol C;
  incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";
  startup {
    wfc-timeout 10;
    degr-wfc-timeout 30;
  }
  disk {
    on-io-error detach;
  }
  net {
    max-buffers 20000;
    unplug-watermark 12000;
    max-epoch-size 20000;
  }
  syncer {
    rate 500M;
    group 1;
    al-extents 257;
  }
  on {
    device /dev/drbd0;
    disk /dev/cciss/c0d0p7;
    meta-disk internal;
  }
  on {
    device /dev/drbd0;
    disk /dev/cciss/c0d0p7;
    meta-disk internal;
  }
}

* :
logfile /var/log/ha-log
logfacility local0
keepalive 1
deadtime 10
warntime 10
udpport 694
bcast eth1
auto_failback on
respawn hacluster /usr/lib/heartbeat/ipfail

*authkeys :
auth 1
1 md5 secret

*haresources : drbddisk::vm1 Filesystem::/dev/drbd0::/var/vm::ext3 vmstart

vmstart points to the correct files in /var/vm.

VMWare server v.1.0.5 is installed and working on both servers, and the VMWare instance is created on srv1.
Hosts are declared in /etc/hosts.

What I understood was that, when vm1 boots, it gets the IP address ( for instance) configured in haresources.

But when I boot vm1, it gets an IP via DHCP.
I can access its services via this IP, but I don't have failover.
When I disconnect srv1, the instance comes online on srv2, but eth0 doesn't exist anymore! It is declared in /etc/network/interfaces as dhcp, but it's not up.
When I try ifup eth0, I get:
SIOCSIFADDR: No such device
eth0: ERROR while getting interface flags: No such device (twice)
Bind socket to interface: No such device
Failed to bring up eth0

If I set another IP statically on vm1 (let's say 1.20), I don't have failover, since I lose 1.20 as soon as I disconnect srv1, even though the VM switches to srv2, with the 1.10 IP!
Once again, eth0 disappears.

If I set the haresources IP statically on vm1 (iface eth0 inet static address ...),
then I reach srv1 (or srv2, depending on which server holds eth0:0...) instead of vm1.

Could you please explain in more detail what should theoretically happen?
What if I want to configure several virtual machines?
Did I miss something? Did I misunderstand?

Many thanks for your support,

thanis 8th August 2008 23:28

Hi, please keep in mind that the configuration of heartbeat and drbd has nothing to do with the virtual machines themselves. The haresources IP address is the heartbeat IP address, if configured correctly. NIC configuration for your virtual machines is done on the virtual machines. Basically, whenever you talk about your VMs, it is all VMWare related and no longer tied to the HA part of the tutorial.

So, what kind of OS are you running in your VM, and are the VMWare tools installed?


sebastienp 9th August 2008 03:22

to be continued
Hi Thanis, first of all many thanks for your answer, maybe you're on vacation... Nice of you to take the time.

OK, I have worked a lot since my last post, and I got it: heartbeat and drbd are indeed "by themselves". Noted.

I now use vmware server 1.0.6 instead of 1.0.5, just in case...
But same shit...
Downgrading to 1.0.2, why not, it's my last chance !?

Nevertheless, the mess is with the Ethernet card (these are HP servers, fully compatible with Debian Etch according to HP; I don't know about VMWare).

No problem with the primary server (srv1). My VMs bind the AMD PCnet card as eth0, with its IP statically configured.

But when moving to srv2, I always get the same error during the boot sequence, and no eth0 is available:
SIOCSIFADDR: No such device
eth0: ERROR while getting interface flags: No such device (twice)
Bind socket to interface: No such device
Failed to bring up eth0

ifconfig -a shows that eth0 doesn't exist, but eth1 is there! (It's not there on srv1...)

I read several things about /etc/udev/rules.d/z25_persistent-net.rules.
I tried removing it and tweaking it, without success.
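
For reference, the persistent-net rules file on Debian Etch pins each interface name to a MAC address, so a virtual NIC whose MAC has changed would be expected to come up under the next free name (eth1) while eth0 stays reserved for the old MAC. The sketch below only lists the recorded MAC/name pairs so they can be compared with what ifconfig -a reports inside the VM. The function name list_persistent_macs is hypothetical, and the rule layout it parses (a MAC address followed by NAME="ethX" on the same line) is an assumption about the file format, not something shown in this thread.

```shell
#!/bin/sh
# List the "MAC address -> interface name" pairs recorded in a udev
# persistent-net rules file. The default path is the one named in the post;
# the rule layout (a MAC followed by NAME="ethX") is an assumption.
RULES_FILE="${RULES_FILE:-/etc/udev/rules.d/z25_persistent-net.rules}"

list_persistent_macs() {
    # $1: path to the rules file.
    # Drop comment lines, then print "<mac> <name>" for each matching rule.
    grep -v '^[[:space:]]*#' "$1" | sed -n \
        's/.*\([0-9A-Fa-f]\{2\}\(:[0-9A-Fa-f]\{2\}\)\{5\}\).*NAME="\([^"]*\)".*/\1 \3/p'
}

# Only run against the real file when it exists and is readable.
if [ -r "$RULES_FILE" ]; then
    list_persistent_macs "$RULES_FILE"
fi
```

If the MAC listed for eth0 differs from the HWaddr that ifconfig -a shows for eth1, the VM most likely received a new virtual MAC when it started on the other host.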

I want to use several (2-4) Linux vms and only 1 Windows 2003 server vm (specific purpose/service). How many did you try in your lab ? Which OS ?

FYI, the problem is there with only Linux vms, and also with only the Windows vm. Gods love us ;-)

Once again, no problem on srv1.
But as soon as I disconnect it, srv2 takes over the VMWare instance, OK, but it's just as if eth0 had vanished!

That was my first try, using a bridged network configuration (what kind of VMWare network configuration did you use for your test lab?).

Now I'm trying a NAT config, and I have a couple of possibilities:
- tuning nat.conf for vmnet8 on both servers, so that they share the same "NATed network". What about MAC addresses? They are the same for vmnet8. To be tested;
- using a tunnel broker solution, but I'm not familiar with IPv6. To be tested.

I'm still working on it...

I can say I never saw something explicit in the log files, except the NIC failure... It's the main problem !

Sorry to ask, but could you please send a basic sketch of the topology you used?
I find your howto very interesting and informative but, if I may say so, not very detailed considering the topic, even for Linux/VMWare users.

That's easy for me to say because I have never posted a howto, but I promise that if I succeed with this one, I'll post something!

I'll keep you updated, once again many thanks for your time.


thanis 11th August 2008 20:43

Hi Sebastien,

Could it be that your second server is connected differently? I have tried to recreate your issue, and I get the same problem if, for example, nic0 is connected/installed in Server1 (==> eth0) but nic1 is connected/installed in Server2. VMWare then has a different physical NIC ID to bridge. In that case the virtual NICs are also different, and that is why your VMWare NIC is eth0 on Server1 but eth1 on Server2. Since you have the same issue with Windows, you can be pretty sure it is related to VMWare, so it kind of falls outside the scope of the howto.

Other news: I will create a newer/bigger howto soon using the latest VMWare with the latest DRBD. I will also try to get the active/active mode of drbd up & running.


sebastienp 12th August 2008 17:11

VMWare replication and failover
Hi Thanis,

Thanks for your quick answer.

Both servers (same model) are identically configured :
- eth0 : LAN for srv1, for srv2;
- eth1 : DRBD/Heartbeat for srv1, for srv2;

When you set up the lab to reproduce my problem, did you manage to complete the howto without issues?

The new, improved version of the howto is great news!

Once again thanks for your time,
Kind regards from Paris/France,


thanis 12th August 2008 20:47

Hi sebastien, of course I had no problems with the howto, I wrote it myself :) But I must stress that your problem is related to the VMWare config. Without seeing your actual environment, I have no clue why you are having this issue. I think that the VMWare config on the second server is bridged to the wrong NIC but, as I said, I cannot be sure at all.

Perhaps we should wait for the other thread of Bart Van Kleef, to see if he has the same issue as you do.


sebastienp 12th August 2008 21:08

VMWare replication and failover
Hi Thanis,

Of course, you did it so it worked for you...

But, which versions (vmware server 1.0.2 ? drbd 0.7 OK, heartbeat package ?) are you using ?
What kind of VMware network config did you use ?

Another question, regarding the vmstart script: its second line is case "$1". Do I have to add a case "$2" if I add another VM?

I don't want to impose, so don't hesitate to tell me off if you feel I am, but I could grant you ssh access to the cluster if you want.

Thank you again and again,

thanis 12th August 2008 21:43

Hi sebastien, let's wait for Bart first and then we'll see. If I could have ssh access if all else fails that would be great :)

The "case" statements are for the start/stop/status arguments.
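
In a script of this kind, $1 is the action heartbeat passes in (start, stop or status), not the number of the VM, so adding a second VM would not call for a case "$2". A minimal sketch of how one script might handle several VMs follows; it is not the howto's actual vmstart script. The .vmx paths and the VM_CONFIGS and VMWARE_CMD variables are assumptions for illustration, and vmware-cmd is assumed to be on the PATH as installed by VMware Server 1.x.

```shell
#!/bin/sh
# Hypothetical heartbeat resource script that manages several VMs.
# VM_CONFIGS and its paths are illustrative assumptions, not the howto's values.
VMWARE_CMD="${VMWARE_CMD:-vmware-cmd}"
VM_CONFIGS="${VM_CONFIGS:-/var/vm/vm1/vm1.vmx /var/vm/vm2/vm2.vmx}"

vm_action() {
    # $1 is the action requested by heartbeat: start, stop or status.
    case "$1" in
        start)
            # Word splitting is intentional: paths must not contain spaces.
            for cfg in $VM_CONFIGS; do "$VMWARE_CMD" "$cfg" start; done
            ;;
        stop)
            for cfg in $VM_CONFIGS; do "$VMWARE_CMD" "$cfg" stop; done
            ;;
        status)
            for cfg in $VM_CONFIGS; do "$VMWARE_CMD" "$cfg" getstate; done
            ;;
        *)
            echo "Usage: vmstart {start|stop|status}" >&2
            return 1
            ;;
    esac
}

# Act only when an argument was given, as heartbeat does.
[ "$#" -eq 0 ] || vm_action "$1"
```

Adding a VM then means adding its .vmx path to VM_CONFIGS; the case branches stay the same.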


ipguru99 21st February 2009 04:34

No srv1-eth0, srv2-eth0, too
First, Great article. Second, I hope someone is still listening to this thread...

I have a customer that wants to do something like this, but not spend $30k getting it going with VMWare Fusion. We saw this and HAD to try it.

I have the exact same problem as sebastienp. SRV1 works great, and everything fails over so fast it's unbelievable (easy when you realize what is going on).

The VM moves over to SRV2 when I pull the eth0 cable out of SRV1, but there is no eth0 on SRV2. I even started over and created a new VM on SRV2, and the same thing happens when I fail it back to SRV1: everything moves over, but no eth0. The /etc/network/interfaces file says there is an eth0, but when I try to bring it up manually, the VM just says there isn't an eth0. I fail it back to wherever I created the original VM and eth0 is fine and accessible.

I am using old Gateway PCs as a test, but they are identical. I have a different set of cards for eth1 (SRV1 has a 3c905 and SRV2 has a Digital), so I don't think that is causing anything.

Anyone ever get this resolved?


christr 10th March 2009 22:32

Did you try to manually bring up the VM on the other host by any chance, using the VMWare Console? If so, did you specify 'keep' or 'create' when you brought up the 'copy' on the other host? If you selected 'create', the virtual MAC addresses of the Ethernet cards changed, and that may be causing your issue. You must select 'keep' to keep the virtual machine ID (and all virtual NIC MAC addresses) identical, or Linux will think it's a new interface. Hence the eth1 designation now: it still knows about eth0 having MAC address X, so it adds eth1 with the new MAC address.

I've had this issue a few times myself, so on the VMWare Server 'clusters' I build this way, I move one of the VMs to the other server by hand (by killing heartbeat) while I have the VMWare console up and running. I shut down the VM, kill heartbeat (to make everything move), then manually 'start' the VM on the second box. That time I get the prompt offering create / keep / always create / always keep. Pick 'always keep' and you should be OK.
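
The keep/create answer and the virtual NIC's MAC address are typically recorded in the VM's .vmx file, so they can also be inspected from a shell on the host. A minimal sketch follows. The key names (uuid.action, uuid.location, uuid.bios, ethernetN.addressType, ethernetN.generatedAddress, ethernetN.address) are what VMware products commonly write to .vmx files, but treat them and the example path as assumptions to check against your own file; show_vmx_identity is a hypothetical helper name.

```shell
#!/bin/sh
# Print the UUID and virtual NIC MAC settings found in a .vmx file.
# The key names are assumptions based on common VMware .vmx contents.
show_vmx_identity() {
    # $1: path to the .vmx file (e.g. a hypothetical /var/vm/vm1/vm1.vmx)
    grep -E '^(uuid\.(action|location|bios)|ethernet[0-9]+\.(addressType|generatedAddress|address))[[:space:]]*=' "$1"
}

# Run only when a path is given on the command line.
[ "$#" -eq 0 ] || show_vmx_identity "$1"
```

If the MAC address shown differs before and after a failover, that would be consistent with 'create' having been answered at some point.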

Any questions, fire away... I'm actually in the process of building a new pair of servers this week using DRBD 8.3 & VMWare Server 2.0 (it needs a bunch of changes from the howto to get it to work, but not too bad so far).


All times are GMT +2.