HowtoForge Forums | HowtoForge - Linux Howtos and Tutorials


stylez 25th September 2006 21:43

Load-Balanced MySQL Cluster Error with Cluster
 
Hello,

I followed the tutorial on how to set up a load-balanced MySQL cluster, and everything seemed to be working fine. But when I recently checked on the services, one of the MySQL cluster nodes was no longer being recognized by ndb_mgm. I've had this problem twice before; I thought I had misconfigured something, so I reinstalled the whole system on VMs. I thought that had solved it, but the problem seems to recur a few days after the setup is completed.

Here is my configuration for the 5 machines: (note all VMs)

sql-1 172.30.0.7 (runs ndbd and mysql)
sql-2 172.30.0.8 (runs ndbd and mysql)
loadb-1 172.30.0.110 (runs lb1 and ndb_mgm) [active]
loadb-2 172.30.0.9 (runs lb2) [passive]

virtual IP for cluster: 172.30.0.111

I can ping the virtual IP, and I can access the MySQL databases directly on 0.7 and 0.8, but when I try to connect through 0.111 I get an error.

Here's the output of show in ndb_mgm:

Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=2 @172.30.0.7 (Version: 4.1.21, Nodegroup: 0, Master)
id=3 @172.30.0.8 (Version: 4.1.21, Nodegroup: 0)

[ndb_mgmd(MGM)] 1 node(s)
id=1 @172.30.0.110 (Version: 4.1.21)

[mysqld(API)] 2 node(s)
id=4 (not connected, accepting connect from any host)
id=5 @172.30.0.8 (Version: 4.1.21)


I've restarted MySQL on 0.7 and it seems to run fine, but ndb_mgm doesn't see it. Even so, 0.8 is running fine and I still can't connect through the virtual IP. Everything worked last week when I completed the setup, and I don't know what else to check to find out why the cluster isn't working. loadb-1 is the active load balancer and it should direct connections to sql-2, but it doesn't seem to. I ran all the checks found on http://www.howtoforge.com/loadbalanc...ster_debian_p8 and they all check out fine; the active loadb-1 has 172.30.0.111 as the virtual IP. If anyone has experienced this or could shed some light on what I might be doing wrong, that would be great. As I said, everything worked 100% when I completed the initial install. I even tested taking down a single cluster node and a load balancer, and failover worked as the tutorial stated.
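
For anyone who wants to check this state repeatedly rather than by eye, here is a minimal Python sketch that reads the text printed by show in ndb_mgm and lists the node ids that have no @address, meaning the management server does not currently see them. The sample text is copied from the output above; the function names (parse_show, disconnected) are made up for illustration, and the parsing assumes the output keeps the layout shown in this post.

```python
import re

# Sample copied from the `show` output quoted in this post.
SHOW_OUTPUT = """\
[ndbd(NDB)] 2 node(s)
id=2 @172.30.0.7 (Version: 4.1.21, Nodegroup: 0, Master)
id=3 @172.30.0.8 (Version: 4.1.21, Nodegroup: 0)

[ndb_mgmd(MGM)] 1 node(s)
id=1 @172.30.0.110 (Version: 4.1.21)

[mysqld(API)] 2 node(s)
id=4 (not connected, accepting connect from any host)
id=5 @172.30.0.8 (Version: 4.1.21)
"""

# "[mysqld(API)] 2 node(s)" -> section name "mysqld"
SECTION_RE = re.compile(r"^\[(\w+)\((\w+)\)\]")
# "id=5 @172.30.0.8 (...)" -> id 5, address "172.30.0.8"
# "id=4 (not connected, ...)" -> id 4, address None
NODE_RE = re.compile(r"^id=(\d+)\s+(?:@(\S+))?")


def parse_show(text):
    """Return a list of (section, node_id, address_or_None) tuples."""
    nodes = []
    section = None
    for line in text.splitlines():
        line = line.strip()
        m = SECTION_RE.match(line)
        if m:
            section = m.group(1)
            continue
        m = NODE_RE.match(line)
        if m and section is not None:
            nodes.append((section, int(m.group(1)), m.group(2)))
    return nodes


def disconnected(nodes):
    """Nodes listed without an @address are not connected to the cluster."""
    return [(sec, nid) for sec, nid, addr in nodes if addr is None]


if __name__ == "__main__":
    for sec, nid in disconnected(parse_show(SHOW_OUTPUT)):
        print(f"{sec} node id={nid} is not connected")
```

Note that the node id alone does not say which host is missing: an API slot that accepts connections from any host can be taken by either mysqld, so the missing host has to be inferred from the addresses that are present.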

falko 26th September 2006 18:04

Can you run the tests from http://www.howtoforge.com/loadbalanc...ster_debian_p8 and post the results here? Also, are there any errors in the logs?

stylez 26th September 2006 19:01

Command "ip addr sh eth0"

loadb-1:
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
link/ether 00:0c:29:a7:30:cf brd ff:ff:ff:ff:ff:ff
inet 172.30.0.110/24 brd 172.30.0.255 scope global eth0
inet 172.30.0.111/24 brd 172.30.0.255 scope global secondary eth0

loadb-2
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
link/ether 00:0c:29:1f:46:fd brd ff:ff:ff:ff:ff:ff
inet 172.30.0.9/24 brd 172.30.0.255 scope global eth0


Command "ldirectord ldirectord.cf status"

loadb-1:
ldirectord for /etc/ha.d/ldirectord.cf is running with pid: 919


loadb-2:
ldirectord is stopped for /etc/ha.d/ldirectord.cf

Command: "ipvsadm -L -n"

loadb-1:
IP Virtual Server version 1.0.11 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 172.30.0.111:3306 wrr
-> 172.30.0.8:3306 Route 0 0 0
-> 172.30.0.7:3306 Route 0 0 0

loadb-2:
IP Virtual Server version 1.0.11 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn


Command: "/etc/ha.d/resource.d/LVSSyncDaemonSwap master status"

loadb-1:
master running
(ipvs_syncmaster pid: 1046)


loadb-2:
master stopped



Everything seems to check out, but I'm still unable to connect. When I first installed everything and ran show in ndb_mgm, both NDB nodes showed up, the MGM node showed up, and so did both mysqld nodes. Now when I run show, all I get is the following:

[ndbd(NDB)] 2 node(s)
id=2 @172.30.0.7 (Version: 4.1.21, Nodegroup: 0)
id=3 @172.30.0.8 (Version: 4.1.21, Nodegroup: 0, Master)

[ndb_mgmd(MGM)] 1 node(s)
id=1 @172.30.0.110 (Version: 4.1.21)

[mysqld(API)] 2 node(s)
id=4 @172.30.0.8 (Version: 4.1.21)
id=5 (not connected, accepting connect from any host)


You can see that the mysqld on 172.30.0.7 isn't showing up, but it's running on 0.7 and I can access MySQL directly on it.
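
One detail in the ipvsadm -L -n output above may be worth a second look: both real servers are listed with weight 0. In IPVS, a real server with weight 0 is not given new connections. ldirectord can leave a server in the table at weight 0 when its health check fails and its quiescent option is enabled, but ldirectord.cf isn't shown in this thread, so whether that is what happened here is only an assumption. A minimal Python sketch that flags such rows, with the sample copied from the loadb-1 output and hypothetical function names:

```python
# Sample copied from the loadb-1 `ipvsadm -L -n` output in this post.
IPVSADM_OUTPUT = """\
IP Virtual Server version 1.0.11 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 172.30.0.111:3306 wrr
-> 172.30.0.8:3306 Route 0 0 0
-> 172.30.0.7:3306 Route 0 0 0
"""


def real_servers(text):
    """Yield (virtual_service, real_server, weight) for each real-server row."""
    virtual = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] in ("TCP", "UDP"):
            # A virtual service line, e.g. "TCP 172.30.0.111:3306 wrr"
            virtual = fields[1]
        elif fields[0] == "->" and len(fields) == 6 and fields[3].isdigit():
            # A real-server row; the "->" header row is skipped because
            # its fourth field is the word "Weight", not a number.
            yield virtual, fields[1], int(fields[3])


def zero_weight(text):
    """Return (virtual_service, real_server) pairs whose weight is 0."""
    return [(v, r) for v, r, w in real_servers(text) if w == 0]


if __name__ == "__main__":
    for virtual, real in zero_weight(IPVSADM_OUTPUT):
        print(f"{real} behind {virtual} has weight 0 (no new connections)")
```

If both rows are flagged, as they are for the sample, it would be reasonable to look at the ldirectord health check and its log on loadb-1 before looking at the cluster itself.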

falko 27th September 2006 20:58

What's the output of
Code:

netstat -tap
and
Code:

df -h
on 172.30.0.7? Are there any errors in the logs on 172.30.0.7?

stylez 27th September 2006 21:02

sql-1:~# netstat -tap
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 *:mysql *:* LISTEN 27158/mysqld
tcp 0 0 *:www *:* LISTEN 813/apache2
tcp 0 0 *:ssh *:* LISTEN 800/sshd
tcp 0 0 sql-1.localdomain:2202 *:* LISTEN 27099/ndbd
tcp 0 0 sql-1.localdomain:35463 172.30.0.110:1186 ESTABLISHED27098/ndbd
tcp 0 0 sql-1.localdomain:35466 172.30.0.110:1186 ESTABLISHED27158/mysqld
tcp 0 0 sql-1.localdomain:mysql 172.30.0.110:56547 TIME_WAIT -
tcp 0 0 sql-1.localdomain:2202 172.30.0.8:49152 ESTABLISHED27099/ndbd
tcp 0 0 sql-1.localdomain:mysql 172.30.0.110:56521 TIME_WAIT -
tcp 0 148 sql-1.localdomain:ssh 172.30.0.2:1800 ESTABLISHED18132/0
tcp 0 0 sql-1.localdomain:35465 172.30.0.110:2202 ESTABLISHED27099/ndbd
tcp 0 0 sql-1.localdomain:2202 172.30.0.8:49149 ESTABLISHED27099/ndbd
tcp 0 0 sql-1.localdomain:35468 172.30.0.8:2202 ESTABLISHED27158/mysqld

sql-1:~# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 883M 424M 412M 51% /
tmpfs 126M 0 126M 0% /dev/shm

(The SQL data I'm storing will be < 1 MB in total; it's just users' FTP login information.)

I've checked the logs and nothing seems out of place, there are no errors being thrown.

falko 28th September 2006 21:31

What's in /etc/fstab? I could imagine it's a problem with your disk space or memory as a MySQL cluster needs lots of memory...

stylez 29th September 2006 00:44

sql-1:~# cat /etc/fstab
# /etc/fstab: static file system information.
#
# <file system> <mount point> <type> <options> <dump> <pass>
proc /proc proc defaults 0 0
/dev/sda1 / ext3 defaults,errors=remount-ro 0 1
/dev/sda5 none swap sw 0 0
/dev/hda /media/cdrom0 iso9660 ro,user,noauto 0 0
/dev/fd0 /media/floppy0 auto rw,user,noauto 0 0

falko 29th September 2006 14:50

You don't have much swap (only 126MB). And if your memory is low that could cause a problem... What's the output of
Code:

cat /proc/meminfo
?

stylez 29th September 2006 16:20

sql-1:~# cat /proc/meminfo
total: used: free: shared: buffers: cached:
Mem: 263208960 256610304 6598656 0 25546752 80146432
Swap: 82210816 0 82210816
MemTotal: 257040 kB
MemFree: 6444 kB
MemShared: 0 kB
Buffers: 24948 kB
Cached: 78268 kB
SwapCached: 0 kB
Active: 59388 kB
Inactive: 163688 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 257040 kB
LowFree: 6444 kB
SwapTotal: 80284 kB
SwapFree: 80284 kB


So you think I should bump up the memory? I default these VMs to about 256 MB of RAM. I didn't think the cluster would require much since it's not holding much information.
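
When reading these numbers, MemFree on its own can look worse than it is, because Linux uses otherwise idle RAM for buffers and page cache. A rough way to read the figures is to add Buffers and Cached back to MemFree and to check whether any swap is in use. The Python sketch below does that for the "kB" lines posted above; the function names are hypothetical, and the sum is only an upper-bound estimate since not all cache is necessarily reclaimable.

```python
import re

# The "kB" lines relevant here, copied from the /proc/meminfo output above.
MEMINFO = """\
MemTotal: 257040 kB
MemFree: 6444 kB
Buffers: 24948 kB
Cached: 78268 kB
SwapTotal: 80284 kB
SwapFree: 80284 kB
"""


def parse_meminfo(text):
    """Return {field: kilobytes} for every 'Name: <number> kB' line."""
    values = {}
    for line in text.splitlines():
        m = re.match(r"^(\w+):\s+(\d+)\s+kB$", line.strip())
        if m:
            values[m.group(1)] = int(m.group(2))
    return values


def estimated_available_kb(values):
    """Rough estimate: free memory plus buffers and page cache."""
    return values["MemFree"] + values["Buffers"] + values["Cached"]


def swap_used_kb(values):
    """Swap currently in use."""
    return values["SwapTotal"] - values["SwapFree"]


if __name__ == "__main__":
    info = parse_meminfo(MEMINFO)
    print("estimated available kB:", estimated_available_kb(info))
    print("swap used kB:", swap_used_kb(info))
```

For the figures posted here the estimate comes to 109660 kB with no swap in use, which by itself does not show memory pressure at the moment the snapshot was taken.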

stylez 29th September 2006 18:34

So I bumped up the memory on both sql-1 and sql-2 to 512 MB of RAM.

sql-1:~# cat /proc/meminfo
total: used: free: shared: buffers: cached:
Mem: 528752640 223784960 304967680 0 14512128 72069120
Swap: 82210816 0 82210816
MemTotal: 516360 kB
MemFree: 297820 kB
MemShared: 0 kB
Buffers: 14172 kB
Cached: 70380 kB
SwapCached: 0 kB
Active: 40808 kB
Inactive: 161348 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 516360 kB
LowFree: 297820 kB
SwapTotal: 80284 kB
SwapFree: 80284 kB


Still no change.


All times are GMT +2.