The Perfect Load-Balanced & High-Availability Web Cluster With 2 Servers Running Xen On Ubuntu 8.04 Hardy Heron - Page 9
15. Custom scripts for monitoring (lb1, lb2, web1, web2)
I made a few bash scripts to monitor the whole setup (they are a bit ugly but they work). If you improve them, feel free to mail them to me!
15.1 Monitoring from lb1.example.com
First we must install sendmail so lb1.example.com will be able to send mail :
apt-get install sendmail
The first script checks whether the backup load balancer (lb2.example.com) is still available to take over :
vi /root/lb2_check
#!/bin/bash
# Backup load balancer check
# Copyright (c) 2008 blogama.org
# This script is licensed under GNU GPL version 2.0 or above
# ---------------------------------------------------------------------

### This script does 1 verification ###
### 1) Check if backup load balancer failed and send mail notification ###

### To be modified ###
EMAIL="[email protected]"

###### Do not make modifications below ######

### Binaries ###
MAIL=$(which mail)

### To restore to original when problem fixed ###
if [ "$1" = "fix" ]; then
    # Remove the flag file and empty the Heartbeat log
    rm -f /root/lb2_problem.txt
    > /var/log/ha-log
    exit 0
fi

### Check if already notified ###
cd /root
if [ -f lb2_problem.txt ]; then
    exit 1
fi

### Check if Heartbeat is running on hot standby ###
tail /var/log/ha-log 2>&1 | grep "Asking other side for ping node count"
if [ "$?" -ne "1" ]; then
    echo "Backup load balancer failed" > /root/lb2_problem.txt
    $MAIL -s "Backup load balancer problem" "$EMAIL" < /root/lb2_problem.txt
fi
We make this script executable :
chmod +x /root/lb2_check
If lb2.example.com fails, the script creates the file /root/lb2_problem.txt and sends a mail notification. As long as lb2_problem.txt exists, it won't check again. The log file must also be emptied once the problem is fixed for the script to work properly; the fix option below does both.
Once the problem is fixed on lb2.example.com, please manually run :
/root/lb2_check fix
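The flag-file mechanism described above can be shown in isolation. This is only a minimal sketch of the idea, not a replacement for lb2_check: the names FLAG_DIR, notify_once and clear_flag are hypothetical, and the message is printed instead of being piped to mail so the example can be tried without a mail setup.

```shell
#!/bin/bash
# Sketch of the "notify once" pattern used by lb2_check.
# FLAG_DIR, notify_once and clear_flag are hypothetical names for illustration.
FLAG_DIR="${FLAG_DIR:-/tmp}"

# notify_once NAME MESSAGE
# The first call for NAME writes MESSAGE to a flag file and prints it.
# Later calls return 1 and print nothing while the flag file exists.
notify_once() {
    local flag="$FLAG_DIR/$1_problem.txt"
    if [ -f "$flag" ]; then
        return 1            # already notified, stay quiet
    fi
    echo "$2" > "$flag"
    cat "$flag"             # the real script feeds this file to mail instead
}

# clear_flag NAME
# Removes the flag file so the next failure is reported again (the "fix" step).
clear_flag() {
    rm -f "$FLAG_DIR/$1_problem.txt"
}

# Example use (not executed here):
#   notify_once lb2 "Backup load balancer failed"   # prints the message once
#   notify_once lb2 "Backup load balancer failed"   # silent, returns 1
#   clear_flag lb2                                   # re-arms the check
```

Keeping the state in a plain file means it survives between cron runs, which is why the check stays silent until you clear it by hand.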
The next script checks whether any ports failed on web1 or web2 by reading the ldirectord log file. ldirectord already has its own mail notification, but it sends a flood of messages; this script sends only one until you fix the problem :
vi /root/ports_failed
and make it look like this :
#!/bin/bash
# Ldirectord ports failure check
# Copyright (c) 2008 blogama.org
# This script is licensed under GNU GPL version 2.0 or above
# ---------------------------------------------------------------------

### This script does 1 verification ###
### 1) Check for port failure on load balanced servers ###

### To be modified ###
EMAIL="[email protected]"

###### Do not make modifications below ######

### Binaries ###
MAIL=$(which mail)

### To restore to original when problem fixed ###
if [ "$1" = "fix" ]; then
    # Remove the flag file and empty the ldirectord log
    rm -f /root/port_problem.txt
    > /var/log/ldirectord.log
fi

### Check if already notified ###
cd /root
if [ -f port_problem.txt ]; then
    # Keep the list of failed ports up to date, but do not mail again
    cat /var/log/ldirectord.log | grep Deleted > /var/log/port_problem.log
    exit 1
fi

### Check if port failed ###
cat /var/log/ldirectord.log 2>&1 | grep Deleted
if [ "$?" -ne "1" ]; then
    cat /var/log/ldirectord.log | grep Deleted > /var/log/port_problem.log
    echo "Ports problem see logfile /var/log/port_problem.log" > /root/port_problem.txt
    $MAIL -s "Some ports failed" "$EMAIL" < /root/port_problem.txt
fi
We make it executable :
chmod +x /root/ports_failed
This works like the first script: once the problem is fixed, you must run :
/root/ports_failed fix
so that the script starts checking again.
Now add both scripts to your crontab :
crontab -e
* * * * * /root/ports_failed >/dev/null 2>&1
* * * * * /root/lb2_check >/dev/null 2>&1
15.2 Monitoring from lb2.example.com
Monitoring from the second load balancer is important because it tells us when the master load balancer has failed and, if it has, keeps an eye on port failures on web1 and web2.
First we must install sendmail so lb2.example.com will be able to send mail :
apt-get install sendmail
vi /root/ports_check
And paste this script :
#!/bin/bash
# Ldirectord ports failure check
# Copyright (c) 2008 blogama.org
# This script is licensed under GNU GPL version 2.0 or above
# ---------------------------------------------------------------------

### This script does 2 verifications ###
### 1) Check if master load balancer failed and send mail notification ###
### 2) If master load balancer failed, check for port failure on load balanced servers ###

### To be modified ###
EMAIL="[email protected]"

###### Do not make modifications below ######

### Binaries ###
MAIL=$(which mail)

### Date ###
NOW=$(date)

### To restore to original when problem fixed ###
if [ "$1" = "fix" ]; then
    # Remove all flag files and empty both logs
    cd /root/
    rm -f lb1_problem.txt port_problem.txt server_problem_notified.txt
    > /var/log/ldirectord.log
    > /var/log/ha-log
    exit 0
fi

### Check if ldirectord is running on lb2.example.com (means that lb1.example.com failed) ###
#$LDIRECTORD /etc/ha.d/ldirectord.cf status 2>&1 | grep running
cat /var/log/ha-log | grep "takeover complete" > /dev/null 2>&1
if [ "$?" -ne "1" ]; then

    ### Check if already notified ###
    cd /root
    if [ -f port_problem.txt ]; then
        cat /var/log/ldirectord.log | grep Deleted > /var/log/port_problem.log
        exit 1
    fi

    ### Check if port failed ###
    cat /var/log/ldirectord.log 2>&1 | grep Deleted
    if [ "$?" -ne "1" ]; then
        cat /var/log/ldirectord.log | grep Deleted > /var/log/port_problem.log
        echo "Ports problem see logfile /var/log/port_problem.log" > /root/port_problem.txt
        $MAIL -s "Some ports failed" "$EMAIL" < /root/port_problem.txt
    fi

    ### Check if already notified that master load balancer failed ###
    cd /root
    if [ -f server_problem_notified.txt ]; then
        exit 1
    fi

    ### Notify that master load balancer failed ###
    MESSAGE="$NOW : Master load balancer failed"
    echo "$MESSAGE" > lb1_problem.txt
    $MAIL -s "Master load balancer failed" "$EMAIL" < /root/lb1_problem.txt
    echo "notified" > server_problem_notified.txt
fi
We make it executable :
chmod +x /root/ports_check
And we add it to our crontab :
crontab -e
* * * * * /root/ports_check >/dev/null 2>&1
When you get a notification from the script, fix the problem and then run :
/root/ports_check fix
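The decision the lb2 script makes from the two log files can be summarised as a small state check. The following is a rough sketch only: the function names log_has and lb2_state and the three state labels are made up for illustration, and unlike the script above it treats an unreadable log file as "no match".

```shell
#!/bin/bash
# Sketch of the log-based decision made by ports_check on lb2.
# log_has and lb2_state are hypothetical helper names.

# log_has PATTERN FILE: succeeds if FILE is readable and contains PATTERN.
log_has() {
    grep -q -- "$1" "$2" 2>/dev/null
}

# lb2_state HA_LOG LDIRECTORD_LOG
# Prints one of:
#   standby              no takeover seen, lb1 is still the master
#   active               lb2 took over, no port failure logged
#   active-ports-failed  lb2 took over and ldirectord removed a real server
lb2_state() {
    if ! log_has "takeover complete" "$1"; then
        echo "standby"
    elif log_has "Deleted" "$2"; then
        echo "active-ports-failed"
    else
        echo "active"
    fi
}

# Example use (not executed here):
#   lb2_state /var/log/ha-log /var/log/ldirectord.log
```

Because the state is derived from log contents, emptying both logs with the fix option is what returns lb2 to the standby state.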
15.3 Monitoring from web1 & web2
Monitoring of the web cluster is already partially done with monit and munin.
The part that is not covered yet is the monitoring of MySQL replication.
Please read the following article :
Repair MySQL master-master replication
MySQL monitoring is optional, but on a production server problems can happen with MySQL replication, so I really recommend using those scripts or something similar to check database consistency.
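As an illustration of what "something similar" could look like, here is a minimal sketch that checks whether both replication threads are running. It is not one of the scripts from the article mentioned above, and the function name replication_ok is hypothetical. It reads the output of SHOW SLAVE STATUS\G on standard input and looks at the Slave_IO_Running and Slave_SQL_Running fields; it does not compare the data on the two servers.

```shell
#!/bin/bash
# Sketch: check the replication threads from `SHOW SLAVE STATUS\G` output.
# replication_ok is a hypothetical helper name.

# replication_ok: reads the status text on stdin.
# Succeeds only if both Slave_IO_Running and Slave_SQL_Running are "Yes".
replication_ok() {
    local status io sql
    status=$(cat)
    io=$(printf '%s\n' "$status" | awk -F': ' '/Slave_IO_Running:/ {print $2}')
    sql=$(printf '%s\n' "$status" | awk -F': ' '/Slave_SQL_Running:/ {print $2}')
    [ "$io" = "Yes" ] && [ "$sql" = "Yes" ]
}

# Example use (not executed here, assumes the mysql client can log in):
#   mysql -e 'SHOW SLAVE STATUS\G' | replication_ok || echo "Replication problem"
```

In a master-master setup the same check would have to run on both web1 and web2, since each server is a slave of the other.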
15.4 Monitoring from remote server
This part adds extra safety by checking the important ports (25, 53, 80, 443) from a remote server (install the dnsutils package for dig) :
#!/bin/bash
# Script to check important port on remote webserver
# Copyright (c) 2008 blogama.org
# This script is licensed under GNU GPL version 2.0 or above
# ---------------------------------------------------------------------

### This script does a verification on port 25, 53, 80 and 443 ###
### After 2 failed check it will send a mail notification ###

### To be modified ###
WEBSERVERIP="192.168.1.106"
MAILSERVERIP="192.168.1.106"
EMAIL="[email protected]"
DNSSERVERIP="192.168.1.106"
DOMAINTOCHECKDNS="example.com"
DOMAINIP="192.168.1.106"

###### Do not make modifications below ######

### Binaries ###
MAIL=$(which mail)
TELNET=$(which telnet)
DIG=$(which dig)

### Check if already notified ###
cd /root
if [ -f server_problem.txt ]; then
    exit 1
fi

### Test SMTP ###
( echo "quit" ) | $TELNET $MAILSERVERIP 25 | grep Connected > /dev/null 2>&1
if [ "$?" -ne "1" ]; then
    echo "PORT CONNECTED"
else
    if [ -f server_problem_first_time_25.txt ]; then
        echo "PORT 25 NOT CONNECTED" >> /root/server_problem.txt
    else
        echo "NOT CONNECTED" > /root/server_problem_first_time_25.txt
    fi
fi

### Test HTTP ###
( echo "quit" ) | $TELNET $WEBSERVERIP 80 | grep Connected > /dev/null 2>&1
if [ "$?" -ne "1" ]; then
    echo "PORT CONNECTED"
else
    if [ -f server_problem_first_time_80.txt ]; then
        echo "PORT 80 NOT CONNECTED" >> /root/server_problem.txt
    else
        echo "NOT CONNECTED" > /root/server_problem_first_time_80.txt
    fi
fi

### Test HTTPS ###
( echo "quit" ) | $TELNET $WEBSERVERIP 443 | grep Connected > /dev/null 2>&1
if [ "$?" -ne "1" ]; then
    echo "PORT CONNECTED"
else
    if [ -f server_problem_first_time_443.txt ]; then
        echo "PORT 443 NOT CONNECTED" >> /root/server_problem.txt
    else
        echo "NOT CONNECTED" > /root/server_problem_first_time_443.txt
    fi
fi

### Test DNS ###
$DIG $DOMAINTOCHECKDNS @$DNSSERVERIP | grep $DOMAINIP
if [ "$?" -ne "1" ]; then
    echo "PORT CONNECTED"
else
    if [ -f server_problem_first_time_53.txt ]; then
        echo "PORT 53 NOT CONNECTED" >> /root/server_problem.txt
    else
        echo "NOT CONNECTED" > /root/server_problem_first_time_53.txt
    fi
fi

### Send mail notification after 2 failed check ###
if [ -f server_problem.txt ]; then
    $MAIL -s "Server problem" "$EMAIL" < /root/server_problem.txt
fi
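The "alert after 2 failed checks" logic is repeated four times in the script; it can be sketched as one small function. This is a hedged illustration, not the script above: STATE_DIR, two_strikes and port_open are hypothetical names, and port_open swaps telnet for bash's /dev/tcp redirection plus the coreutils timeout command, which assumes a bash built with /dev/tcp support and a system that ships timeout. Unlike the original, a successful check removes the marker file, so only two failures in a row trigger an alert.

```shell
#!/bin/bash
# Sketch of the two-strike rule from the remote check script.
# STATE_DIR, two_strikes and port_open are hypothetical names.
STATE_DIR="${STATE_DIR:-/tmp}"

# port_open HOST PORT
# Succeeds if a TCP connection can be opened within 5 seconds.
# Assumption: bash with /dev/tcp support and the coreutils `timeout` command.
port_open() {
    timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# two_strikes NAME CHECK_COMMAND [ARGS...]
# Runs the check. Returns 0 only when it fails for the second time in a row,
# which is the moment an alert should be raised. Returns 1 otherwise.
two_strikes() {
    local name="$1"
    shift
    local marker="$STATE_DIR/server_problem_first_time_$name.txt"
    if "$@"; then
        rm -f "$marker"     # success resets the count (differs from the original)
        return 1
    fi
    if [ -f "$marker" ]; then
        return 0            # second consecutive failure
    fi
    echo "NOT CONNECTED" > "$marker"
    return 1
}

# Example use (not executed here):
#   two_strikes 80 port_open 192.168.1.106 80 \
#       && echo "PORT 80 NOT CONNECTED" >> /root/server_problem.txt
```

Waiting for a second failure before mailing avoids alerts for a single dropped connection, at the cost of one extra cron interval before you hear about a real outage.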
Et voila! Feel free to send me private emails at admin [at] marchost.com or post comments here or on my page : blogama.org