15. Custom scripts for monitoring (lb1, lb2, web1, web2)
I made a few bash scripts to monitor the whole setup (they are a bit ugly, but they work). If you improve them, feel free to mail them to me!
15.1 Monitoring from lb1.example.com
First we must install sendmail so lb1.example.com will be able to send mail :
apt-get install sendmail
The first script checks whether the backup load balancer (lb2.example.com) is still available to take over :
vi /root/lb2_check
#!/bin/bash
# Backup load balancer check
# Copyright (c) 2008 blogama.org
# This script is licensed under GNU GPL version 2.0 or above
# ---------------------------------------------------------------------
### This script does 1 verification ###
### 1) Check if backup load balancer failed and send mail notification ###
### To be modified ###
EMAIL="admin@example.com"
###### Do not make modifications below ######
### Binaries ###
MAIL=$(which mail)
### To restore to original when problem fixed ###
# "fix" argument: remove the notification flag and empty the log
if [ "$1" = "fix" ]; then
rm -f /root/lb2_problem.txt
> /var/log/ha-log
exit 0
fi
### Check if already notified ###
cd /root
if [ -f lb2_problem.txt ]; then
exit 1;
fi
### Check if Heartbeat is running on hot standby ###
tail /var/log/ha-log 2>&1 | grep "Asking other side for ping node count"
if [ "$?" -ne "1" ]; then
echo "Backup load balancer failed" > /root/lb2_problem.txt
$MAIL -s "Backup load balancer problem" $EMAIL < /root/lb2_problem.txt
fi
We make this script executable :
chmod +x /root/lb2_check
If lb2.example.com fails, the script creates the file /root/lb2_problem.txt and sends a mail notification. As long as lb2_problem.txt exists, the script will not check again. The log file must also be emptied once the problem is fixed, otherwise the script would find the old message and report the failure again.
Once the problem is fixed on lb2.example.com, please manually run :
/root/lb2_check fix
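The notify-once behaviour can be sketched as a small reusable function. This is only an illustration under my own assumptions: check_once and its three arguments are hypothetical names that do not appear in the script above, and the mail call is replaced by a plain echo so the logic can be tried without sending anything.

```shell
#!/bin/bash
# Sketch only: check_once is a hypothetical helper, not part of the original script.

# check_once LOGFILE PATTERN FLAGFILE
# Prints "already notified" while FLAGFILE exists, "notify" the first time
# PATTERN is found in LOGFILE (and creates FLAGFILE), and "ok" otherwise.
check_once() {
    local logfile="$1" pattern="$2" flagfile="$3"
    if [ -f "$flagfile" ]; then
        echo "already notified"
        return 0
    fi
    if grep -q -- "$pattern" "$logfile" 2>/dev/null; then
        echo "problem found" > "$flagfile"
        echo "notify"    # a real script would call mail here
    else
        echo "ok"
    fi
}
```

Because the flag file is the only state, deleting it (which is what the fix argument does) is enough to re-arm the check.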
The next script checks whether any ports failed on web1 or web2 by reading the ldirectord log file. ldirectord already has its own mail notification, but it sends a flood of messages; mine sends only one until you fix the problem :
vi /root/ports_failed
and make it look like this :
#!/bin/bash
# Ldirectord ports failure check
# Copyright (c) 2008 blogama.org
# This script is licensed under GNU GPL version 2.0 or above
# ---------------------------------------------------------------------
### This script does 1 verification ###
### 1) Check for port failure on load balanced servers ###
### To be modified ###
EMAIL="admin@example.com"
###### Do not make modifications below ######
### Binaries ###
MAIL=$(which mail)
#to restore to original when problem fixed
if [ "$1" = "fix" ]; then
rm -f /root/port_problem.txt
> /var/log/ldirectord.log
fi
###check if already notified###
cd /root
if [ -f port_problem.txt ]; then
cat /var/log/ldirectord.log | grep Deleted > /var/log/port_problem.log
exit 1;
fi
### Check if port failed ###
cat /var/log/ldirectord.log 2>&1 | grep Deleted
if [ "$?" -ne "1" ]; then
cat /var/log/ldirectord.log | grep Deleted > /var/log/port_problem.log
echo "Ports problem see logfile /var/log/port_problem.log" > /root/port_problem.txt
$MAIL -s "Some ports failed" $EMAIL < /root/port_problem.txt
fi
We make it executable :
chmod +x /root/ports_failed
As with the first script, once the problem is fixed you must run :
/root/ports_failed fix
so that the script starts checking again.
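If you would rather have the failing lines in the mail itself instead of a pointer to a log file, the extraction step could look like the sketch below. collect_failures is a hypothetical helper name of mine; the only thing taken from the script above is that it looks for the word "Deleted" in the ldirectord log.

```shell
#!/bin/bash
# Sketch only: collect_failures is a hypothetical helper, not part of the original script.

# collect_failures LOGFILE REPORT
# Copies every line containing "Deleted" from LOGFILE into REPORT.
# Returns 0 if at least one such line was found, 1 otherwise.
collect_failures() {
    local logfile="$1" report="$2"
    if grep -- "Deleted" "$logfile" > "$report" 2>/dev/null; then
        return 0
    fi
    rm -f "$report"    # nothing found: do not leave an empty report behind
    return 1
}

# Possible use (paths as in the script above):
# collect_failures /var/log/ldirectord.log /root/port_problem.txt \
#     && mail -s "Some ports failed" admin@example.com < /root/port_problem.txt
```

The report file then doubles as the "already notified" flag, since it only exists when something was found.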
Now add both scripts to your crontab :
crontab -e
* * * * * /root/ports_failed >/dev/null 2>&1
* * * * * /root/lb2_check >/dev/null 2>&1
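If you prefer not to edit the crontab by hand, the entries can be appended from a script. The sketch below is only one possible way to do it: add_cron_line is a hypothetical helper name, and /tmp/root.cron in the commented example is an arbitrary scratch file.

```shell
#!/bin/bash
# Sketch only: add_cron_line is a hypothetical helper name.

# add_cron_line FILE LINE
# Appends LINE to FILE unless an identical line is already present,
# so running it twice does not create duplicate cron jobs.
add_cron_line() {
    local file="$1" line="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Possible use as root:
# crontab -l > /tmp/root.cron 2>/dev/null
# add_cron_line /tmp/root.cron '* * * * * /root/ports_failed >/dev/null 2>&1'
# add_cron_line /tmp/root.cron '* * * * * /root/lb2_check >/dev/null 2>&1'
# crontab /tmp/root.cron
```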
15.2 Monitoring from lb2.example.com
Monitoring from the second load balancer is important because it tells us whether the master load balancer has failed and, if it has, keeps an eye on port failures on web1 and web2.
First we must install sendmail so lb2.example.com will be able to send mail :
apt-get install sendmail
vi /root/ports_check
And paste this script :
#!/bin/bash
# Ldirectord ports failure check
# Copyright (c) 2008 blogama.org
# This script is licensed under GNU GPL version 2.0 or above
# ---------------------------------------------------------------------
### This script does 2 verifications ###
### 1) check if master load balancer failed and send mail notification ###
### 2) If master load balancer failed, check for port failure on load balanced servers ###
### To be modified ###
EMAIL="admin@example.com"
###### Do not make modifications below ######
### Binaries ###
MAIL=$(which mail)
### Date ###
NOW=$(date)
### To restore to original when problem fixed ###
if [ "$1" = "fix" ]; then
cd /root/
rm -f lb1_problem.txt port_problem.txt server_problem_notified.txt
> /var/log/ldirectord.log
> /var/log/ha-log
exit 0
fi
#check if ldirectord is running on lb2.example.com (means that lb1.example.com failed)
#$LDIRECTORD /etc/ha.d/ldirectord.cf status 2>&1 | grep running
cat /var/log/ha-log | grep "takeover complete" > /dev/null 2>&1
if [ "$?" -ne "1" ]; then
###check if already notified###
cd /root
if [ -f port_problem.txt ]; then
cat /var/log/ldirectord.log | grep Deleted > /var/log/port_problem.log
exit 1;
fi
### Check if port failed ###
cat /var/log/ldirectord.log 2>&1 | grep Deleted
if [ "$?" -ne "1" ]; then
cat /var/log/ldirectord.log | grep Deleted > /var/log/port_problem.log
echo "Ports problem see logfile /var/log/port_problem.log" > /root/port_problem.txt
$MAIL -s "Some ports failed" $EMAIL < /root/port_problem.txt
fi
### Check if already notified that master load balancer failed ###
cd /root
if [ -f server_problem_notified.txt ]; then
exit 1;
fi
### Notify that master load balancer failed ###
cd /root
MESSAGE="$NOW : Master load balancer failed"
echo "$MESSAGE" > lb1_problem.txt
$MAIL -s "Master load balancer failed" $EMAIL < /root/lb1_problem.txt
echo "notified" > server_problem_notified.txt
fi
We make it executable :
chmod +x /root/ports_check
And we add it to our crontab :
crontab -e
* * * * * /root/ports_check >/dev/null 2>&1
When you get a notification from the script, fix the problem and then run :
/root/ports_check fix
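What the fix argument has to do can be sketched as one small function: remove the flag files and empty the logs. reset_state is a hypothetical name of mine, and it takes the directory and log paths as arguments only so that it can be tried on scratch files first.

```shell
#!/bin/bash
# Sketch only: reset_state is a hypothetical helper, not part of the original script.

# reset_state DIR LOG...
# Removes the three notification flag files in DIR and empties every LOG
# that exists. Missing files are silently skipped.
reset_state() {
    local dir="$1" log
    shift
    rm -f "$dir/lb1_problem.txt" "$dir/port_problem.txt" "$dir/server_problem_notified.txt"
    for log in "$@"; do
        if [ -f "$log" ]; then
            : > "$log"    # truncate the file without deleting it
        fi
    done
    return 0
}

# Possible use on lb2.example.com:
# reset_state /root /var/log/ldirectord.log /var/log/ha-log
```

Truncating instead of deleting keeps the log files in place for the daemons that write to them.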
15.3 Monitoring from web1 & web2
Monitoring of web cluster is already partially done with monit and munin.
The part that is not covered yet is the monitoring of MySQL replication.
Please read the following article :
Repair MySQL master-master replication
MySQL monitoring is optional, but on a production server problems can happen with MySQL replication, so I really recommend using those scripts or something similar to check database consistency.
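To give an idea of what "something similar" could look like, here is a minimal sketch that is not taken from the linked article. It assumes the Slave_IO_Running and Slave_SQL_Running fields printed by SHOW SLAVE STATUS\G (recent MySQL releases renamed them to Replica_...), and replication_ok is a hypothetical function name.

```shell
#!/bin/bash
# Sketch only: replication_ok is a hypothetical helper, not from the linked article.

# replication_ok
# Reads the output of SHOW SLAVE STATUS\G on stdin and returns 0 only if
# both replication threads report "Yes".
replication_ok() {
    local status
    status=$(cat)
    echo "$status" | grep -Eq '^[[:space:]]*Slave_IO_Running:[[:space:]]*Yes[[:space:]]*$' || return 1
    echo "$status" | grep -Eq '^[[:space:]]*Slave_SQL_Running:[[:space:]]*Yes[[:space:]]*$' || return 1
    return 0
}

# Possible use (assumes the mysql client can log in without a prompt, e.g. via ~/.my.cnf):
# mysql -e 'SHOW SLAVE STATUS\G' | replication_ok || echo "replication problem"
```

Note that this only tells you the replication threads are running; it does not compare the data on both masters.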
15.4 Monitoring from remote server
This part adds extra security by checking the important ports (25, 53, 80, 443) from a remote server (install dnsutils for dig) :
#!/bin/bash
# Script to check important ports on remote webserver
# Copyright (c) 2008 blogama.org
# This script is licensed under GNU GPL version 2.0 or above
# ---------------------------------------------------------------------
### This script does a verification on port 25, 53, 80 and 443 ###
### After 2 failed checks it will send a mail notification ###
### To be modified ###
WEBSERVERIP="192.168.1.106"
MAILSERVERIP="192.168.1.106"
EMAIL="admin@example.com"
DNSSERVERIP="192.168.1.106"
DOMAINTOCHECKDNS="example.com"
DOMAINIP="192.168.1.106"
###### Do not make modifications below ######
### Binaries ###
MAIL=$(which mail)
TELNET=$(which telnet)
DIG=$(which dig)
### Check if already notified###
cd /root
if [ -f server_problem.txt ]; then
exit 1;
fi
### Test SMTP ###
(
echo "quit"
) | $TELNET $MAILSERVERIP 25 | grep Connected > /dev/null 2>&1
if [ "$?" -ne "1" ]; then
echo "PORT CONNECTED"
else
if [ -f server_problem_first_time_25.txt ]; then
echo "PORT 25 NOT CONNECTED" >> /root/server_problem.txt
else
echo "NOT CONNECTED" > /root/server_problem_first_time_25.txt
fi
fi
### Test HTTP ###
(
echo "quit"
) | $TELNET $WEBSERVERIP 80 | grep Connected > /dev/null 2>&1
if [ "$?" -ne "1" ]; then
echo "PORT CONNECTED"
else
if [ -f server_problem_first_time_80.txt ]; then
echo "PORT 80 NOT CONNECTED" >> /root/server_problem.txt
else
echo "NOT CONNECTED" > /root/server_problem_first_time_80.txt
fi
fi
### Test HTTPS###
(
echo "quit"
) | $TELNET $WEBSERVERIP 443 | grep Connected > /dev/null 2>&1
if [ "$?" -ne "1" ]; then
echo "PORT CONNECTED"
else
if [ -f server_problem_first_time_443.txt ]; then
echo "PORT 443 NOT CONNECTED" >> /root/server_problem.txt
else
echo "NOT CONNECTED" > /root/server_problem_first_time_443.txt
fi
fi
### Test DNS ###
$DIG $DOMAINTOCHECKDNS @$DNSSERVERIP | grep $DOMAINIP
if [ "$?" -ne "1" ]; then
echo "PORT CONNECTED"
else
if [ -f server_problem_first_time_53.txt ]; then
echo "PORT 53 NOT CONNECTED" >> /root/server_problem.txt
else
echo "NOT CONNECTED" > /root/server_problem_first_time_53.txt
fi
fi
### Send mail notification after 2 failed checks ###
if [ -f server_problem.txt ]; then
$MAIL -s "Server problem" $EMAIL < /root/server_problem.txt
fi
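The "two failed checks" logic is repeated four times in the script above, so it could be factored into one function. The sketch below is my own variation, not the original behaviour: record_result is a hypothetical name, DIR stands in for /root, and unlike the script above it removes the first-failure marker when a check succeeds, so only two failures in a row trigger a notification.

```shell
#!/bin/bash
# Sketch only: record_result is a hypothetical helper; DIR stands in for /root.

# record_result DIR PORT STATUS
# STATUS is 0 when the port answered, non-zero otherwise.
# First failure: create a marker file.
# Second failure in a row: add a line to DIR/server_problem.txt.
# Any success: remove the marker (the original script never clears it).
record_result() {
    local dir="$1" port="$2" status="$3"
    local marker="$dir/server_problem_first_time_$port.txt"
    if [ "$status" -eq 0 ]; then
        rm -f "$marker"
    elif [ -f "$marker" ]; then
        echo "PORT $port NOT CONNECTED" >> "$dir/server_problem.txt"
    else
        echo "NOT CONNECTED" > "$marker"
    fi
}

# Possible use after one of the telnet tests above:
# record_result /root 80 "$?"
```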
Et voila! Feel free to send me private emails at admin [at] marchost.com or post comments here or on my page : blogama.org