How to check MySql replication?

Discussion in 'Installation/Configuration' started by millpark10, Dec 20, 2013.

  1. millpark10

    millpark10 Member

    Hi
    I am following the
    "Installing A Web, Email & MySQL Database Cluster On Debian 6.0 With ISPConfig 3"

    How can I check that the replication between the to mysql-servers is working/up to date.
    If it is not in sync/working, what is the correct procedure to get it inline again.
    On my Server1:
    mysql> SHOW SLAVE STATUS \G
    *************************** 1. row ***************************
    Slave_IO_State: Waiting for master to send event
    Master_Host: 172.17.2.182
    Master_User: slaveuser
    Master_Port: 3306
    Connect_Retry: 60
    Master_Log_File: mysql-bin.000003
    Read_Master_Log_Pos: 267
    Relay_Log_File: mysqld-relay-bin.000014
    Relay_Log_Pos: 370
    Relay_Master_Log_File: mysql-bin.000003
    Slave_IO_Running: Yes
    Slave_SQL_Running: Yes
    Replicate_Do_DB:
    Replicate_Ignore_DB:
    Replicate_Do_Table:
    Replicate_Ignore_Table:
    Replicate_Wild_Do_Table:
    Replicate_Wild_Ignore_Table:
    Last_Errno: 0
    Last_Error:
    Skip_Counter: 0
    Exec_Master_Log_Pos: 267
    Relay_Log_Space: 673
    Until_Condition: None
    Until_Log_File:
    Until_Log_Pos: 0
    Master_SSL_Allowed: No
    Master_SSL_CA_File:
    Master_SSL_CA_Path:
    Master_SSL_Cert:
    Master_SSL_Cipher:
    Master_SSL_Key:
    Seconds_Behind_Master: 0
    Master_SSL_Verify_Server_Cert: No
    Last_IO_Errno: 0
    Last_IO_Error:
    Last_SQL_Errno: 0
    Last_SQL_Error:
    Replicate_Ignore_Server_Ids:
    Master_Server_Id: 2
    1 row in set (0.00 sec)

    On my Server2:
    mysql> SHOW SLAVE STATUS \G
    *************************** 1. row ***************************
    Slave_IO_State: Waiting for master to send event
    Master_Host: 172.17.2.181
    Master_User: slaveuser
    Master_Port: 3306
    Connect_Retry: 60
    Master_Log_File: mysql-bin.000006
    Read_Master_Log_Pos: 11551505
    Relay_Log_File: mysqld-relay-bin.000013
    Relay_Log_Pos: 11551651
    Relay_Master_Log_File: mysql-bin.000006
    Slave_IO_Running: Yes
    Slave_SQL_Running: Yes
    Replicate_Do_DB:
    Replicate_Ignore_DB:
    Replicate_Do_Table:
    Replicate_Ignore_Table:
    Replicate_Wild_Do_Table:
    Replicate_Wild_Ignore_Table:
    Last_Errno: 0
    Last_Error:
    Skip_Counter: 0
    Exec_Master_Log_Pos: 11551505
    Relay_Log_Space: 11551851
    Until_Condition: None
    Until_Log_File:
    Until_Log_Pos: 0
    Master_SSL_Allowed: No
    Master_SSL_CA_File:
    Master_SSL_CA_Path:
    Master_SSL_Cert:
    Master_SSL_Cipher:
    Master_SSL_Key:
    Seconds_Behind_Master: 0
    Master_SSL_Verify_Server_Cert: No
    Last_IO_Errno: 0
    Last_IO_Error:
    Last_SQL_Errno: 0
    Last_SQL_Error:
    Replicate_Ignore_Server_Ids:
    Master_Server_Id: 1
    1 row in set (0.00 sec)

    Pls someone show me a way or a link to find
    1. Is above correct or faulty and if so in what way, (some values above differ a lot)?
    2. How to make it right (Shall all values be the same in above lists?)
    3. Is ISPconfig monitoring and reporting the mirror set in any way?

    //millpark10
     
    Last edited: Dec 20, 2013
  2. florian030

    florian030 ISPConfig Developer ISPConfig Developer

    I use nagios to monitor a mysql-replication.

    There is no general solution to fix a problem with the replication. You can skip the error or rebuild the replication. See here.

    You replication is still running. See "Slave_IO_Running: Yes"

    The values for the slave-status must not be equal on each server.

    Currently not. But it´s a nice idea. ;) But i will not add this to 3.0.5.x as 3.0.6 will have a much better handling for system-cron-jobs.

    BTW. I´m not sure if there is any statement in the tutorial regarding skipping some errors (1007 and 1008) if you´re running ispconfig in a cluster-setup. in some cases this could crash your replication.
     
  3. millpark10

    millpark10 Member

    Florian,
    Thank you for a very quick response!
    So I will read me up on nagios, heard about it before but have not had the need.

    Is "Slave_IO_Running: Yes" the only parameter I have to take into consideration?
    e.g. if it is "yes" then everything is working and I dont have to think about any other
    figures in the SHOW SLAVE STATUS \G output? Even if they seem to be way different as
    in "Exec_Master_Log_Pos" for example in my post?

    Quote:
    "3. Is ISPconfig monitoring and reporting the mirror set in any way?"
    "Currently not. But it´s a nice idea. But i will not add this to 3.0.5.x as 3.0.6 will have a much better handling for system-cron-jobs."

    I read about skipping errors, didn't understand it but now have at least a idea how to avoid a situation (1007) before it is happening.

    OK, good, well this was not a request for features :) just a way to understand how it works.
    I am in the process of building a ISPconfig3-system that is supposed to be as safe/secure/redundant as possible.
    I am using ftm, 2 HP proliant DL380 G4 with 14 disk smartarrays connected. Both servers with hw RAID5 and the Smartarray also RAID5. These two servers are identical and is in a mirror(cluster)setup.

    Next question is, what will I have to do if one of the servers was down for "maintenance" (e.g. change of a faulty disk) in order to get it "inline" again after it is online.

    //millpark10
     
  4. florian030

    florian030 ISPConfig Developer ISPConfig Developer

    No, you should also checK slave_io_running and slave_sql_running. Maybe Seconds_Behind_Master, too.

    You can give this script a try.

    In this setup i would skip sql-errors 1007 and 1008.


    If you just shut down one (mysql)server, the replication restarts automaticly after the (mysql)server is back online and connected to the other (mysql)server. But this won´t work when you remove the mysql-databases. In this case you have to rebuild the replication (e.g. dump data from the other server and import it).
     
    Last edited: Dec 21, 2013
  5. millpark10

    millpark10 Member

    Florian,
    again, thank you for quick response and good explanation to my newbie questions.
    I read your links and suggestions and is getting a crasch course about all this!
    :)
    "No, you should also checK slave_io_running and slave_sql_running. Maybe Seconds_Behind_Master, too. You can give this script a try."
    I will definitely look into the script (saw it somewere before when searching howtoforge) and find a way to implement it in my setup.

    Also your comment "In this setup i would skip sql-errors 1007 and 1008." I understood from your blogpost it WILL make the replication to crash in not so uncommon circumstances.

    So as long as I "only stop" one of my two servers, the replication will repair itself and catch up on any changes that have been done during downtime. Sounds great!
    But, this is only ISPconfig related stuff?
    Not actual data? I mean /var/www and /var/mail will not have redundance in this basic setup?
    I have to e.g. rsync or something similiar in order to have that data mirrored and redundant?
    Again, sorry for asking questions that I might have read but want to be sure I understood.
    My goal is to make my installation as failsafe as ever possible.

    (Dont know if I shall start a new thread but I have to mount my smartarray and probably do some entries in fstab in order to move /var/www /var/mail and mysql data to that disk, or is there somewhere a "howto" for that as well? Not completely sure if I have all info I need to do this)

    //millpark10
     
  6. florian030

    florian030 ISPConfig Developer ISPConfig Developer

    A mysql-replication only works for mysql.

    To keep the data in sync (web and mail), you can use unison or drbd. If you would to offer mail on both servers at the same time, it would use drbd, to have the data allways in sync.
     
  7. millpark10

    millpark10 Member

    Florian,
    again, thank you for quick response!

    According to the guide I've been following,
    "Installing A Web, Email & MySQL Database Cluster On Debian 6.0 With ISPConfig 3"
    I have Unison installed already. (how to change that)
    After reading about DRBD I understand it adds a bit to redundance, although I don't fully get everything it will do for me. But failover is one important function.
    Is the "http://www.howtoforge.com/setting-up-network-raid1-with-drbd-on-ubuntu-11.10" up-to-date enough for me to follow? It seems as it will format and mount the two "spare disks" I have (one on each server) and make them a network mirror-set. That is good!
    Is there a description somewhere (searched but didnt find) on howto move /var to this new partition.
    Or is it best(?) to only move /var/www and /var/mail and not whole /var ?

    I am searching for "best practices" here in order to build as much as possible according to "standard".
    (I guess in the future it will be easier to get help/support with as few specials as possible. And it probably will scale better when needed.)

    //millpark10
     
  8. florian030

    florian030 ISPConfig Developer ISPConfig Developer

    Setting up a DRBD-replication it´s quiet simple. You have a volume on each server which is connected to the other server. You can find a lot of how-to´s and on drbd.org you find also a setup for using primary-primary with ocfs.

    NEVER sync the whole /var.

    If would create to volumes r0 and r1. Then put mail on r0 and web on r1 and mount the volumes to /var/vmail and /var/www.
     
  9. millpark10

    millpark10 Member

    Hi Florian
    - Got help to develop an improved check-master-master-replication script from Script as starting point.
    Not sure how to post it here, I will happily share this script with the forum. Is it ok to just attach it to a post (as .txt)?
    - You said:
    Dont understand how this setup could offer mail on the two servers at the same time. It seems to me that the mirrored(clustered) setup is (without heartbeat/keepalived) more secure but not redundant to each other? (e.g. RAID1)
    - Read a lot about drbd on the site but could not be sure how to configure it in my setup, As mentioned in other places, there is a possibility, with unison for /var/vmail, to loose mail. This is what I'd like to avoid.
    - Last question, I read your post but I can not get the error reproduced as described in my setup. If I add SLAVE_SKIP_ERRORS=1007,1008 in my.cnf it will not do anything bad, will it?

    Thank you for being so kind to help us newbies here in the forum.
    (Will subscribe to support!)
    //millpark10
     
  10. florian030

    florian030 ISPConfig Developer ISPConfig Developer

    If you use drbd in dual-primary mode you have the same date on both servers.

    If you _really_ can´t re-produce this error, you can ignore the SLAVE_SKIP_ERRORS in your config. I just noted that you must stop the slave on both servers (at least with mysqld 5.5.35) if you exclude the db mysql from the replication.
     
    Last edited: Feb 1, 2014
  11. millpark10

    millpark10 Member

    Hi Florian

    Good catch to stop both servers! (noticed you updated your page about it) :)
    Of course I get the error then.
    Added:
    slave_skip_errors=1007,1008
    in /etc/mysql/my.cnf
    and problem "solved" (worked around).
    :)
    About DRBD it will be implemented on my next servers instead. Have to read more.
    For now I will use unison and, as I understand, it will copy from first server, at a preset time interval e.g. every 5 min, to the second server. Not optimal at all but I really have to put this cluster online!
    Next server cluster i build will be HA, redundant and have somekind of heartbeat/failover. I'll be happy to get inputs on how to do this. Howtos I found so far seems a bit outdated.
    :(
    //millpark10
     
  12. florian030

    florian030 ISPConfig Developer ISPConfig Developer

    If you replicate the mysql-db you shoul hit the error with stopping only one master-task.
     
  13. millpark10

    millpark10 Member

    Hi Florian

    I'm not 100% sure how it went down, but I couldn't duplicate the error exactly as described earlier in your blog page. But the error appeard when deleting a database. And with both slaves stopped the error showed up exactly as the updated blog page described.

    Posting the script and regarding your comment about the value "Seconds_Behind_Master: 0" I might update the script once the server is in production if it will report "value not found a lot". I guess this will depend heavily on how much load the server will have regarding replication. Probably I will search for a value outside a predefined "normal" variation instead, and then let the script post a mail.
    Thanks for all god advice.

    So, Below is the script that check if replication is up, If not, it send you a mail. Happy to hear comments about it, or other improvements.
    (I am running the script on both servers in the cluster, but changed "From Server1 Replication new problem" on server 2 to "From Server2 Replication new problem")
    :)
    //millpark10

    Code:
    #!/bin/bash
    PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
    
    #file name of errorfile
    filename="error_"$(date +"%Y%m%d%H%M%S")".txt"
    
    #Directory where error file will be written
    cd /root/scripts
    
    #Get status of replication from MySQL once
    output=$((echo "show slave status \G;") | mysql -u root -pYOURMYSQLPASSWORD 2>&1)
    
    
    #three searches for specific strings in $output, if not found this will be written to the error file
    echo "If you see three outputs below and one empty \"Textfile output\" then everything is working" #remove after test run
    
    ###Check Slave_IO_Running###
    echo "$output" | grep "Slave_IO_Running: Yes"
    if [ "$?" -eq "1" ]; then
    echo "Did not catch: \"Slave_IO_Running: Yes\"" >> $filename
    fi
    
    ####Check Slave_SQL_Running###
    echo "$output" | grep "Slave_SQL_Running: Yes"
    if [ "$?" -eq "1" ]; then
    echo "Did not catch: \"Slave_SQL_Running: Yes\"" >> $filename
    fi
    
    ####Check Seconds_Behind_Master###
    echo "$output" | grep "Seconds_Behind_Master: 0"
    if [ "$?" -eq "1" ]; then
    echo "Did not catch: \"Seconds_Behind_Master: 0\"" >> $filename
    fi
    
    echo "Textfile output:" #remove after test run
    
    ###Write to logfile everytime script is run, file might be manually deleted once a year or so. Grows with ~125k/month
    date >> /var/log/mysql_chk_replication.log
    
    ###Send notification if replication down###
    if [ -f $filename ]; then
    date >> $filename
    cat $filename #remove after test and uncomment row below to start sending mail when any of the three checks fails
    #mail -s "From Server1 Replication new problem" [email protected] < $filename
    fi
    
    #### Make script run every 10 min by putting uncommented line below in 
    #### crontab -e
    #### */10 * * * * /root/scripts/chk_sql_ok_howtoforgepost.sh &> /dev/null
    #### I placed my script in /root/scripts and renamed it to chk_sql_ok_howtoforgepost.sh chmod set to rwxr-x--x
    
     
  14. millpark10

    millpark10 Member

    Minor update to script

    Here is the updated version of the script, posting more info if Seconds_Behind_Master is not zero.
    Code:
    #!/bin/bash
    PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
    
    #file name of errorfile
    filename="error_"$(date +"%Y%m%d%H%M%S")".txt"
    
    #Directory where error file will be written
    cd /root/scripts
    
    #Get status of replication from MySQL once
    output=$((echo "show slave status \G;") | mysql -u root -pYOURMYSQLPASSWORD 2>&1)
    
    
    #three searches for specific strings in $output, if not found this will be written to the error file
    #echo "If you see three outputs below and one empty \"Textfile output\" then everything is working" #remove after test run
    
    ###Check Slave_IO_Running###
    echo "$output" | grep "Slave_IO_Running: Yes"
    if [ "$?" -eq "1" ]; then
    echo "Did not catch: \"Slave_IO_Running: Yes\"" >> $filename
    fi
    
    ####Check Slave_SQL_Running###
    echo "$output" | grep "Slave_SQL_Running: Yes"
    if [ "$?" -eq "1" ]; then
    echo "Did not catch: \"Slave_SQL_Running: Yes\"" >> $filename
    fi
    
    ####Check Seconds_Behind_Master###
    #echo "$output" | grep "Seconds_Behind_Master: 0"
    #if [ "$?" -eq "1" ]; then
    #echo "Did not catch: \"Seconds_Behind_Master: 0\"" >> $filename
    #fi
    latency=$(echo "$output" | grep "Seconds_Behind_Master")
    echo "$latency" | grep "Seconds_Behind_Master: 0"
    if [ "$?" -eq "1" ]; then
    echo "Caught delay: $latency" >> $filename
    fi
    
    #echo "Textfile output:" #remove after test run
    
    ###Write to logfile everytime script is run
    date >> /var/log/mysql_chk_replication.log
    
    ###Send notification if replication down or Seconds_Behind_Master is not 0###
    if [ -f $filename ]; then
    date >> $filename
    echo "$output" >> $filename
    #cat $filename #remove after test and uncomment row below to start sending mail when any of the three checks fails
    mail -s "From Server1 Replication new problem" [email protected] < $filename
    fi
    
    #### Make script run every 10 min by putting uncommented line below in 
    #### crontab -e
    #### */10 * * * * /root/scripts/chk_sql_ok_howtoforgepost.sh &> /dev/null
    #### I placed my script in /root/scripts and renamed it to chk_sql_ok_howtoforgepost.sh chmod set to rwxr-x--x
    Do apropriate changes and put same script on second server as well.
    //millpark10
     

Share This Page