How To Repair MySQL Replication

Want to support HowtoForge? Become a subscriber!
 
Submitted by falko (Contact Author) (Forums) on Fri, 2008-06-06 12:10. :: MySQL

How To Repair MySQL Replication

Version 1.0
Author: Falko Timme <ft [at] falkotimme [dot] com>
Last edited 05/29/2008

If you have set up MySQL replication, you probably know this problem: sometimes there are invalid MySQL queries which cause the replication to not work anymore. In this short guide I explain how you can repair the replication on the MySQL slave without the need to set it up from scratch again.

I do not issue any guarantee that this will work for you!

 

1 Identifying The Problem

To find out whether replication is/is not working and what has caused to stop it, you can take a look at the logs. On Debian, for example, MySQL logs to /var/log/syslog:

grep mysql /var/log/syslog

server1:/home/admin# grep mysql /var/log/syslog
May 29 09:56:08 http2 mysqld[1380]: 080529 9:56:08 [ERROR] Slave: Error 'Table 'mydb.taggregate_temp_1212047760' doesn't exist' on query. Default database: 'mydb'. Query: 'UPDATE thread AS thread,taggregate_temp_1212047760 AS aggregate
May 29 09:56:08 http2 mysqld[1380]: ^ISET thread.views = thread.views + aggregate.views
May 29 09:56:08 http2 mysqld[1380]: ^IWHERE thread.threadid = aggregate.threadid', Error_code: 1146
May 29 09:56:08 http2 mysqld[1380]: 080529 9:56:08 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.001079' position 203015142
server1:/home/admin#

You can see what query caused the error, and at what log position the replication stopped.

To verify that the replication is really not working, log in to MySQL:

mysql -u root -p

On the MySQL shell, run:

mysql> SHOW SLAVE STATUS \G

If one of Slave_IO_Running or Slave_SQL_Running is set to No, then the replication is broken:

mysql> SHOW SLAVE STATUS \G
*************************** 1. row ***************************
             Slave_IO_State: Waiting for master to send event
                Master_Host: 1.2.3.4
                Master_User: slave_user
                Master_Port: 3306
              Connect_Retry: 60
            Master_Log_File: mysql-bin.001079
        Read_Master_Log_Pos: 269214454
             Relay_Log_File: slave-relay.000130
              Relay_Log_Pos: 100125935
      Relay_Master_Log_File: mysql-bin.001079
           Slave_IO_Running: Yes
          Slave_SQL_Running: No
            Replicate_Do_DB: mydb
        Replicate_Ignore_DB:
         Replicate_Do_Table:
     Replicate_Ignore_Table:
    Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
                 Last_Errno: 1146
                 Last_Error: Error 'Table 'mydb.taggregate_temp_1212047760' doesn't exist' on query. Default database: 'mydb'. 
Query: 'UPDATE thread AS thread,taggregate_temp_1212047760 AS aggregate
        SET thread.views = thread.views + aggregate.views
        WHERE thread.threadid = aggregate.threadid'
               Skip_Counter: 0
        Exec_Master_Log_Pos: 203015142
            Relay_Log_Space: 166325247
            Until_Condition: None
             Until_Log_File:
              Until_Log_Pos: 0
         Master_SSL_Allowed: No
         Master_SSL_CA_File:
         Master_SSL_CA_Path:
            Master_SSL_Cert:
          Master_SSL_Cipher:
             Master_SSL_Key:
      Seconds_Behind_Master: NULL
1 row in set (0.00 sec)

mysql>

 

2 Repairing The Replication

Just to go sure, we stop the slave:

mysql> STOP SLAVE;

Fixing the problem is actually quite easy. We tell the slave to simply skip the invalid SQL query:

mysql> SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;

This tells the slave to skip one query (which is the invalid one that caused the replication to stop). If you'd like to skip two queries, you'd use SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 2; instead and so on.

That's it already. Now we can start the slave again...

mysql> START SLAVE;

... and check if replication is working again:

mysql> SHOW SLAVE STATUS \G

mysql> SHOW SLAVE STATUS \G
*************************** 1. row ***************************
             Slave_IO_State: Waiting for master to send event
                Master_Host: 1.2.3.4
                Master_User: slave_user
                Master_Port: 3306
              Connect_Retry: 60
            Master_Log_File: mysql-bin.001079
        Read_Master_Log_Pos: 447560366
             Relay_Log_File: slave-relay.000130
              Relay_Log_Pos: 225644062
      Relay_Master_Log_File: mysql-bin.001079
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
            Replicate_Do_DB: mydb
        Replicate_Ignore_DB:
         Replicate_Do_Table:
     Replicate_Ignore_Table:
    Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
                 Last_Errno: 0
                 Last_Error:
               Skip_Counter: 0
        Exec_Master_Log_Pos: 447560366
            Relay_Log_Space: 225644062
            Until_Condition: None
             Until_Log_File:
              Until_Log_Pos: 0
         Master_SSL_Allowed: No
         Master_SSL_CA_File:
         Master_SSL_CA_Path:
            Master_SSL_Cert:
          Master_SSL_Cipher:
             Master_SSL_Key:
      Seconds_Behind_Master: 0
1 row in set (0.00 sec)

mysql>

As you see, both Slave_IO_Running and Slave_SQL_Running are set to Yes now.

Now leave the MySQL shell...

mysql> quit;

... and check the log again:

grep mysql /var/log/syslog

server1:/home/admin# grep mysql /var/log/syslog
May 29 09:56:08 http2 mysqld[1380]: 080529 9:56:08 [ERROR] Slave: Error 'Table 'mydb.taggregate_temp_1212047760' doesn't exist' on query. Default database: 'mydb'. Query: 'UPDATE thread AS thread,taggregate_temp_1212047760 AS aggregate
May 29 09:56:08 http2 mysqld[1380]: ^ISET thread.views = thread.views + aggregate.views
May 29 09:56:08 http2 mysqld[1380]: ^IWHERE thread.threadid = aggregate.threadid', Error_code: 1146
May 29 09:56:08 http2 mysqld[1380]: 080529 9:56:08 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.001079' position 203015142
May 29 11:42:13 http2 mysqld[1380]: 080529 11:42:13 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.001079' at position 203015142, relay log '/var/lib/mysql/slave-relay.000130' position: 100125935
server1:/home/admin#

The last line says that replication has started again, and if you see no errors after that line, everything is ok.

 

3 Links


Please do not use the comment function to ask for help! If you need help, please use our forum.
Comments will be published after administrator approval.
Submitted by Ramanath (not registered) on Mon, 2011-03-14 11:13.

What i have to do if Slave_IO_Running: No and Seconds_Behind_Master: NULL. Please help me.I am waiting for your response.

http://www.howtoforge.com/how-to-repair-mysql-replication.. In this link you have mentioned the solution for Slave_SQL_Running: No.Please suggest me for Slave_IO_Running: No and Seconds_Behind_Master: NULL
Submitted by Farshid (not registered) on Sat, 2010-10-23 23:13.

Thank you. I got replication working without having to rebuild in middle of night remotely!

Submitted by sureshkumar (not registered) on Fri, 2009-07-03 10:46.

Thanks.......... Good work ...

 

 

Submitted by Richard (not registered) on Thu, 2009-01-15 16:51.
Bear in mind that any time you have a query which *did* successfully execute on the master and is skipped on the slave and you use a SQL_SLAVE_SKIP_COUNTER method to "fix" the problem, your master and slave are now no longer in sync. Yes, this is sometimes necessary, but if it is a recurring issue, then the problems go much deeper than merely broken replication.
Submitted by Anonymous (not registered) on Tue, 2009-11-24 07:42.

noted that this is not the proper "fix" for the problem, but we are missing your proposed solution

Submitted by Anonymous (not registered) on Thu, 2009-01-29 21:20.
True, but, sometimes the slave will try to execute things that don't apply like a "drop trigger" statement for a trigger that doesn't ever exist because the slave is only replicating specific tables.
Submitted by Anonymous (not registered) on Sun, 2009-01-18 01:43.
Indeed. And if you skip a query to 'fix' the replication,  you run the very serious risk that the replication will become even more out of sync further down the line. This isn't fixing, it's just brushing the problem under the carpet and hoping it goes away. If you must skip a query, look at the query first, and be sure its absence won't cause future queries to fail
Submitted by Perry Whelan (not registered) on Wed, 2008-10-22 17:32.

I'm managing an infrastructure with a number of databases who (for codified reasons that I cannot influence) suffer from this situation often. So, I've written a cron script to manage the situation.

 Does anyone see any foreseeable issues with this logic (see below)?

#!/bin/bash
## Tool to unstick MySQL Replicators.
## Set to run from cron once a minute.

# */1 * * * * /usr/local/bin/whipSlave.mysql.sh > /dev/null 2>&1

# Last updated: MM/DD/YYYY

COMMANDS="mysql grep awk logger"

export PATH='/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin'

for i in $COMMANDS
do
        X=`echo $i | tr '[a-z]' '[A-Z]'`
        export $X=`type -p $i`
done

# Define variables
USERNAME=dbuser
PASSWORD=password

# Define Functions
## Obtain MwSQL slave server status
function SLAVE()
{
        STATUS=`$MYSQL -u $USERNAME -p$PASSWORD -e \
                "SHOW SLAVE STATUS \G" |
                $GREP Seconds_Behind_Master |
                $AWK '{print $2}'`
}

## Skip errors
function UNSTICK()
{
        $MYSQL -u $USERNAME -p$PASSWORD -e \
                "STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; START SLAVE;"
        sleep 5
        # Check everything again
        CHECK
}

## Decide what to do...
function CHECK()
{
        # Obtain status
        SLAVE
        if [ $STATUS = NULL ]
        then
                # I think the replicator is broken
                echo "MySQL Slave database is not replicating. Fixing..." | $LOGGER
                UNSTICK
        else
                # Everything should be fine
                echo "MySQL Slave is $STATUS seconds behind its Master." | $LOGGER
        fi
}

## Are we running?
function ALIVE()
{
        UP=`$MYSQL -u $USERNAME -p$PASSWORD -e \
                "SHOW SLAVE STATUS \G" |
                $GREP Slave_IO_Running |
                $AWK '{print $2}'`

        if [ $UP = Yes ]
        then
                # Let's check if everything is good, then...
                CHECK
        else
                # Uh oh...let's not do anything.
                echo "MySQL Slave IO is not running!" | $LOGGER
                exit 1
        fi
}

# How is everything?
ALIVE

#EoF
exit 0

Submitted by Jeff Buchbinder (not registered) on Thu, 2009-01-15 17:33.

This might be a little bit simpler (works with MySQL 5.0+, I believe):

#!/bin/bash done=0 while [ $done -eq 0 ]; do # get status done=$( mysql -Be 'show slave status;' | tail -1 | cut -f12 | grep Yes | wc -l ) if [ $done -eq 0 ]; then echo "Advancing position past [$(mysql -Be 'show slave status;' | tail -1 | cut -f20)]... " mysql -uroot -Be "SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; start slave;" sleep 1 fi done
Submitted by Anonymous (not registered) on Sat, 2011-10-01 14:42.
while [ ! "`mysql -uroot -Be 'SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; start slave;'`" ]
do
echo "Skipped one Error"
sleep 1
done ; echo "All set"