Comments on How to Repair MySQL Replication

If you have set up MySQL replication, you probably know this problem: sometimes there are invalid MySQL queries which cause the replication to not work anymore. In this short guide, I explain how you can repair the replication on the MySQL slave without the need to set it up from scratch again.

27 Comment(s)

Comments

By: Perry Whelan

I'm managing an infrastructure with a number of databases that (for codified reasons that I cannot influence) often suffer from this situation. So, I've written a cron script to manage it.

Does anyone see any issues with this logic (see below)?

#!/bin/bash
## Tool to unstick MySQL Replicators.
## Set to run from cron once a minute.

# */1 * * * * /usr/local/bin/whipSlave.mysql.sh > /dev/null 2>&1

# Last updated: MM/DD/YYYY

COMMANDS="mysql grep awk logger"

export PATH='/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin'

for i in $COMMANDS
do
        X=`echo $i | tr '[a-z]' '[A-Z]'`
        export $X=`type -p $i`
done

# Define variables
USERNAME=dbuser
PASSWORD=password

# Define Functions
## Obtain MySQL slave server status
function SLAVE()
{
        STATUS=`$MYSQL -u $USERNAME -p$PASSWORD -e \
                "SHOW SLAVE STATUS \G" |
                $GREP Seconds_Behind_Master |
                $AWK '{print $2}'`
}

## Skip errors
function UNSTICK()
{
        $MYSQL -u $USERNAME -p$PASSWORD -e \
                "STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; START SLAVE;"
        sleep 5
        # Check everything again
        CHECK
}

## Decide what to do...
function CHECK()
{
        # Obtain status
        SLAVE
        if [ "$STATUS" = "NULL" ]   # quoted so an empty value does not break the test
        then
                # I think the replicator is broken
                echo "MySQL Slave database is not replicating. Fixing..." | $LOGGER
                UNSTICK
        else
                # Everything should be fine
                echo "MySQL Slave is $STATUS seconds behind its Master." | $LOGGER
        fi
}

## Are we running?
function ALIVE()
{
        UP=`$MYSQL -u $USERNAME -p$PASSWORD -e \
                "SHOW SLAVE STATUS \G" |
                $GREP Slave_IO_Running |
                $AWK '{print $2}'`

        if [ "$UP" = "Yes" ]   # quoted so an empty value does not break the test
        then
                # Let's check if everything is good, then...
                CHECK
        else
                # Uh oh...let's not do anything.
                echo "MySQL Slave IO is not running!" | $LOGGER
                exit 1
        fi
}

# How is everything?
ALIVE

#EoF
exit 0
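
On the question of issues with this logic, two stand out: UNSTICK and CHECK call each other without any limit, and a NULL Seconds_Behind_Master is treated as skippable whatever the cause. The sketch below is one possible way to structure the decision, not a drop-in replacement. The names `slave_field` and `decide` and the default limit of three attempts are hypothetical and do not come from the script above.

```shell
#!/bin/bash
# Sketch only. Extract one field from "SHOW SLAVE STATUS\G" output on stdin.
# The field name must match exactly, up to and including the colon.
slave_field() {
    awk -v key="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)            # strip the left padding
            prefix = key ":"
            if (index(line, prefix) == 1) {
                value = substr(line, length(prefix) + 1)
                sub(/^[ \t]+/, "", value)
                print value
                exit
            }
        }'
}

# Decide what to do from a status dump. Prints one of:
#   io-down | ok | skip | give-up
# $1 = status text, $2 = skips already attempted, $3 = limit (assumed default: 3)
decide() {
    local status="$1" attempts="$2" max_attempts="${3:-3}"
    local io sql
    io=$(printf '%s\n' "$status" | slave_field Slave_IO_Running)
    sql=$(printf '%s\n' "$status" | slave_field Slave_SQL_Running)
    if [ "$io" != "Yes" ]; then
        echo io-down        # skipping events cannot help here
    elif [ "$sql" = "Yes" ]; then
        echo ok
    elif [ "$attempts" -ge "$max_attempts" ]; then
        echo give-up        # stop and alert a human instead of looping
    else
        echo skip
    fi
}
```

A cron wrapper could pass the output of SHOW SLAVE STATUS \G and its own attempt counter to `decide`, and run the skip statement only when the result is `skip`.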

By: Jeff Buchbinder

This might be a little bit simpler (works with MySQL 5.0+, I believe):

#!/bin/bash
done=0
while [ $done -eq 0 ]; do
        # get status
        done=$( mysql -Be 'show slave status;' | tail -1 | cut -f12 | grep Yes | wc -l )
        if [ $done -eq 0 ]; then
                echo "Advancing position past [$(mysql -Be 'show slave status;' | tail -1 | cut -f20)]... "
                mysql -uroot -Be "SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; start slave;"
                sleep 1
        fi
done

By: Anonymous

while [ ! "`mysql -uroot -Be 'SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; start slave;'`" ]
do
echo "Skipped one Error"
sleep 1
done ; echo "All set"

By: Anonymous

True, but sometimes the slave will try to execute statements that don't apply to it, such as a "drop trigger" statement for a trigger that never exists on the slave because the slave is only replicating specific tables.

By: Richard

Bear in mind that any time you have a query which *did* successfully execute on the master and is skipped on the slave and you use a SQL_SLAVE_SKIP_COUNTER method to "fix" the problem, your master and slave are now no longer in sync. Yes, this is sometimes necessary, but if it is a recurring issue, then the problems go much deeper than merely broken replication.

By: Anonymous

Indeed. And if you skip a query to 'fix' the replication, you run the very serious risk that the replication will become even more out of sync further down the line. This isn't fixing, it's just brushing the problem under the carpet and hoping it goes away. If you must skip a query, look at the query first, and be sure its absence won't cause future queries to fail.
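
As a rough illustration of "look at the query first", the sketch below reads the Last_SQL_Errno field from SHOW SLAVE STATUS \G output and allows a skip only when that error number is on an explicit allowlist. The function names and the idea of allowlisting 1062 (duplicate entry) are assumptions for the example, not a recommendation; which errors are safe to skip depends entirely on your data.

```shell
#!/bin/bash
# Sketch only. Read "SHOW SLAVE STATUS\G" output on stdin and print the
# number of the last error that stopped the SQL thread.
last_sql_errno() {
    awk '$1 == "Last_SQL_Errno:" { print $2; exit }'
}

# Return success only if the error number ($1) appears in the allowlist
# given as the remaining arguments.
is_skippable() {
    local errno="$1"
    shift
    local allowed
    for allowed in "$@"; do
        [ "$errno" = "$allowed" ] && return 0
    done
    return 1
}
```

A wrapper would print the Last_SQL_Error text for the log, call `is_skippable "$errno" 1062`, and leave replication stopped for a human to inspect in every other case.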

By: Anonymous

Noted that this is not the proper "fix" for the problem, but we are missing your proposed solution.

By: sureshkumar

Thanks.......... Good work ...

By: Farshid

Thank you. I got replication working without having to rebuild it remotely in the middle of the night!

By: Ramanath

What do I have to do if I see Slave_IO_Running: No and Seconds_Behind_Master: NULL? Please help me. I am waiting for your response.

https://www.howtoforge.com/how-to-repair-mysql-replication.. In this link you have given the solution for Slave_SQL_Running: No. Please suggest a solution for Slave_IO_Running: No and Seconds_Behind_Master: NULL.

By: Some Guy

Thanks man, saved me today :)

By: Eternal

Thanks, pretty useful article.

By: Mahendra Singh Bisht

Hello all, I am facing an issue with replication lag on one of the servers in a master-master replication setup. On one server, SHOW FULL PROCESSLIST shows no queries running and replication is in sync. On the other server I see many DELETE queries running, and replication on that server lags and sometimes fluctuates.

Reference query:

delete from Serv_Us where nodeid='ee208f37028242cc9596b12cbf8a42f6';

The above query appears 12-15 times (threads), each with a different nodeid. The MySQL error log shows:

2016-01-25 11:53:00 39283 [Warning] Slave SQL: Error 'Deadlock found when trying to get lock; try restarting transaction' on query. Default database: 'xyz'. Query: 'insert into serv_us (requestnr, authorizedreqnr, completedreqnr, serviceid, nodeid) values (73, 73, 71, x'0CB3CCA9E88BEC838E592C905FD9D4E4', 'e0f18008c2dd4979b183afff1918d108')', Error_code: 1213

The Serv_Us table has nodeid as its primary key. Kindly help with this. Thanks

By: natanfelles

Thank you!!!

By: Dazy Parker

Thanks for this information.

By: Mikalai Sheuko

Thank you very much!

By: Anon

Thank you, works like a charm

By: graeme

How did you know how many to skip? How do you know that other statements hadn't been executed that didn't cause an SQL error but still leave the slave inconsistent with the master, such as an insert into a table with no primary key?
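
One hedged way to check for that kind of silent drift is to compare CHECKSUM TABLE results from both servers while writes are paused. The sketch below assumes each server's output was saved as tab-separated "table, checksum" lines (for example with mysql -N -B -e 'CHECKSUM TABLE ...'); the function name and file layout are hypothetical.

```shell
#!/bin/bash
# Sketch only. Compare two "table<TAB>checksum" listings and print every
# table in the slave listing whose checksum differs from, or is missing in,
# the master listing. Tables present only on the master are not reported.
diff_checksums() {
    local master_file="$1" slave_file="$2"
    awk -F'\t' '
        NR == FNR { master[$1] = $2; next }     # first file: remember master checksums
        !($1 in master) || master[$1] != $2 { print $1 }
    ' "$master_file" "$slave_file"
}
```

An empty result only means the checksums matched at the moment they were taken; it is a spot check, not proof that the servers will stay in sync.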

 

By: charleslumia

Thanks. After struggling for 2 days this worked!!!

By: Gerardo Leonardo

Just sharing:

while [ 1 ]; do
        if [ `mysql -uroot -ppassword -e"show slave status \G;" | grep "Duplicate entry" | wc -l` -eq 2 ]; then
                mysql -uroot -ppassword -e"stop slave; set global sql_slave_skip_counter=1; start slave;"
        fi
        sleep 1
        mysql -uroot -ppassword -e"show slave status\G"
done

By: Kiran

Great, Gerardo. Though Falko is great at providing solutions, I was skipping his, as it is a one-time fix for one SQL statement and assumes both nodes are already in sync. Your code can make replication work on two inconsistent databases (maybe similar copies, but not identical ones). Do you suggest we keep it as a shell script in a scheduled job?

Is there a configuration option to catch up on all the changes from the other server, as in SQL Server?

By: MLY

Sir, Thank you soooo very much for this command, I was tired like hell repeating this process.

It worked like a charm.

Thanks a lot Sir.

By: Hans Ekbrand

This article was very helpful for me, thanks for taking the time to write it.

By: Anonymous

The proper "fix" is to find out what is causing replication to break repeatedly, fix that problem/problems and resync the data or rebuild the slave. 

By: John

Correct, and this tutorial describes how to "resync the data or rebuild the slave" to cite your text.

By: Hassan

This was one of the best posts on the internet ever. It helped me even with a 2-way replication. I did the same on both servers to continue the 2way replication.

Thanks

By: jim

Cool, bro.

The question is: is there a precise value for 'SQL_SLAVE_SKIP_COUNTER',

or do we set it based on experience?