Troubleshooting Replication Problems

What To Do When DMan Replication Status is Red

The master server writes each incoming command to a binlog file. The slave server processes each line of the file in queue. DMan monitors this process for several factors that would indicate that replication is not working or falling behind.

Slave is behind master N seconds

When the slave server starts to fall behind, the DMan status will report the difference in time. This could be due to network latency or I/O problems on the slave or master. Monitor the delta time between the slave and master and ensure it is getting smaller and not larger. If the time that the slave is behind continues to grow you will need to evaluate the I/O problem.

Error - Unable to process N

Sometimes the slave comes across a command in the master binlog file that it cannot process. When this happens the slave stops processing the queue and if left for a while the slave can get very behind in the queue. Evaluate the line that was unable to process on the slave, take corrective steps, and run the Resynchronize Your Replication Servers procedure below. Before you do the full restore though you can try skipping the next 1 or 2 SQL statements to see if you can resolve the problem that way.

Skip a bad SQL Statement

It’s possible to skip 1 or more incompatible SQL statements in the replication log to get replication running again. This can be an option instead of do a full reset. The following notes are a summary of: https://www.howtoforge.com/how-to-repair-mysql-replication

First confirm that there is a SQL statement error.

show slave status

If there is an SQL statement error then you may be able to skip it using the following technique. If it is an IO_Error then this won’t help you.

STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; -- skips the next 1 statement(s) START SLAVE; SHOW SLAVE STATUS;

You can skip more than 1 statement by using a larger value in the SQL_SLAVE_SKIP_COUNTER value.

If the slave successfully restarts then your replication problem is resolved but you may need to consult with the application developers to see if that statement can be expected to recur. Changes may be necessary to avoid the troublesome statements.

Resynchronize Your Replication Servers

If your replication has been blocked by a command line that cannot process on the slave server, it may have gotten out of synch to an extent that the easiest way is to resynch with a data duplication. To do this simply follow the original instructions to backup the master and restore to the slave found on the Running A Backup Server for Bannister Lake Products with DMan Backup in the section that starts with: Making a Copy of the Primary Database For Import On the Backup Server.


Slave IO Connection Error

If the slave is not able to connect to the master you might see an IO Error message like this:

error connecting to master 'blbackup@10.21.31.52:3306' - retry-time: 60 retries: 4 message: Authentication plugin 'caching_sha2_password' reported error: Authentication requires secure connection.

This error is telling you that the slave is failing to connect to the master server.

Possible Solutions

Use GET_MASTER_PUBLIC_KEY clause

We can add GET_MASTER_PUBLIC_KEY = 1 to the CHANGE MASTER TO statement, and the connection succeeds.

This solution provided by this bug report in MySQL 8.

STOP SLAVE;

Issue the new CHANGE MASTER TO command

Then restart and check the slave status.

Connect Manually First

You might be able to solve this problem by creating a manual connection in a MySQL tool like Workbench. This solution provided by this documented bug in MySQL 8.

Connect to the Primary Server using another connection tab in MySQL Workbench (or another tool).

  • using the blbackup user

After you have successfully connected using the blbackup user and password - restart the slave.

Assuming the slave starts - check the status

Hopefully the slave is now successfully connected.

Confirm User Credentials

Another tip is to check that the login for blbackup exists on the Primary server and has been given appropriate privileges. Then confirm that the slave has been setup with the correct settings (SET MASTER TO…)

See: https://bannisterlake.atlassian.net/wiki/spaces/ST/pages/12615688/Configuring+MySQL+Users#Recommended-Replication-User for details on how to define the user.