You are here

FromDual TechFeed (en)

Replication Troubleshooting - Classic VS GTID

Abdel-Mawla Gharieb - Fri, 2014-07-04 15:05

In previous posts, I was talking about how to set up MySQL replication, Classic Replication (based on binary logs information) and Transaction-based Replication (based on GTID). In this article I'll summarize how to troubleshoot MySQL replication for the most common issues we might face with a simple comparison how can we get them solved in the different replication methods (Classic VS GTID).

There are two main operations we might need to do in a replication setup:

  • Skip or ignore a statement that causes the replication to stop.
  • Re-initialize a slave when the Replication is broke and could not be started anymore.
Skip or Ignore statement

Basically, the slave should be always synchronized with its master having the same copy of data, but for some reasons there might be inconsistency between both of them (unsafe statement in SBR, Slave is not read_only and was modified apart of replication queries, .. etc) which causes errors and stops the replication, e.g. if the master inserted a record which was already inserted on the slave (Duplicate entry) or updated/deleted a row which was not exist on the slave, ... etc.

To solve this issue, we have to either reverse what we have done on the slave (e.g. delete the inserted rows) if that was made by mistake and is known or we can skip executing those statements on the slave and continue the replication again (I'll focus on skipping a statement in this post as it needs different interaction in Classic and GTID replication).

Sample error messages (from SHOW SLAVE STATUS output): Last_SQL_Error: Could not execute Write_rows event on table test.t1; Duplicate entry '4' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql-bin.000304, end_log_pos 285 Last_SQL_Error: Could not execute Update_rows event on table test.t1; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000304, end_log_pos 492 Last_SQL_Error: Could not execute Delete_rows event on table test.t1; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000304, end_log_pos 688 How to solve that issue ?
CLASSIC REPLICATION

Solving this problem is a straight forward process in the classic replication setup, what only we need is to issue the following SQL commands on the slave's:

SQL> SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; SQL> START SLAVE;
GTID REPLICATION

Solving this problem is not a straight forward in GTID replication like it is in the Classic replication and the variable SQL_SLAVE_SKIP_COUNTER wont be useful in this area anymore.

To get this problem solved in a GTID replication we will need to inject an empty transaction as follows:

  • Check which transaction is causing the problem: SQL> SHOW SLAVE STATUS\G . . Retrieved_Gtid_Set: b9b4712a-df64-11e3-b391-60672090eb04:1-7 Executed_Gtid_Set: 4f6d62ed-df65-11e3-b395-60672090eb04:1, b9b4712a-df64-11e3-b391-60672090eb04:1-6 Auto_Position: 1

    Retrieved_Gtid_Set means the retrieved GTIDs from the master

    Executed_Gtid_Set means the executed GTIDs on the slave.

    According to the above output, the slave retrieved GTIDs from 1:7 (b9b4712a-df64-11e3-b391-60672090eb04:1-7) and executed only from 1:6 (b9b4712a-df64-11e3-b391-60672090eb04:1-6), so the problem is in transaction number 7.

  • Inject an empty transaction: SQL> SET GTID_NEXT='b9b4712a-df64-11e3-b391-60672090eb04:7'; SQL> BEGIN;COMMIT; SQL> SET GTID_NEXT='AUTOMATIC'; SQL> START SLAVE;

    BE CAUTIOUS: The first part of Executed_Gtid_Set (4f6d62ed-df65-11e3-b395-60672090eb04:1) is the local executed GTIDs (not received from the master) while the second part (b9b4712a-df64-11e3-b391-60672090eb04:1-6) is the executed GTIDs which retrieved from the master (check the master's UUID by either checking the UUID value in "Retrieved_Gtid_Set" which is basically for the master's UUID or by issuing SHOW GLOBAL VARIABLES LIKE 'server_uuid'; on the master server). So we should make sure that we are using the master's UUID when injecting an empty transaction, otherwise, the problem will still remain and the slave wont be started.

Note:

After starting the slave successfully in either classic or GTID replication we might need to use a combination of Percona tools pt-table-checksum and pt-table-sync to fix the inconsistency problem.

Re-initialize/ re-build a slave

For many reasons, we might end up with only re-build a slave to get the replication working, e.g. if we stopped a slave for a while where the master purged the binary log file that is needed by that slave or there are many duplicate entry errors so that pt-table-checksum and pt-table-sync could not be used then we have to re-initialize the slave from the beginning by having a fresh backup from the master server and restore it on the slave. Lets check how can we do that in both replication methods.

How to solve that issue ?
CLASSIC REPLICATION
Sample error message:
Last_IO_Errno: 1236 Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Could not find first log file name in binary log index file'

Fix steps:

  • Backup the master server by the following command: shell> mysqldump -u root -p --all-databases --flush-privileges --single-transaction --master-data=2 --flush-logs --triggers --routines --events --hex-blob >/path/to/backupdir/full_backup-$TIMESTAMP.sql
  • Restore the backup file on the slave: shell> mysql -u root -p < /path/to/backupdir/full_backup-$TIMESTAMP.sql
  • Get the binary logs information when the backup was taken: shell> head -n 50 /path/to/backupdir/full_backup-$TIMESTAMP.sql|grep "CHANGE MASTER TO" CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000011', MASTER_LOG_POS=120;
  • Issue the "CHANGE MASTER TO" command using the new information: SQL> CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000011', MASTER_LOG_POS=120;
  • Start the slave: SQL> START SLAVE;

NOTE:

Xtrabackup tool could be used instead of mysqldump,especially, if the database size is big. Check out this link for more information.

GTID REPLICATION
Sample error message:
Last_IO_Errno: 1236 Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'

Fix steps:

  • Backup the master server by the following command: shell> mysqldump -u root -p --all-databases --flush-privileges --single-transaction --flush-logs --triggers --routines --events --hex-blob >/path/to/backupdir/full_backup-$TIMESTAMP.sql
  • Check the GTID value when the backup was taken: shell> head -n 50 /path/to/backupdir/full_backup-$TIMESTAMP.sql|grep PURGED SET @@GLOBAL.GTID_PURGED='b9b4712a-df64-11e3-b391-60672090eb04:1-8';
  • Reset the GTID_EXECUTED and GTID_PURGED values on the slave: SQL> RESET MASTER;
  • Restore the backup file on the slave: shell> mysql -u root -p < /path/to/backupdir/full_backup-$TIMESTAMP.sql
  • Make sure that the values of GTID_EXEUCTED and GTID_PURGED are the correct ones: SQL> SHOW GLOBAL VARIABLES LIKE 'gtid_executed'; +---------------+------------------------------------------+ | Variable_name | Value | +---------------+------------------------------------------+ | gtid_executed | b9b4712a-df64-11e3-b391-60672090eb04:1-8 | +---------------+------------------------------------------+ 1 row in set (0.00 sec) SHOW GLOBAL VARIABLES LIKE 'gtid_purged'; +---------------+------------------------------------------+ | Variable_name | Value | +---------------+------------------------------------------+ | gtid_purged | b9b4712a-df64-11e3-b391-60672090eb04:1-8 | +---------------+------------------------------------------+ 1 row in set (0.01 sec)
  • Start the slave: SQL> START SLAVE;

NOTES:

  • If we didn't reset the GTID_EXECUTED and GTID_PURGED values on the slave before restoring the backup file, the following error will be appeared:
    shell> mysql -u root -p < /path/to/backupdir/full_backup-$TIMESTAMP.sql. ERROR 1840 (HY000): @@GLOBAL.GTID_PURGED can only be set when @@GLOBAL.GTID_EXECUTED is empty.

    The above error indicates that the statement at the beginning of the backup file - which is "SET @@GLOBAL.GTID_PURGED='b9b4712a-df64-11e3-b391-60672090eb04:1-8';" - failed because GTID_PURGED cannot be set unless GTID_EXECUTED is empty. Since GTID_EXECUTED is a read only variable, the only way to empty its value is to issue "RESET MASTER" on the slave server before restoring the backup file.

  • Xtrabackup tool could be used as well instead of mysqldump to get this problem solved and without the need to reset GTID_EXECUTED and GTID_PURGED values . Check out this link for more information.
Conclusion

While GTID provides many benefits over the classic replication but it has different troubleshooting and fix strategies which must be known first before deploying GTID in production systems.

Replication channel fail-over with Galera Cluster for MySQL

Shinguz - Thu, 2014-06-19 07:05
Taxonomy upgrade extras: channelgaleraclusterfail-overreplicationmasterslave

Sometimes it could be desirable to replicate from a Galera Cluster to a single MySQL slave or to an other Galera Cluster. Reasons for this measure could be:

  • An unstable network between two Galera Cluster locations.
  • A separation of a reporting slave and the Galera Cluster so that heavy reports on the slave do not affect the Galera Cluster performance.
  • Mixing different sources in a slave or a Galera Cluster (fan-in replication).

This article is based on earlier research work (see MySQL Cluster - Cluster circular replication with 2 replication channels) and uses the old MySQL replication style (without MySQL GTID).

Preconditions
  • Enable the binary logs on 2 nodes of a Galera Cluster (we call them channel masters) with the log_bin variable.
  • Set log_slave_updates = 1 on ALL Galera nodes.
  • It is recommended to have small binary logs and relay logs in such a situation to reduce overhead of scanning the files (max_binlog_size = 100M).
Scenarios

   

Let us assume that for some reason the actual channel master of channel 1 breaks. As a consequence the slave of channel 1 does not receive any replication events any more. But we have to keep the replication stream up and running. So we have to switch the replication channel to channel master 2.

Switching replication channel

First for security reasons we should stop the slave of replication channel 1 first:

mysql> STOP SLAVE;

Then we have to find the actual relay log on the slave:

mysql> pager grep Relay_Log_File mysql> SHOW SLAVE STATUS\G mysql> nopager Relay_Log_File: slave-relay-bin.000019

Next we have to find the last applied transaction on the slave:

mysql> SHOW RELAYLOG EVENTS IN 'slave-relay-bin.000019'; | slave-relay-bin.000019 | 3386717 | Query | 5201 | 53745015 | BEGIN | | slave-relay-bin.000019 | 3386794 | Table_map | 5201 | 53745067 | table_id: 72 (test.test) | | slave-relay-bin.000019 | 3386846 | Write_rows | 5201 | 53745142 | table_id: 72 flags: STMT_END_F | | slave-relay-bin.000019 | 3386921 | Xid | 5201 | 53745173 | COMMIT /* xid=1457451 */ | +------------------------+---------+-------------+-----------+-------------+--------------------------------+

This is transaction 1457451 which is the same on all Galera nodes.

On the new channel master of channel 2 we have to find now the matching binary log. This can be done best by matching times between the relay log and the binary log of master of channel 2.

On slave:

shell> ll *relay-bin* -rw-rw---- 1 mysql mysql 336 Mai 22 20:32 slave-relay-bin.000018 -rw-rw---- 1 mysql mysql 3387029 Mai 22 20:37 slave-relay-bin.000019

On master of channel 2:

shell> ll *bin-log* -rw-rw---- 1 mysql mysql 2518737 Mai 22 19:57 bin-log.000072 -rw-rw---- 1 mysql mysql 143 Mai 22 19:57 bin-log.000073 -rw-rw---- 1 mysql mysql 165 Mai 22 20:01 bin-log.000074 -rw-rw---- 1 mysql mysql 62953648 Mai 22 20:40 bin-log.000075

It looks like binary log 75 of master 2 matches to relay log of our slave.

Now we have to find the same transaction on the master of channel 2:

mysql> pager grep -B 6 1457451 mysql> SHOW BINLOG EVENTS IN 'bin-log.000075'; mysql> nopager | bin-log.000075 | 53744832 | Write_rows | 5201 | 53744907 | table_id: 72 flags: STMT_END_F | | bin-log.000075 | 53744907 | Xid | 5201 | 53744938 | COMMIT /* xid=1457450 */ | | bin-log.000075 | 53744938 | Query | 5201 | 53745015 | BEGIN | | bin-log.000075 | 53745015 | Table_map | 5201 | 53745067 | table_id: 72 (test.test) | | bin-log.000075 | 53745067 | Write_rows | 5201 | 53745142 | table_id: 72 flags: STMT_END_F | | bin-log.000075 | 53745142 | Xid | 5201 | 53745173 | COMMIT /* xid=1457451 */ | +----------------+----------+-------------+-----------+-------------+---------------------------------------+

We successfully found the transaction and want the position of the next transaction 53745173 where we should continue replicating.

As a last step we have to set the slave to the master of replication channel 2:

mysql> CHANGE MASTER TO master_host='master2', master_port=3306, master_log_file='bin-log.000075', master_log_pos=53745173; mysql> START SLAVE;

After a while the slave has caught up and is ready for the next fail-over back.

Discussion

We found during our experiments that an IST of a channel master does not lead to a gap or loss of events in the replication stream. So restarting a channel master does not require a channel fail-over as long as an IST can be used for resyncing the channel master with the Galera Cluster.

The increase of wsrep_cluster_conf_id is NOT an indication that a channel fail-over is required.

A SST resets the binary logs so after the SST a slave will not replicate any more. So using this method should be safe to use. If you find any situation where you experience troubles with channel fail-over please let us know.

GTID In Action

Abdel-Mawla Gharieb - Thu, 2014-06-12 14:09

In a previous post I was talking about How to Setup MySQL Replication using the classic method (based on binary logs information). In this article I'll go through the transaction-based replication implementation using GTID in different scenarios.

The following topics will be covered in this blog:

What is the concept of GTID protocol?

GTID is a Global Transaction IDentifier which introduced in MySQL 5.6.5. It's not only unique on the server it was originated but it's unique among all servers in a replication setup.
GTID also guarantee consistency because once a transaction is committed on a server, any other transaction having the same GTID will be ignored, i.e. a committed transaction on a master will be applied only once on the slaves.

GTID consists of two parts separated by a column {source_id:transactions_id}.

WHERE

  • source_id: Normally the server's UUID on which the transaction originates. e.g. "b9b4712a-df64-11e3-b391-60672090eb04" .
  • transaction_id: A sequence number determining the order of the committed transaction.

The following is the GTID for the third transaction on a server having the uuid "b9b4712a-df64-11e3-b391-60672090eb04":
b9b4712a-df64-11e3-b391-60672090eb04:3

As a new protocol in MySQL there is a set of new related variables, the following are the most important ones (IMHO):

  • gtid-mode: ON|OFF to enable or disable GTID, this is not a Boolean variable (0 and 1 are not acceptable).
  • enforce-gtid-consistency: prevent executing the non transactionally safe statements, like:
    • CREATE TABLE .. SELECT.
    • CREATE TEMPORARY TABLE (inside a transaction).
    • Statements that update nontransactional tables inside a transaction.
  • gtid_purged: The set of transactions that have been purged from the binary logs.
  • gtid_executed: The set of transactions which already executed on that server.
  • gtid_next: The GTID which will be used for the next transaction.
GTID Replication Implementation Fresh Installations

Fresh installation means that there's no data yet in the master or in other words, we are building a replication setup from scratch.

The implementation process is divided into two parts:

MASTER'S SIDE CONFIGURATION:
  • Add the following variables to the MySQL configuration file (my.cnf): [mysqld] server-id=1 log-bin=mysql-bin binlog_format=ROW gtid-mode=on enforce-gtid-consistency log-slave-updates
  • Restart MySQL so that configuration changes take place: shell> service mysql restart
  • Create a MySQL user to be used by the slave: SQL> GRANT REPLICATION SLAVE ON *.* TO 'slave_user_name'@'slave_ip' IDENTIFIED BY 's3cret';

SLAVE'S SIDE CONFIGURATION:
  • Add the following variables to the my.cnf file: [mysqld] server-id=2 log-bin=mysql-bin binlog_format=ROW relay_log=relay-log skip-slave-start gtid-mode=on enforce-gtid-consistency log-slave-updates
  • Restart MySQL so that configuration changes take place: shell> service mysql restart
  • Set the master information on the slave's:

    Unlike the classic method, we don't need the master's binary log information and only what we need is to specify MASTER_AUTO_POSITION=1 instead:

    SQL> CHANGE MASTER TO -> MASTER_HOST='master_ip', -> MASTER_PORT=3306, -> MASTER_USER='slave_user_name', -> MASTER_PASSWORD='s3cret', -> MASTER_AUTO_POSITION=1;
  • Start replication: SQL> START SLAVE;
  • Check the replication status: SQL> show slave status\G Slave_IO_State: Waiting for master to send event Master_Host: 127.0.0.1 Master_User: gtid_repl Master_Port: 3320 Connect_Retry: 60 Master_Log_File: mysql-bin.000007 Read_Master_Log_Pos: 191 Relay_Log_File: relay-log.000004 Relay_Log_Pos: 401 Relay_Master_Log_File: mysql-bin.000007 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: . . . . Retrieved_Gtid_Set: b9b4712a-df64-11e3-b391-60672090eb04:1-2 Executed_Gtid_Set: b9b4712a-df64-11e3-b391-60672090eb04:1-2 Auto_Position: 1
Adding New Slave

It's a very simple process to add a new slave to a running replication (or setup replication with existing data) where GTID is being used:

  • Backup the master server shell> mysqldump -u root -p --all-databases --flush-privileges --single-transaction --flush-logs --triggers --routines --events --hex-blob >/path/to/backupdir/full_backup-$TIMESTAMP.sql
  • On the new slave, use the same MySQL configuration as described above (except the server id which should be unique) and restart it.
  • Restore the backup file taken from the master.
  • Use change master to with MASTER_AUTO_POSITION=1
  • Start the slave.

Is it so simple like that!! How did the slave know the backup position? What if some transactions were executed on the master after that backup?

Actually, when GTID is enabled, mysqldump includes the last transaction ID (GTID) at the time of taking the backup:

-- -- GTID state at the beginning of the backup -- SET @@GLOBAL.GTID_PURGED='b9b4712a-df64-11e3-b391-60672090eb04:1-7';

After restoring the backup the variable GTID_EXECUTED will be equal to GTID_PURGED (the above value) and when the slave starts it first sends the range of GTIDs it has executed (GTID_EXECUTED) to the master so that the master can sends back every missing transaction which was not applied yet on the slave.

Migration from classic replication to GTID replication How to perform the migration?

To migrate an already running replication using the classical method to GTID replication, the following steps should be done:

  • Ensure that all servers (master and slaves) are in the same point by setting the master server as read only (SET GLOBAL read_only=ON;) and wait until all slaves catch up the master's data.
  • Shutdown MySQL on all servers and add the GTID variables to the configuration files.
  • Beside the GTID variables, add read-only to the master's configuration and skip-slave-start to the slaves configurations.
  • Start MySQL service on all servers.
  • Issue the change master command with MASTER_AUTO_POSITION=1 on all slaves and then start them.
  • Make the master writable again by SET GLOBAL read_only=OFF; (don't forget to remove/hash it from the master's my.cnf file as well).
Is online migration from classic to GTID replication available?

At the time of writing this article, the online migration is not applicable - as you can see from the above steps - we have to shutdown ALL servers at the same time and that is because of two reasons:

  • GTID can NOT be enabled online because GTID_MODE is a read only variable (having this variable to be dynamic is already in Oracle's plan).
  • Replication can NOT be established between two or more servers having different values for GTID_MODE, i.e. either GTID is enabled on ALL servers or disabled on ALL servers.
Workaround ??

There's a feature request (by MySQL Devs team at Booking.com) to have an extra GTID mode (ANONYMOUS_IN-GTID_OUT) which allows a slave to receives anonymous transactions (transactions from master having GTID_MODE = OFF which do not have GTIDs) and assigns GTIDs for those transactions. In this case, this slave could be used as an intermediate server between master having GTID disabled and slaves having GTID enabled (it will be slave for the master and master for the other slaves)

The online migration steps would be:

  • Restart a slave (lets name it slaveA) using the GTID_MODE = ANONYMOUS_IN-GTID_OUT.
  • Rolling restart to the other slaves to use the normal GTID_MODE=ON and pointing them to slaveA as a new master.
  • Point the application to write to slaveA instead of the old master.
  • Restart the old master to use GTID_MODE=ON and having slaveA as a master.

Note: This is not yet available in Oracle binaries

More information on this could be find here.

GTID Benefits
  • Simplifies the setup of MySQL replication as master's binary logs information is not needed anymore (binary log file name and position).
  • Consistency is guaranteed between master and slave as the committed transaction on the master will be applied only once on the slave.
  • Simple to determine whether masters and slaves are consistent or not.
  • Fail-over process is much easier. When the master fail to operate, no need to calculate a slave's binary logs information before promoting it to be new master. MASTER_AUTO_POSITION=1 will do the job as all transactions in all servers inside the replication have the same GTID.
  • Automatic fail-over scripts is now much easier to implement.

In a future post, I will write about how to troubleshoot GTID replication.

Backup Manager for MySQL, MariaDB and Percona Server (mysql_bman)

Shinguz - Tue, 2014-05-06 17:28
Taxonomy upgrade extras: mysqlBackupRestoreRecoverymysql_bmanpitrAbout

The MySQL Backup Manager (mysql_bman) is a wrapper script for standard MySQL backup tools. The Problem with MySQL backup tools is, that they have many options and thus are overcomplicated and errors are easy made.

mysql_bman has the intention to make backups for MySQL easier and technically correct. This means it should per default not allow non-consistent backups or complain if some functions or parameters are used in the wrong way to guarantee proper backups.

In addition it has added some nice features which are missing in standard MySQL backup tools or which are only known from Enterprise backup solutions.

Where can mysql_bman help you

The intention of mysql_bman is to assist you in bigger MySQL set-ups where you have to follow some backup policies and where you need a serious backup concept.

Backup types

To achieve this we have defined different backup types:

TypeDescriptionfullfull logical backup (mysqldump) of all schemasbinlogbinary-log backupconfigconfiguration file backup (my.cnf)structurestructure backupcleanupclean-up of backup pieces older than n daysschemabackup of one or more schemas

A backup type is specified with the option --type=<backup_type>.

Backup modes

A backup can either be logical or physical. A logical backup is typically what you do with mysqldump. A physical backup is typically a physical file copy without looking into the data. That is what for example xtrabackup does.

The backup mode is specified with the option --mode=<backup_mode>. The following backup modes are available:

ModeDescriptionlogicaldo a logical backup (mysqldump).physicaldo a physical backup (mysqlbackup/innobackup/xtrabackup)Backup policies

Further we have introduced different backup policies. Policies are there to distinguish how different backups should be treated.

The following backup policies exist:

PolicyDescriptiondailydirectory to store daily backupsweeklydirectory to store weekly backupsmonthlydirectory to store monthly backupsquarterlydirectory to store quarterly backupsyearlydirectory to store yearly backups

For example you could plan to do a daily MySQL backup with binary logs with a retention policy of 7 days. But once a week you want to do a weekly backup consisting of a full backup, a configuration backup and a structure dump. But this weekly backup you want to keep for 6 months. And because of legal reasons you want to do a yearly backup with a retention policy of 10 years.

A backup policy is specified with the --policy=<backup_policy> option. This leads us to the retention time:

Options

The retention time which should be applied to a specific backup policy you can specify with the option --retention=<period_in_days>. The retention option means that a backup is not deleted before this amount of days when you run a clean-up job with mysql_bman.

Let us do an example:

shell> mysql_bman --type=cleanup --policy=daily --retention=30

This means that all backups in the daily policy should be deleted when they are older than 30 days.

Target

With the --target option you specify the connect string to the database to backup. This database can be located either local (all backup types can be used) or remote (only client/server backup types can be used).

A target looks as follows: user/password@host:port (similar to URI specification) whereas you can omit password and port.

Backup location, archiving, compressing and clean-up

The --backupdir option is to control location of the backup files. The policy folders are automatically created under this --backupdir location.
If you have a second layer of backup stores (e.g. tapes or slow backup drives or deduplicated drives or NFS drives) you can use the --archive option to copy your backup files to this second layer storage which is specified with the --archivedir option. For restore performance reasons it is recommended to always keep one or two generations of backups on you fast local drive. If you want to remove the backuped files from the --backupdir destination after the archive job use the --cleanup option.
If you want to omit to compress backups, either to safe time or because your location uses deduplicated drives you can use the --no-compress option.

Per schema backup

Especially for hosting companies a full database backup is typically not the right backup strategy because a restore of one specific customer (= schema) is very complicated. For this case we have the --per-schema option. mysql_bman will do a backup of the whole database schema by schema. Keep in mind: This breaks consistency among schemas!

Sometimes you want to do a schema backup only for some specific schemas for this you can use the --schema option. This option allows you to specify schemas to backup or not to backup. --schema=+a,+b means backup schema a and b. --schema=-a,-b means backup all schemas except a and b.
The second variant is less error prone because you do not forget to backup a new database.

Instance name

MySQL does not know the concept of naming an instance (mysqld). But for bigger environments it could be useful to uniquely name each instance. For this purpose we have introduced the option --instance-name=<give_it_a_name>. This instance name should be unique within your whole company. But we do not enforce it atm. The instance name is used to name backup files and later to identify the backup history of an instance in our backup catalog and to allow us to track the files for restore.

mysql_bman configuration file

Specifying everything on the command line is cumbersome. Thus mysql_bman considers a configuration file specified with the --config=<config_file> option.
A mysql_bman configuration file looks for example as follows:

policy = daily target = root/secret@127.0.0.1:3306 type = schema schema = -mysql policy = daily archive = on archivedir = /mnt/tape per-schema = on no-compress = on
Simulate what happens

For the Sissies among us (as for example me) we have the --simulate option. This option simulates nearly all steps as far as possible without executing really anything. This option is either for testing some features or for debugging purposes.

Logging

If you want to track your backup history you can specify with the --log option where your mysql_bman log file should be located.

More help

A little more help you can get with the following command:

shell> mysql_bman --help
Examples

Do a full (logical = default) backup and store it in the daily policy folder:

shell> mysql_bman --target=root/secret@127.0.0.1:3306 --type=full --policy=daily

Do a full physical backup and store it in the weekly policy folder (I think this feature is not implemented yet...):

shell> mysql_bman --target=root/secret@127.0.0.1 --type=full --mode=physical --policy=weekly

Do a binary-log backup omitting the password in the target and store it in the daily policy folder:

shell> mysql_bman --target=bman@192.168.1.42:3307 --type=binlog --policy=daily

Do a MySQL configuration backup and store it in the weekly policy folder:

shell> mysql_bman --target=root/secret@127.0.0.1:3306 --type=config --policy=weekly

Do a structure backup and store it in the monthly policy folder and name the file with the instance name:

shell> mysql_bman --target=root/secret@127.0.0.1:3306 --type=structure --policy=monthly --instance-name=prod-db

Do a weekly structure backup and archive it to an other backup location:

shell> mysql_bman --target=root/secret@127.0.0.1:3306 --type=structure --policy=weekly --archive --archivedir=/mnt/tape

Do a schema backup omitting the mysql schema:

shell> mysql_bman --target=root/secret@127.0.0.1:3306 --type=schema --schema=-mysql --policy=daily --archive --archivedir=/mnt/tape

Do a schema backup only of foodmart and world and write it to their own files. Omit compressing these backups because they are located for example on deduplicated drives:

shell> mysql_bman --target=root/secret@127.0.0.1:3306 --type=schema --schema=+foodmart,+world --per-schema --policy=daily --no-compress
Where to get mysql_bman

mysql_bman is a part of the MyEnv utilities.

How to Setup MySQL Master/Slave Replication ?

Abdel-Mawla Gharieb - Thu, 2014-04-24 15:53

It's not usual to find an easy source on how to setup MySQL replication, I thought it might be useful at least for the beginners to write a direct and simple howto blog on setting up Master/Slave replication in MySQL.

Before going through the replication setup steps, I think it's better to explain first how Replication works in MySQL.

MySQL replication is mainly consists of three-part process:

  • The master server records all data changes to its binary logs (binary log events) and send it to the slave using a thread called (Binlog dump thread) once the slave connects to the master.
  • The slave copies the binary log events sent by the master's binlog dump thread to its relay logs using a thread called (Slave I/O thread).
  • The slave applies these changes from the relay logs to its data by replaying (executing) all events using a thread called (Slave SQL thread).

Now, lets go through the setup process which is divided into 3 main sections:

Master's side configuration:
  • Add the following variables to the MySQL configuration file (my.cnf): [mysqld] server-id=1 ## must be unique log-bin=mysql-bin binlog_format=ROW
  • Restart MySQL so that configuration changes take place: shell> /etc/init.d/mysql restart
  • Create a MySQL user to be used by the slave: SQL> GRANT REPLICATION SLAVE ON *.* TO 'slave_user_name'@'slave_ip' IDENTIFIED BY 's3cret';
  • Take a full snapshot from the master's databases: shell> mysqldump -u root -p --all-databases --flush-privileges --single-transaction --master-data=2 --flush-logs --triggers --routines --events --hex-blob >/path/to/backupdir/full_backup-$TIMESTAMP.sql

    Note:

    If you have MyISAM tables you should omit the option --single-transaction while --master-data=2 will automatically turn on --lock-all-tables option which is required for MyISAM tables).
  • After preparing the backup file transfer it to the slave server.
  • If you are setting up fresh M/S servers (no data is there on the master) you don't have to backup the master databases and all what you need is to execute the following command on the master server just to know the master's coordinates which will be used later on the slave's: SQL> SHOW MASTER STATUS\G *************************** 1. row *************************** File: mysql-bin.000275 Position: 120 Binlog_Do_DB: Binlog_Ignore_DB: Executed_Gtid_Set:

And that is it on the master's, let's do the slave's work.

Slave's side configuration:
  • Add the following variables to the my.cnf file: [mysqld] server-id=2 ## must be unique. relay_log=relay-log skip-slave-start ## useful to make any checks before starting the slave (this way, slave must be started manually after each mysql restart)
  • Restart MySQL so that configuration changes take place: shell> /etc/init.d/mysql restart
  • Restore the full backup taken from the master (if performed): shell> mysql -u root -p < /path/to/backupdir/full_backup-$TIMESTAMP.sql.
  • Get the master's coordinates information from the backup file: shell> head -n 50 /path/to/backupdir/full_backup-$TIMESTAMP.sql|grep "CHANGE MASTER TO"

    OR from the output of "SHOW MASTER STATUS;" in case of no backups taken (check the final step in the "Master's side configuration").

  • Set the master information on the slave's: SQL> CHANGE MASTER TO -> MASTER_HOST='master_ip', -> MASTER_PORT=3306, -> MASTER_USER='slave_user_name', ## The user which was created on the master -> MASTER_PASSWORD='s3cret', -> MASTER_LOG_FILE='mysql-bin.000275', ## taken from the above step -> MASTER_LOG_POS=120; ## taken from the above step
  • Start replication: SQL> START SLAVE;
Replication checking and troubleshooting:
  • Once the slave is started, check the replication status on the slave by the following command: SQL> SHOW SLAVE STATUS\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 127.0.0.1 Master_User: slave_user_name Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000275 Read_Master_Log_Pos: 120 Relay_Log_File: mysql-relay.000001 Relay_Log_Pos: 283 Relay_Master_Log_File: mysql-bin.000275 Slave_IO_Running: Yes Slave_SQL_Running: Yes . . . Last_SQL_Error: . . .
  • If the Slave_IO_State= connecting .... then make sure that the slave user information is set correctly and there is no firewall restrictions between the two servers (master and slave) this could be checked by connecting to the master's MySQL from the salve server by the replication user (in this example, slave_user_name).
  • If both Slave_IO_Running and Slave_SQL_Running = Yes, then the replication had been set up correctly.
  • If the Slave_SQL_Running = No, check the value of Last_SQL_Error for more details about the SQL error.
  • If you know that error and you want to ignore it, you can execute "SET GLOBAL sql_slave_skip_counter = 1;" on the slave and then start the slave again "START SLAVE;".
  • To restrict all normal users from changing data on the slave - which might break the replication - the option "read-only" should be added in the slave's my.cnf file.
  • the server option "server-id" must be unique among all servers inside the replication (masters and slaves).
  • If your database size is big (100GB or so) Xtrabackup tool could be used instead of mysqldump - when preparing the master snapshot - for faster backup and restore operations. For more information on how to use Xtrabackup, check out this link.
  • For more information on how to setup MySQL replication, check out the manual documentation.

Setting the right GCache size in Galera Cluster

Abdel-Mawla Gharieb - Thu, 2014-04-17 13:52

One of our customers had a question related to the right value of Galera Cache size (gcache.size) in Galera Cluster for MySQL which I would like to share with you.

The question was: My maintenance window takes 4 hours for my 5TB DB. How can I avoid an SST ?!

Basically, having too small GCache size will lead to SST (Snapshot State Transfer) instead of IST (Incremental State Transfer), thus we can avoid the SST by setting the GCache to the appropriate value.

To check the current value of the GCache size:

mysql> SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options'\G Variable_name: wsrep_provider_options Value: base_host = 192.168.1.12; . . . gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; . . .

The value of GCache size could be changed by adding the following line in the my.cnf file and restarting the node (it could NOT be changed online):

#my.cnf [mysqld] wsrep_provider_options="gcache.size=256M"

But the question is how can we calculate the right value for GCache size to cover the maintenance window and at the same time not larger than what it needs?

To answer that question we should first find out how much GCache can handle which could be calculated by the following formula:

Hold time = GCache size / Replication Rate.

Where:

  • Replication Rate = Amount of replicated data / time.
  • Amount of replicated data = (wsrep_replicated_bytes + wsrep_received_bytes) after the maintenance window - (wsrep_replicated_bytes + wsrep_received_bytes) before the maintenance window.

The amount of replicated data for the customer's case = 7200MB.

Now, we can find out how much GCache (default 128M) can handle for the customer's case:

Hold time = 128MB / (7200MB / 4h) = 128MB / 0.5 MB = 256s.

Then, we can calculate the right GCache size value to handle the maintenance window by the following formula:
GCache = Maintenance window * Replication Rate = 14400s * 0.5 MB.
GCache = 7200MB.

In other words, the right GCache size should be equivalent to (or not less than) the amount of replicated data.

A shorter way using the binary logs size

Is there any relation between (wsrep_replicated_bytes + wsrep_received_bytes) and the binary log traffic? Lets check that by the following test scenario:

  • Starting a Galera Cluster where the newest binary log file size will be empty (120 bytes) and the same for the status variables: mysql> SHOW GLOBAL STATUS LIKE 'wsrep_replicated_bytes'; +------------------------+-------+ | Variable_name | Value | +------------------------+-------+ | wsrep_replicated_bytes | 0 | +------------------------+-------+ 1 row in set (0.00 sec) mysql> SHOW GLOBAL STATUS LIKE 'wsrep_received_bytes'; +----------------------+-------+ | Variable_name | Value | +----------------------+-------+ | wsrep_received_bytes | 368 | +----------------------+-------+ 1 row in set (0.00 sec)
  • Execute some DML statements on the current node (to increase wsrep_replicated_bytes) and on another node (to increase wsrep_received_bytes), then check the status values:
    mysql> SHOW GLOBAL STATUS LIKE 'wsrep_replicated_bytes'; +------------------------+----------+ | Variable_name | Value | +------------------------+----------+ | wsrep_replicated_bytes | 80125192 | +------------------------+----------+ 1 row in set (0.00 sec) mysql> SHOW GLOBAL STATUS LIKE 'wsrep_received_bytes'; +------------------------+----------+ | Variable_name | Value | +------------------------+----------+ | wsrep_received_bytes | 40062948 | +------------------------+----------+ 1 row in set (0.00 sec)

    The Amount of replicated data = (80125192 + 40062948) - (0 + 368) = 120187772 Bytes.

  • Checking the increase of the binary log file: shell> ll /var/lib/mysql/mysql-bin.000243 -rw-rw---- 1 mysql mysql 113769454 Mar 20 13:42 mysql-bin.000243
  • Notes:

    • The variable log_slave_updates MUST to be enabled on ALL nodes, otherwise, the option wsrep_received_bytes will not be reflected on the binary logs, thus will lead to WRONG calculations!!
    • Since the cluster is freshly started, MySQL started to write into a new binary log file while in an already running cluster we can force MySQL to start from new binary log file by issuing the SQL command "FLUSH BINARY LOGS;" so that we can use the total size of the binary logs generated after that during the maintenance.
    Conclusion:

    Although the binary log traffic will be always less than the amount of replicated data but they are nearly close, thus we can use it to get a rough estimation about the right GCache size value, hence the formula will be as follows:

    • GCache size = Maintenance window * Replication Rate.
    • GCache size = (wsrep_replicated_bytes + wsrep_received_bytes) after the maintenance window - (wsrep_replicated_bytes + wsrep_received_bytes) before the maintenance window.
    • GCache size = binary log traffic which occurs during the maintenance window.

Impact of General Query Log on MySQL Performance

Abdel-Mawla Gharieb - Tue, 2014-04-08 11:19

Sometimes, it is required to enable the general query log (which is disabled by default). If the general query log is enabled the server writes to this log information when clients connect or disconnect, and each SQL statement received from the client.

The question is, does enabling the general query log affects the MySQL performance ?
Also, it is possible to record the output of this log into either file or table in the mysql database (mysql.general_log), what is the performance impact of each one?

Let's do some simple benchmark for those scenarios to measure the actual impact on the mysql performance.

System Information: HW configurations: Software configurations:
  • OS: Ubuntu 12.04
  • MySQL Server: 5.6.17
  • Sysbench: 0.4.12
Test Information:
  • Sysbench command: sysbench --num-threads=1 --max-requests=1000 --db-driver=mysql --test=oltp --mysql-table-engine=InnoDB --mysql-user=root run .
  • Table structure which was used by sysbench tests: mysql> show create table sbtest.sbtest\G CREATE TABLE `sbtest` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `k` int(10) unsigned NOT NULL DEFAULT '0', `c` char(120) NOT NULL DEFAULT '', `pad` char(60) NOT NULL DEFAULT '', PRIMARY KEY (`id`), KEY `k` (`k`) ) ENGINE=InnoDB AUTO_INCREMENT=8574 DEFAULT CHARSET=latin1

Note:

The test had been made against 1,2,4,8,16 and 32 threads, each throughput/response time value for each thread's number for each test case is generated by the average of ten (10) times execution.

General log Disabled:

To make sure that the general query log is disabled:

mysql> show global variables like'general_log'; +---------------+-------+ | Variable_name | Value | +---------------+-------+ | general_log | OFF | +---------------+-------+

Testing results:

General log Enabled:

The general query log is a dynamic variable which means that it could be enabled or disabled ONLINE without restarting MySQL (since MySQL 5.1):

mysql> SET GLOBAL general_log=ON;

We can choose the output for this log to be either in a log file (by default) or in a MySQL table (mysql.general_log).

What are the benefits that we might get if we have the log output to be stored in a table not in a file?

  • We can use the normal SQL statements to access the log contents to get information about specific criteria (e.g. using WHERE condition) and this is a little harder when dealing with files.
  • The log contents could be accessed remotely if someone can connect to the MySQL server.
  • Standard format for the log entries.
  • If the CSV engine is used for the log table, it will be easy to import the CSV file into spreadsheets.
  • It is easy to expire the logs by simply TRUNCATE the log table.
  • Log rotation is possible by using RENAME TABLE statement.
  • Log entries are not replicated to the slave because they are not written to the binary logs.
  • The mysqldump does not include the log tables contents (general_log or slow_log) in the backup even if --all-databases backup option is used.

So, let's check the performance impact then of each log output.

Output is FILE:

To check the output destination of the general log, the following command should be used:

mysql> show global variables like'log_output'; +---------------+-------+ | Variable_name | Value | +---------------+-------+ | log_output | FILE | +---------------+-------+

Testing results:

Output is TABLE (CSV table):

To change the output destination of the general log from file to table (CSV by default), the following command should be used:

mysql> SET GLOBAL log_output='TABLE'; mysql> show global variables like'log_output'; +---------------+-------+ | Variable_name | Value | +---------------+-------+ | log_output | TABLE | +---------------+-------+

Testing results:

Output is TABLE (MyISAM table):

Maybe due to the nature of the CSV storage ENGINE we faced performance issues in the previous case. Is it possible to change the table engine of the general_log table?

The answer is yes, but unfortunately, we are restricted to use only MyISAM storage engine and no engines other than CSV or MyISAM are allowed. Check this link for more information.

To alter the log table, you must first disable the logging:

mysql> alter table mysql.general_log engine=MYISAM; ERROR 1580 (HY000): You cannot 'ALTER' a log table if logging is enabled mysql> SET GLOBAL general_log=OFF; mysql> alter table mysql.general_log engine=MYISAM; mysql> SET GLOBAL general_log=ON;

Testing results:

Output is TABLE (MyISAM table with some structures changes):

In general, to make any SQL query work faster, we might need to optimize the table structure, add indexes, re-write the query, .. etc.

The following is the general log table structure:

mysql> show create table mysql.general_log\G CREATE TABLE `general_log` ( `event_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, `user_host` mediumtext NOT NULL, `thread_id` bigint(21) unsigned NOT NULL, `server_id` int(10) unsigned NOT NULL, `command_type` varchar(64) NOT NULL, `argument` mediumtext NOT NULL ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COMMENT='General log'

Let's check what can we do to optimize the general_log table structure (other suggestions are welcomed):

  • Can we create partitions on that table which might boost our search?
    Although this is a normal MyISAM table but partitioning IS NOT allowed for the log tables (by the way, partitioning is not permitted also for CSV tables).
  • Can we change the datatype of the `user_host` column from mediumtext to e.g. varchar(100)? (The max length of this column's data on my machine didn't exceed 50 characters)
    While it is - syntax wise - accepted but no logs will be stored in the table after that and the following error will be printed in the error log file: 2014-03-06 18:44:21 6987 [ERROR] Failed to write to mysql.general_log: 2014-03-06 18:44:23 6987 [ERROR] Incorrect definition of table mysql.general_log: expected column 'user_host' at position 1 to have type mediumtext, found type varchar(100).
  • What about creating indexes on the columns we will use for most of our searches (`event_time` and `argument`)? lets try that!
    • Add index on `event_time` column:
      mysql> SET GLOBAL general_log=OFF; mysql> alter table mysql.general_log add index ev_tm_idx(`event_time`); mysql> show create table mysql.general_log\G CREATE TABLE `general_log` ( `event_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, `user_host` mediumtext NOT NULL, `thread_id` bigint(21) unsigned NOT NULL, `server_id` int(10) unsigned NOT NULL, `command_type` varchar(64) NOT NULL, `argument` mediumtext NOT NULL, KEY `ev_tm_idx` (`event_time`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COMMENT='General log' mysql> SET GLOBAL general_log=ON;

      Testing results:

    • Add FULLTEXT index on `argument` column:
      mysql> SET GLOBAL general_log=OFF; mysql> alter table mysql.general_log add fulltext index (`argument`); mysql> show create table mysql.general_log\G CREATE TABLE `general_log` ( `event_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, `user_host` mediumtext NOT NULL, `thread_id` bigint(21) unsigned NOT NULL, `server_id` int(10) unsigned NOT NULL, `command_type` varchar(64) NOT NULL, `argument` mediumtext NOT NULL, KEY `ev_tm_idx` (`event_time`), FULLTEXT KEY `argument` (`argument`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COMMENT='General log' mysql> SET GLOBAL general_log=ON;

      Testing results:


To make it more clear, the following is the combination of all results in one graph followed by response time comparison:





The raw results in Transactions / Sec might be useful:


Threads12481632General log disabled383.996814.7591421.2881674.7331414.9851071.189General log enabled (File)281.642521.391230.7431406.1271095.896923.986General log enabled (CSV Table)231.659447.173787.578507.846426.324439.992General log enabled (MyISAM Table)249.47536.379933.304532.912476.454454.015General log enabled (MyISAM Table + index)238.508430.05875.209465.464465.464395.063General log enabled (MyISAM Table + Fulltext index)157.436236.156210.968212.273218.617220.701
Conclusion:
  • The best MySQL performance - among all above test cases - could be achieved by disabling the general query log, e.g. if we compared the above results for the case 4 concurrent threads (the highest value for most scenarios) we would find that:
    • Using the general log enabled (general_log = ON) and the log destination is file (log_output = FILE) decreased the throughput by 13.4% and increased the response time by 17.5%.
    • Using the general log enabled and the log destination is CSV table decreased the throughput by 44.6% and increased the response time by 90%.
    • Using the general log enabled and the log destination is MyISAM table decreased the throughput by 34.3% and increased the response time by 59%.
    • Using the general log enabled and the log destination is MyISAM having an index added on column `event_time` decreased the throughput by 38.4% and increased the response time by 73%.
    • Using the general log enabled and the log destination is MyISAM having an index added on column `event_time` and FULLTEXT index added on column `argument` decreased the throughput by 85% and increased the response time by 542%.
  • Although using table as a log output destination has many benefits - as described above - but it has more negative impact on the MySQL performance as compared to log file.
  • Increasing the number of concurrently running threads - in the case of log_output=TABLE - will increase the general_log table contention which is controlled by the table locking level for MyISAM or CSV ENGINES.
  • Like any other MySQL table - as many rows inserted in the log table as more negative performance impact.
  • Finally, it is preferred to only enable the general query log when it is really needed and it is not recommended to enable it in a production system. It could be enabled (dynamically) for a while then should be disabled again once we have got what we are searching for.
Subscribe to FromDual aggregator - FromDual TechFeed (en)