You are here
MySQL Tech-Feed (en)
Before I forget it and have to search again here a short note about how to rename a MySQL Partition:
My dream:ALTER TABLE history RENAME PARTITION p2015_kw10 INTO p2015_kw09;
In reality: ALTER TABLE history REORGANIZE PARTITION p2015_kw10 INTO ( PARTITION p2015_kw09 VALUES LESS THAN (UNIX_TIMESTAMP('2015-03-02 00:00:00')) );
Caution: REORGANIZE PARTITION causes a full copy of the whole partition!
Hint: I assume it would be very easy for MySQL or MariaDB to make this DDL command an in-place operation...
MySQL Partitioning was introduced in MySQL 5.1.
MySQL Enterprise Backup (MEB) has the capability to make real incremental (differential and cumulative?) backups. The actual releases are quite cool and you should really look at it...
Unfortunately the original MySQL documentation is much too complicated for my simple mind. So I did some testing and simplified it a bit for our customers...
If you want to dive into the original documentation please look here: Making an Incremental Backup .
If you want to use MySQL Enterprise Backup please let us know and we send you a quote...Prepare MySQL Backup infrastructure mkdir /backup/full /backup/incremental1 /backup/incremental2
Full MySQL Backup mysqlbackup --defaults-file=/etc/my.cnf --user=root --backup-dir=/backup/full backup mysqlbackup --defaults-file=/etc/my.cnf --user=root --backup-dir=/backup/full apply-log
First MySQL Incremental Backup mysqlbackup --defaults-file=/etc/my.cnf --user=root --incremental --incremental-base=dir:/backup/full --incremental-backup-dir=/backup/incremental1 backup mysqlbackup --defaults-file=/etc/my.cnf --user=root --backup-dir=/backup/full --incremental-backup-dir=/backup/incremental1 apply-incremental-backup
Second MySQL Incremental Backup mysqlbackup --defaults-file=/etc/my.cnf --user=root --incremental --incremental-base=dir:/backup/full --incremental-backup-dir=/backup/incremental2 backup mysqlbackup --defaults-file=/etc/my.cnf --user=root --backup-dir=/backup/full --incremental-backup-dir=/backup/incremental2 apply-incremental-backup
and so on...MySQL Restore mysqlbackup --defaults-file=/etc/my.cnf --user=root --backup-dir=/backup/full copy-back
Have fun with MySQL Enterprise Backup. If you need any help with your MySQL Backup concept, please let us know.
MySQL Enterprise Monitor (MEM) has by default no Event Handlers created and activated. These Event Handlers you have to define yourself according to your needs.
We would like to be notified by MySQL Enterprise Monitor when the number of connections is near to max_connections.
For this we search first which Advisors are available at all: Configuration -> Advisors -> Availability.
Here we can see that we have an Advisor called Maximum Connection Limit Nearing Or Reached which is scheduled for every 5 minutes and has thresholds at 75, 85, 95 and 100%:
Now we know which Advisor should create and Event. As a next step we have to create and Event which should be triggered: Configuration -> Event Handling -> Create Event Handler.
Here we can create and Event with all its needed configuration: Events -> All -> server.
If we look at the Events we can even see the detailed description and how the values for the Event are collected:
Task: Event Handler for used disk space
For this Event Handler we need the Advisor Filesystem Free Space under Operating System:
In this advisor we can configure the Threshold as well:
In the Event Handler we can define which Assets shall be monitored. For example the mountpoint: /.
Local disks can only be monitored, if a local MySQL Enterprise Monitor Agent is installed. An agent-less MySQL Enterprise Monitor cannot monitor local disk resources...
Have fun using the MySQL Enterprise Monitor. If you need any help in installing or configuring MEM do not hesitate to contact us.
All these functions are also implemented in the FromDual Performance Monitor for MySQL. If you want to relay on Open Source technology only you should consider our Performance Monitor.
FromDual has the pleasure to announce the release of the new version 1.0.0 of its widely used Nagios and Icinga plug-ins for MySQL, Galera Cluster, MariaDB and Percona Server.
All plug-ins are basically renewed and should now work all correctly.
The new Nagios/Icinga plug-ins can be downloaded here.
In the inconceivable case that you find a bug in the Nagios/Icinga plug-ins please report it to our bug tracker.
Any feedback, statements and testimonials are welcome as well! Please send them to firstname.lastname@example.org.Description of the current functionality
Details about the functionality and the usage of each plug-in you get with the option: --help.
The following Nagios/Icinga plug-in for MySQL and MariaDB are currently available:check_db_mysql.pl
This Nagios/Icinga plug-in alerts you if your MySQL database is not up and running.check_errorlog_mysql.pl and errorLogFilterRules.pm
This Nagios/Icinga plug-in alerts you if it finds some suspicious messages in the MySQL error log.
The rules which messages should be ignored can be found in the file errorLogFilterRules.pm. If you want to add your own filter rules please add them in this file as well.
This Nagios/Icinga plug-in alerts you if the actual number of nodes in your a Galera Cluster is not the expected one.check_repl_mysql_cnt_slave_hosts.pl
This Nagios/Icinga plug-in alerts you if your MySQL Slaves have not reported to their Master properly their existence with the report_host variable.check_repl_mysql_heartbeat.pl
This Nagios/Icinga plug-in alerts you if your MySQL Slave is too many heartbeats behind its Master.check_repl_mysql_io_thread.pl
This Nagios/Icinga plug-in alerts you if your MySQL Slaves IO thread is not up an running.check_repl_mysql_read_exec_pos.pl
This Nagios/Icinga plug-in alerts you if your MySQL Slaves read and execution positions differ too much.check_repl_mysql_readonly.pl
This Nagios/Icinga plug-in alerts you if your MySQL Slave is NOT set to readonly.check_repl_mysql_seconds_behind_master.pl
This Nagios/Icinga plug-in alerts you if your MySQL Slave falls too many seconds behind its Master.check_repl_mysql_sql_thread.pl
This Nagios/Icinga plug-in alerts you if your Slaves SQL thread is not up an running.perf_mysql.pl
This Nagios/Icinga plug-in gathers MySQL and MariaDB performance data.Changes in FromDual Nagios/Icinga plug-ins 1.0.0 All plug-ins
- Usage was improved. The usage can be shown with the --help option.
- Usage states now which GRANT privileges are needed for a specific plug-in.
- Examples added how to use each plug-in.
- Default socket location moved from /tmp/mysql.sock to /var/run/mysqld/mysqld.sock.
- New host/socket convention implemented in all scripts similar to MySQL client tools.
- -epn tag added for Icinga.
- Some bugs fixed.
- More filtering rules added.
- Filtering rules separated into own file.
- Entry point finding problem fixed.
- Script name fixed.
- Unknown command problem with Galera Cluster caught.
- mysqladmin ping removed and implemented in Perl.
For subscriptions for commercial use of this software please get in contact with us.
MySQL provides some great enterprise features beside the MySQL Server. The ones we are asked the most at customers are:
- MySQL Enterprise Backup (MEB)
- MySQL Enterprise Monitor (MEM) and
- MySQL Enterprise Workbench (MWB)
MySQL Enterprise Backup (MEB) is an alternative to the mysqldump backup utility. Its big advantage is its fast backup but even faster restore performance. This is a must for all MySQL users having bigger databases than let's say 10 to 20 Gigabytes and/or having hard requirements for restore times (MTTR).
Last implementation tests we did with a customer for an about 30 Gbyte database were:MEBmysqldumpBackup10 minutes18 minutesRestore12 minutes80 minutes
If you need our help implementing MySQL Enterprise Backup into your backup infrastructure please get in contact with us. MySQL Enterprise Backup also seamlessly integrates into the FromDual Backup Manager for MySQL.MySQL Enterprise Monitor (MEM)
MySQL Enterprise Monitor (MEM) is an Enterprise Monitoring Solution for MySQL which Monitors your business critical MySQL databases. Various predefined advisors rise an alert if something with your precious MySQL database does not work as expected.
Our alternative competitive product is the FromDual Performance Monitor for MySQL.MySQL Enterprise Workbench (MWB)
MySQL Enterprise Workbench (MWB) is the tool modern MySQL Database administrators use to operate their MySQL databases. Old fashion ones still use the CLI... MySQL developers can easily write and test database queries and develop their data model with the ER diagram modeller.Download Enterprise tools
But how can we get now to these precious tools? This is quite easy following the screen shots below:
As a fist step you go to www.mysql.com/downloads:
Here you can find a link to Oracle eDelivery which is the MySQL/Oracle download facility. Then you get to the welcome screen:
Before you get access to the software you have to Sign In with your Oracle/MySQL customer account if you have one. If you do not have an account yet you can Create an Account to get to the software:
Then you have to agree (2 times) to the Terms & Restrictions (this is what Oracle is really good in). Once to the Oracle Trial License Agreement and once to the Export Restrictions:
Then you get to the Media Pack Search. Here you can define what product you are interested in and on which platform you are using it. Unfortunately a sub-product filter cannot be chosen. So you get a long list to pick your final package from:
An last you can download your product of desire:
Unfortunately the packages get some silly names like V59684-01.zip instead of meaningful names. But with the following command you get some information what is included in the package:unzip -l V59684-01.zip Archive: V59684-01.zip Length Date Time Name --------- ---------- ----- ---- 2958631 2014-11-04 13:29 meb-3.11.1-linux-glibc2.5-x86-64bit.tar.gz 185 2014-11-05 08:30 meb-3.11.1-linux-glibc2.5-x86-64bit.tar.gz.asc 77 2014-11-04 13:29 meb-3.11.1-linux-glibc2.5-x86-64bit.tar.gz.md5 2130 2014-11-05 15:06 README.txt --------- ------- 2961023 4 files
Have fun trying the MySQL Enterprise Features. If you need any help installing or integrating them into your infrastructure, do not hesitate to contact FromDual.
Sometimes we face the situation where we have a full MySQL database backup done with mysqldump and then we have to restore and recover just one single table out of our huge mysqldump file.
Further our mysqldump backup was taken hours ago so we want to recover all the changes on that table since our backup was taken up to the end.
In this blog article we cover all the steps needed to achieve this goal for MySQL and MariaDB.
Recommendation: It is recommended to do theses steps on a testing system and then dump and restore your table back to the production system. If you do it directly on your production system you have to know exactly what you are doing...
Further this process should be tested carefully and regularly to get familiar with it and to assure your backup/restore/recovery procedure works properly.
The table we want to recover is called test.test from our backup full_dump.sql.gz. As a first step we have to do the recovery with the following command to our test database:shell> zcat full_dump.sql.gz | extract_table.py --database=test --table=test | mysql -u root
The script extract_table.py is part of the FromDual Recovery Manager to extract one single table from a mysqldump backup.
As a next step we have to extract the binary log file and its position where to start recovery from out of our dump:shell> zcat full_dump.sql.gz | head -n 25 | grep CHANGE CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000216', MASTER_LOG_POS=1300976;
Then we have to find out where we want to stop our Point-in-Time-Recovery. The need for recover is possibly due to a TRUCATE TABLE command or similar operation executed on the wrong system or it is just a time somebody has indicated us to recover to. The position to stop we can find as follows:shell> mysqlbinlog -v mysql-bin.000216 | grep -B5 TRUNCATE --color #150123 19:53:14 server id 35622 end_log_pos 1302950 CRC32 0x24471494 Xid = 3803 COMMIT/*!*/; # at 1302950 #150123 19:53:14 server id 35622 end_log_pos 1303036 CRC32 0xf9ac63a6 Query thread_id=54 exec_time=0 error_code=0 SET TIMESTAMP=1422039194/*!*/; TRUNCATE TABLE test
And as a last step we have to apply all the changes from the binary log to our testing database:shell> mysqlbinlog --disable-log-bin --database=test --start-position=1300976 --stop-position=1302950 mysql-bin.000216 | mysql -u root --force
Now the table test.test is recovered to the wanted point in time and we can dump and restore it to its final location back to the production database.shell> mysqldump --user=root --host=testing test test | mysql --user=root --host=production test
This process has been tested on MySQL 5.1.73, 5.5.38, 5.6.22 and 5.7.5 and also on MariaDB 10.0.10 and 10.1.0.
We recently run into some troubles with max_allowed_packet size problems during backups with the FromDual Backup/Recovery Manager and thus I investigated a bit more in the symptoms of such problems.
Read more about: max_allowed_packet.
A general rule for max_allowed_packet size to avoid problems is: All clients and the server should have set the same value for max_allowed_packet size!
I prepared some data for the test which looked as follows:mysql> SELECT id, LEFT(data, 30), LENGTH(data), ts FROM test; +----+--------------------------------+--------------+------+ | id | left(data, 30) | length(data) | ts | +----+--------------------------------+--------------+------+ | 1 | Anhang | 6 | NULL | | 2 | Anhang | 6 | NULL | | 3 | Anhangblablablablablablablabla | 2400006 | NULL | | 4 | Anhang | 6 | NULL | +----+--------------------------------+--------------+------+
Max_packet_size was set to a too small value then:mysql> SHOW GLOBAL VARIABLES WHERE variable_name = 'max_allowed_packet'; +--------------------+---------+ | Variable_name | Value | +--------------------+---------+ | max_allowed_packet | 1048576 | +--------------------+---------+
The first test was to retrieve the too big row:mysql> SELECT * FROM test WHERE id = 3; ERROR 2020 (HY000): Got packet bigger than 'max_allowed_packet' bytes mysql> SELECT CURRENT_USER(); ERROR 2006 (HY000): MySQL server has gone away No connection. Trying to reconnect... Connection id: 6 Current database: test
We got an error message AND we were disconnected from the server. This is indicated with the message MySQL server has gone away which is basically wrong. We were disconnected and not the server has died or similar in this case.
A further symptom is that we get an entry in the MySQL error log about this incident:[Warning] Aborted connection 3 to db: 'test' user: 'root' host: 'localhost' (Got an error writing communication packets)
So watching carefully such error messages in your MySQL error log with the script check_error_log_mysql.pl from our Nagios/Icinga plugins would be a good idea...
The mysqldump utility basically does the same as a SELECT command so I tried this out and got the same error:shell> mysqldump -u root test > /tmp/test_dump.sql mysqldump: Error 2020: Got packet bigger than 'max_allowed_packet' bytes when dumping table `test` at row: 2
And again we get an error message in the error log! This is also a good indicator to see if your backup, made with mysqldump failed in this case.
To get a proper dump we have to configure the mysqldump utility properly:shell> mysqldump --max-allowed-packet=5000000 -u root test > /tmp/test_dump.sql
After the backup we tried to restore the data:shell> mysql -u root test < /tmp/test_dump.sql ERROR 2006 (HY000) at line 40: MySQL server has gone away
Again we got an error on the command line and in the MySQL error log:[Warning] Aborted connection 11 to db: 'test' user: 'root' host: 'localhost' (Got a packet bigger than 'max_allowed_packet' bytes)
and further the data are only partially loaded:mysql> SELECT * FROM test; +----+--------+------+ | id | data | ts | +----+--------+------+ | 1 | Angang | NULL | | 2 | Angang | NULL | +----+--------+------+
Another symptom we can see here is that the MySQL status aborted_clients is increased in all 3 situation:mysql> SHOW GLOBAL STATUS WHERE variable_name = 'aborted_clients'; +-----------------+-------+ | Variable_name | Value | +-----------------+-------+ | Aborted_clients | 10 | +-----------------+-------+
One positive aspect is that with MySQL 5.7.5 the first 2 symptoms do not appear any more...Further information you can find here: Communication Errors and Aborted Connections.
For processing SELECT queries MySQL needs some times the help of temporary tables. These temporary tables can be created either in memory or on disk.
The number of creations of such temporary tables can be found with the following command:mysql> SHOW GLOBAL STATUS LIKE 'created_tmp%tables'; +-------------------------+-------+ | Variable_name | Value | +-------------------------+-------+ | Created_tmp_disk_tables | 4 | | Created_tmp_tables | 36 | +-------------------------+-------+
There are 2 different reasons why MySQL is creating a temporary disk table instead of a temporary memory table:
- The result is bigger than the smaller one of the MySQL variables max_heap_table_size and tmp_table_size.
- The result contains columns of type BLOB or TEXT.
This method can be used if changing the table structure from TEXT to VARCHAR or the use of a RAM disk are not possible solutions.
After properly installing and testing a Galera Cluster we see that the set-up is not finished yet. It needs something in front of the Galera Cluster that balances the load over all nodes.
So we install a load balancer in front of the Galera Cluster. Typically nowadays HAProxy is chosen for this purpose. But then we find, that the whole Galera Cluster is still not high available in case the load balancer fails or dies. So we need a second load balancer for high availability.
But how should we properly fail-over when the HAProxy load balancer dies? For this purpose we put a Virtual IP (VIP) in front of the HAProxy load balancer pair. The Virtual IP is controlled and fail-overed with Keepalived.
First some preparations: For installing socat we need the repoforge repository:shell> cd /tmp shell> wget http://pkgs.repoforge.org/rpmforge-release/rpmforge-release-0.5.3-1.el6.rf.x86_64.rpm shell> yum localinstall rpmforge-release-0.5.3-1.el6.rf.x86_64.rpm shell> yum update shell> yum install socat
Then we can start installing HAProxy and Keepalived:shell> yum install haproxy keepalived shell> chkconfig haproxy on shell> chkconfig keepalived on
We can check the installed HAProxy and Keepalived versions as follows:shell> haproxy -v HA-Proxy version 1.5.2 2014/07/12 shell> keepalived --version Keepalived v1.2.13 (10/15,2014)
Configuration of HAProxy
More details you can find in the HAProxy documentation.shell> cp /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.bak shell> cat << _EOF >/etc/haproxy/haproxy.cfg # # /etc/haproxy/haproxy.cfg # #--------------------------------------------------------------------- # Global settings #--------------------------------------------------------------------- global # to have these messages end up in /var/log/haproxy.log you will # need to: # # 1) configure syslog to accept network log events. This is done # by adding the '-r' option to the SYSLOGD_OPTIONS in # /etc/sysconfig/syslog # # 2) configure local2 events to go to the /var/log/haproxy.log # file. A line like the following can be added to # /etc/sysconfig/syslog # # local2.* /var/log/haproxy.log # log 127.0.0.1 local2 chroot /var/lib/haproxy pidfile /var/run/haproxy.pid maxconn 1020 # See also: ulimit -n user haproxy group haproxy daemon # turn on stats unix socket stats socket /var/lib/haproxy/stats.sock mode 600 level admin stats timeout 2m #--------------------------------------------------------------------- # common defaults that all the 'frontend' and 'backend' sections will # use if not designated in their block #--------------------------------------------------------------------- defaults mode tcp log global option dontlognull option redispatch retries 3 timeout queue 45s timeout connect 5s timeout client 1m timeout server 1m timeout check 10s maxconn 1020 #--------------------------------------------------------------------- # HAProxy statistics backend #--------------------------------------------------------------------- listen haproxy-monitoring *:80 mode http stats enable stats show-legends stats refresh 5s stats uri / stats realm Haproxy\ Statistics stats auth monitor:AdMiN123 stats admin if TRUE frontend haproxy1 # change on 2nd HAProxy bind *:3306 default_backend galera-cluster backend galera-cluster balance roundrobin server nodeA 192.168.1.61:5201 maxconn 151 check server nodeB 192.168.1.61:5202 maxconn 151 check server nodeC 192.168.1.61:5203 maxconn 151 check _EOF
Starting and testing HAProxy
The HAProxy can be started as follows:shell> service haproxy start
and then be checked either over the socket:shell> socat /var/lib/haproxy/stats.sock readline prompt > show info > show stat > help
or over your favourite web browser entering the username and password (monitor:AdMiN123) specified in the configuration file above:
To check the application over the load balancer we can run the following command:shell> mysql --user=app --password=secret --host=192.168.1.38 --port=3306 --exec="SELECT @@wsrep_node_name;" +-------------------+ | @@wsrep_node_name | +-------------------+ | Node C | +-------------------+ shell> mysql --user=app --password=secret --host=192.168.1.38 --port=3306 --exec="SELECT @@wsrep_node_name;" +-------------------+ | @@wsrep_node_name | +-------------------+ | Node A | +-------------------+ shell> mysql --user=app --password=secret --host=192.168.1.38 --port=3306 --exec="SELECT @@wsrep_node_name;" +-------------------+ | @@wsrep_node_name | +-------------------+ | Node B | +-------------------+
Configuration a Virtual IP (VIP) with Keepalived
Now we have 2 HAProxy load balancers. But what happens if one of them fails. Then we do not want to reconfigure our application to work properly again. The fail-over should happen automatically. For this we need a Virtual IP which should automatically fail-over.
Starting and testing Keepalived
To test the keepalived we can run the following command:shell> keepalived -f /etc/keepalived/keepalived.conf --dont-fork --log-console --log-detail ^C
To finally start it the following command will serve:shell> service keepalived start
To check the Virtual IP the following command will help:shell> ip addr show eth1
And then we can check our application over the VIP:shell> mysql --user=app --password=secret --host=192.168.1.99 --port=3306 --exec="SELECT @@wsrep_node_name;"
- Brian Seitzer: Redundant Load Balancers – HAProxy and Keepalived
We have recently seen a case where the following command was executed on a Galera Cluster node:SQL> GRANT SUPER ON userdb.* TO email@example.com; ERROR 1221 (HY000): Incorrect usage of DB GRANT and GLOBAL PRIVILEGES
2014-12-09 14:53:55 7457 [Warning] Did not write failed 'GRANT SUPER ON userdb.* TO firstname.lastname@example.org' into binary log while granting/revoking privileges in databases. 2014-12-09 14:53:55 7457 [ERROR] Slave SQL: Error 'Incorrect usage of DB GRANT and GLOBAL PRIVILEGES' on query. Default database: ''. Query: 'GRANT SUPER ON userdb.* TO email@example.com', Error_code: 1221 2014-12-09 14:53:55 7457 [Warning] WSREP: RBR event 1 Query apply warning: 1, 17 2014-12-09 14:53:55 7457 [Warning] WSREP: Ignoring error for TO isolated action: source: c5e54ef5-7faa-11e4-97b0-5e5c695f08a5 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 4 trx_id: -1 seqnos (l: 4, g: 17, s: 15, d: 15, ts: 113215863294782)
According to the error message it looks like this command is done in Total Order Isolation (TOI) mode during the Rolling Schema Upgrade (RSU).
Only on the nodes which did NOT receive this wrong command the error log message was written and further they have received a GRA_*.log file.
Analysis of the GRA_*.log (failed transactions) files:hexdump -C GRA_2_16.log 00000000 f3 fe 86 54 02 53 14 00 00 76 00 00 00 76 00 00 |...T.S...v...v..| 00000010 00 00 00 04 00 00 00 00 00 00 00 00 00 00 2a 00 |..............*.| 00000020 00 00 00 00 00 01 00 00 00 40 00 00 00 00 06 03 |.........@......| 00000030 73 74 64 04 21 00 21 00 08 00 0b 04 72 6f 6f 74 |std.!.!.....root| 00000040 09 6c 6f 63 61 6c 68 6f 73 74 00 67 72 61 6e 74 |.localhost.grant| 00000050 20 53 55 50 45 52 20 6f 6e 20 75 73 65 72 64 62 | SUPER on userdb| 00000060 2e 2a 20 74 6f 20 72 6f 6f 74 40 31 32 37 2e 30 |.* to firstname.lastname@example.org| 00000070 2e 30 2e 31 31 31 |.0.111 |
dd if=bin-log.000001 of=binlog.header bs=1 count=120 cat binlog.header GRA_2_17.log > GRA_2_17.binlog_events mysqlbinlog GRA_2_17.binlog_events ... # at 120 #141209 15:04:54 server id 5201 end_log_pos 118 CRC32 0x3432312e Query thread_id=45 exec_time=0 error_code=0 SET TIMESTAMP=1418133894/*!*/; SET @@session.pseudo_thread_id=4/*!*/; SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/; SET @@session.sql_mode=1073741824/*!*/; SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/; /*!\C utf8 *//*!*/; SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/*!*/; SET @@session.lc_time_names=0/*!*/; SET @@session.collation_database=DEFAULT/*!*/; grant SUPER on userdb.* to email@example.com /*!*/; DELIMITER ; # End of log file ROLLBACK /* added by mysqlbinlog */; /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/; /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
It further looks like this command was issues by Connection ID number 4: conn_id: 4.
Precaution: Before you try this out on your production system do a BACKUP first! FromDual Backup Manager can help you with this.Situation
A MySQL user has delete its InnoDB table files for example like this:shell> rm -f $datadir/test/test.* Analysis
We do some analysis first:mysql> DROP TABLE test; ERROR 1051 (42S02): Unknown table 'test' mysql> CREATE TABLE test (id INT) ENGINE = InnoDB; ERROR 1050 (42S01): Table '`test`.`test`' already exists
The MySQL error log shows us the following information:141022 17:09:04 InnoDB: Operating system error number 2 in a file operation. InnoDB: The error means the system cannot find the path specified. InnoDB: If you are installing InnoDB, remember that you must create InnoDB: directories yourself, InnoDB does not create them. 141022 17:09:04 InnoDB: Error: trying to open a table, but could not InnoDB: open the tablespace file './test/test.ibd'! InnoDB: Have you moved InnoDB .ibd files around without using the InnoDB: commands DISCARD TABLESPACE and IMPORT TABLESPACE? InnoDB: It is also possible that this is a temporary table #sql..., InnoDB: and MySQL removed the .ibd file for this. InnoDB: Please refer to InnoDB: http://dev.mysql.com/doc/refman/5.5/en/innodb-troubleshooting-datadict.html InnoDB: for how to resolve the issue.
User claims that he does NOT need the table and/or the data any more but wants to get rid of the error messages and/or create a new table with the same name.mysql> CREATE SCHEMA recovery; mysql> use recovery mysql> CREATE TABLE test (id INT) ENGINE = InnoDB; mysql> \! cp $datadir/recovery/test.frm $datadir/test/ mysql> DROP SCHEMA recovery; mysql> use test mysql> DROP TABLE test; Prove
To prove it works we create a new table and fill in some records:mysql> CREATE TABLE test (id int UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, data VARCHAR(64), ts TIMESTAMP) ENGINE = InnoDB; mysql> INSERT INTO test VALUES (NULL, 'Test data', NULL); Literature
This week we did some migrations from MariaDB 10.0 to Percona Server 5.6 at the IT department of a big German bank.
We were perfectly aware that since version 10.0 the MariaDB code base started to diverge slightly away from the MySQL and Percona Server code base which are still pretty close to each other.
Because of the Percona Server option enforce_storage_engine we wanted to do this migration.
We stopped the MariaDB 10.0 server replaced the binaries by the Percona Server 5.6 binaries and started the Percona 5.6 server again. After successfully starting the instance we found some error messages in the MySQL error log. By running the mysql_upgrade command some of the problems were fixed but not all of them. Still left problems were:
- The MariaDB binary logs provoked some error messages for the Percona Server: [ERROR] Error in Log_event::read_log_event(): 'Found invalid event in binary log', data_len: 25, event_type: -93 [Warning] Error reading GTIDs from binary log: -1 [ERROR] Incorrect definition of table mysql.db: expected column 'User' at position 2 to have type char(16), found type char(80). [ERROR] Incorrect definition of table mysql.event: expected column 'definer' at position 3 to have type char(77), found type char(141). [ERROR] Incorrect definition of table mysql.event: expected column 'sql_mode' at position 14 to have type set... A purge of the binary logs solved this issue.
- The tables mysql.event, mysql.innodb_table_stats and mysql.innodb_index_stats where not fixed by mysql_upgrade (a bug to fix for Percona and MySQL/Oracle?). We had to replace those tables manually by copying from an other already working Percona 5.6 Server.
Later in the FromDual technology labs we investigated further and tried the other way from Percona Server 5.6 to MariaDB 10.0. In this direction we found some other errors in the MySQL error log which also where not completely resolved by the mysql_upgrade utility:
- The mysql.innodb_table_stats and mysql.innodb_index_stats tables where recreated manually (here a bug to fix for the MariaDB people?).
- All error messages from tables affected by the following message: InnoDB: in InnoDB data dictionary has unknown flags 40/50/52. could be silenced by a run of the OPTIMIZE TABLE command (which can become quite expensive for very big tables).
Sidegrades from MySQL 5.6 to Percona Server 5.6 and back did not provoke any error message written to the MySQL log files. Sidegrades from MariaDB 10.0 to MySQL 5.6 and vice versa behaved exactly the same as MariaDB 10.0 to Percona Server 5.6 and back.from/to: MySQL 5.6 MariaDB 10.0 Percona Server 5.6 MySQL 5.6 - 2 tables, OPTIMIZE OK MariaDB 10.0 binlog, 3 tables - binlog, 3 tables Percona Server 5.6 OK 2 tables, OPTIMIZE -
During our tests we got rid of the error messages. If they caused any technical harm to the tables or the data we cannot say so far. Further testing and experience from real life is needed. Any feedback is welcome!Observations
It looks like MariaDB 10.0 understands MySQL/Percona Server replication but not the other way around. So replication from MariaDB 10.0 to MySQL 5.6 does probably not work (different implementation of GTID)?Recommendation
To make sure a sigdegrade between these 3 MySQL branches/forks is seamlessly possible the best method seems to be to dump/restore (NOT xtrabackup!) the data. This can be an issue with huge databases (hundreds of Gbyte).Further aid
If you need any help to convert MySQL to MariaDB to Percona Server or the other way do not hesitate to contact the FromDual consultancy team. We will be pleased to assist you as a neutral and vendor independent consulting company.
Sometimes it could be desirable to replicate from a Galera Cluster to a single MySQL slave or to an other Galera Cluster. Reasons for this measure could be:
- An unstable network between two Galera Cluster locations.
- A separation of a reporting slave and the Galera Cluster so that heavy reports on the slave do not affect the Galera Cluster performance.
- Mixing different sources in a slave or a Galera Cluster (fan-in replication).
This article is based on earlier research work (see MySQL Cluster - Cluster circular replication with 2 replication channels) and uses the old MySQL replication style (without MySQL GTID).Preconditions
- Enable the binary logs on 2 nodes of a Galera Cluster (we call them channel masters) with the log_bin variable.
- Set log_slave_updates = 1 on ALL Galera nodes.
- It is recommended to have small binary logs and relay logs in such a situation to reduce overhead of scanning the files (max_binlog_size = 100M).
Let us assume that for some reason the actual channel master of channel 1 breaks. As a consequence the slave of channel 1 does not receive any replication events any more. But we have to keep the replication stream up and running. So we have to switch the replication channel to channel master 2.Switching replication channel
First for security reasons we should stop the slave of replication channel 1 first:mysql> STOP SLAVE;
Then we have to find the actual relay log on the slave:mysql> pager grep Relay_Log_File mysql> SHOW SLAVE STATUS\G mysql> nopager Relay_Log_File: slave-relay-bin.000019
Next we have to find the last applied transaction on the slave:mysql> SHOW RELAYLOG EVENTS IN 'slave-relay-bin.000019'; | slave-relay-bin.000019 | 3386717 | Query | 5201 | 53745015 | BEGIN | | slave-relay-bin.000019 | 3386794 | Table_map | 5201 | 53745067 | table_id: 72 (test.test) | | slave-relay-bin.000019 | 3386846 | Write_rows | 5201 | 53745142 | table_id: 72 flags: STMT_END_F | | slave-relay-bin.000019 | 3386921 | Xid | 5201 | 53745173 | COMMIT /* xid=1457451 */ | +------------------------+---------+-------------+-----------+-------------+--------------------------------+
This is transaction 1457451 which is the same on all Galera nodes.
On the new channel master of channel 2 we have to find now the matching binary log. This can be done best by matching times between the relay log and the binary log of master of channel 2.
On slave:shell> ll *relay-bin* -rw-rw---- 1 mysql mysql 336 Mai 22 20:32 slave-relay-bin.000018 -rw-rw---- 1 mysql mysql 3387029 Mai 22 20:37 slave-relay-bin.000019
On master of channel 2:shell> ll *bin-log* -rw-rw---- 1 mysql mysql 2518737 Mai 22 19:57 bin-log.000072 -rw-rw---- 1 mysql mysql 143 Mai 22 19:57 bin-log.000073 -rw-rw---- 1 mysql mysql 165 Mai 22 20:01 bin-log.000074 -rw-rw---- 1 mysql mysql 62953648 Mai 22 20:40 bin-log.000075
It looks like binary log 75 of master 2 matches to relay log of our slave.
Now we have to find the same transaction on the master of channel 2:mysql> pager grep -B 6 1457451 mysql> SHOW BINLOG EVENTS IN 'bin-log.000075'; mysql> nopager | bin-log.000075 | 53744832 | Write_rows | 5201 | 53744907 | table_id: 72 flags: STMT_END_F | | bin-log.000075 | 53744907 | Xid | 5201 | 53744938 | COMMIT /* xid=1457450 */ | | bin-log.000075 | 53744938 | Query | 5201 | 53745015 | BEGIN | | bin-log.000075 | 53745015 | Table_map | 5201 | 53745067 | table_id: 72 (test.test) | | bin-log.000075 | 53745067 | Write_rows | 5201 | 53745142 | table_id: 72 flags: STMT_END_F | | bin-log.000075 | 53745142 | Xid | 5201 | 53745173 | COMMIT /* xid=1457451 */ | +----------------+----------+-------------+-----------+-------------+---------------------------------------+
We successfully found the transaction and want the position of the next transaction 53745173 where we should continue replicating.
As a last step we have to set the slave to the master of replication channel 2:mysql> CHANGE MASTER TO master_host='master2', master_port=3306, master_log_file='bin-log.000075', master_log_pos=53745173; mysql> START SLAVE;
After a while the slave has caught up and is ready for the next fail-over back.Discussion
We found during our experiments that an IST of a channel master does not lead to a gap or loss of events in the replication stream. So restarting a channel master does not require a channel fail-over as long as an IST can be used for resyncing the channel master with the Galera Cluster.
The increase of wsrep_cluster_conf_id is NOT an indication that a channel fail-over is required.
A SST resets the binary logs so after the SST a slave will not replicate any more. So using this method should be safe to use. If you find any situation where you experience troubles with channel fail-over please let us know.
The MySQL Backup Manager (mysql_bman) is a wrapper script for standard MySQL backup tools. The Problem with MySQL backup tools is, that they have many options and thus are overcomplicated and errors are easy made.
mysql_bman has the intention to make backups for MySQL easier and technically correct. This means it should per default not allow non-consistent backups or complain if some functions or parameters are used in the wrong way to guarantee proper backups.
In addition it has added some nice features which are missing in standard MySQL backup tools or which are only known from Enterprise backup solutions.Where to download mysql_bman
The Backup Manager for MySQL (mysql_bman) can be downloaded from our website.What mysql_bman user say about
Mathias Brem DBA@DBAOnline on LinkedIn:
mysql backup manager is a very nice tool! Congratulations for FromDual! I made a shell script for catalog and maintained backups by xtrabackup, but mysql_bman is the best!
Xtrabackup + mysql_bman!!!!
The intention of mysql_bman is to assist you in bigger MySQL set-ups where you have to follow some backup policies and where you need a serious backup concept.mysql_bman example
To give you an impression of the power of the MySQL Backup Manager let us have a look at a little example:shell> mysql_bman --target=bman:firstname.lastname@example.org --type=full --mode=logical --policy=daily \ --no-compress --backupdir=/mnt/slowdisk \ --archive --archivedir=/mnt/nfsmount
With this backup method we do a logical full backup (mysqldump is triggered in the background). The backup is stored in the location for backups with the daily policy and is NOT compressed to speed up the backup by saving CPU power AND because the backup device is a de-duplicating drive. Then the backup is archived to and NFS mount.Backup types
To achieve this we have defined different backup types:TypeDescriptionfullfull logical backup (mysqldump) of all schemasbinlogbinary-log backupconfigconfiguration file backup (my.cnf)structurestructure backupcleanupclean-up of backup pieces older than n daysschemabackup of one or more schemasprivilegeprivilege dump (SHOW GRANTS FOR)
A backup type is specified with the option --type=<backup_type>.Backup modes
A backup can either be logical or physical. A logical backup is typically what you do with mysqldump. A physical backup is typically a physical file copy without looking into the data. That is what for example xtrabackup does.
The backup mode is specified with the option --mode=<backup_mode>. The following backup modes are available:ModeDescriptionlogicaldo a logical backup (mysqldump).physicaldo a physical backup (mysqlbackup/innobackup/xtrabackup)Backup policies
Further we have introduced different backup policies. Policies are there to distinguish how different backups should be treated.
The following backup policies exist:PolicyDescriptiondailydirectory to store daily backupsweeklydirectory to store weekly backupsmonthlydirectory to store monthly backupsquarterlydirectory to store quarterly backupsyearlydirectory to store yearly backups
For example you could plan to do a daily MySQL backup with binary logs with a retention policy of 7 days. But once a week you want to do a weekly backup consisting of a full backup, a configuration backup and a structure dump. But this weekly backup you want to keep for 6 months. And because of legal reasons you want to do a yearly backup with a retention policy of 10 years.
A backup policy is specified with the --policy=<backup_policy> option. This leads us to the retention time:Options
The retention time which should be applied to a specific backup policy you can specify with the option --retention=<period_in_days>. The retention option means that a backup is not deleted before this amount of days when you run a clean-up job with mysql_bman.
Let us do an example:shell> mysql_bman --type=cleanup --policy=daily --retention=30
This means that all backups in the daily policy should be deleted when they are older than 30 days.Target
With the --target option you specify the connect string to the database to backup. This database can be located either local (all backup types can be used) or remote (only client/server backup types can be used).
A target looks as follows: user/password@host:port (similar to URI specification) whereas you can omit password and port.Backup location, archiving, compressing and clean-up
The --backupdir option is to control location of the backup files. The policy folders are automatically created under this --backupdir location.
If you have a second layer of backup stores (e.g. tapes or slow backup drives or deduplicated drives or NFS drives) you can use the --archive option to copy your backup files to this second layer storage which is specified with the --archivedir option. For restore performance reasons it is recommended to always keep one or two generations of backups on you fast local drive. If you want to remove the backuped files from the --backupdir destination after the archive job use the --cleanup option.
If you want to omit to compress backups, either to safe time or because your location uses deduplicated drives you can use the --no-compress option.
Especially for hosting companies a full database backup is typically not the right backup strategy because a restore of one specific customer (= schema) is very complicated. For this case we have the --per-schema option. mysql_bman will do a backup of the whole database schema by schema. Keep in mind: This breaks consistency among schemas!
Sometimes you want to do a schema backup only for some specific schemas for this you can use the --schema option. This option allows you to specify schemas to backup or not to backup. --schema=+a,+b means backup schema a and b. --schema=-a,-b means backup all schemas except a and b.
The second variant is less error prone because you do not forget to backup a new database.
MySQL does not know the concept of naming an instance (mysqld). But for bigger environments it could be useful to uniquely name each instance. For this purpose we have introduced the option --instance-name=<give_it_a_name>. This instance name should be unique within your whole company. But we do not enforce it atm. The instance name is used to name backup files and later to identify the backup history of an instance in our backup catalog and to allow us to track the files for restore.mysql_bman configuration file
Specifying everything on the command line is cumbersome. Thus mysql_bman considers a configuration file specified with the --config=<config_file> option.
A mysql_bman configuration file looks for example as follows:
Simulate what happens
For the Sissies among us (as for example me) we have the --simulate option. This option simulates nearly all steps as far as possible without executing really anything. This option is either for testing some features or for debugging purposes.Logging
If you want to track your backup history you can specify with the --log option where your mysql_bman log file should be located.Using Catalog
It will be very useful when you can store your backups metadata in the database so you can check them in the future and to find out the backup criteria (type, mode, instance-name, ... etc) for specific backup processes. This could be achieved by using the catalog feature.
To activate this feature you have to create a database for the catalog "default name is bman_catalog" then create its tables by using the option --create in a special mysql_bman command (check examples below).
Finally, to store your backup metadata in the catalog what you only have to do is adding the option --catalog=catalog_connection_string to the normal mysql_bman command.
Check the examples below for using catalog in mysql_bman.
A little more help you can get with the following command:shell> mysql_bman --help
Do a full (logical = default) backup and store it in the daily policy folder:shell> mysql_bman --email@example.com:3306 --type=full --policy=daily
Do a full physical backup and store it in the weekly policy folder:shell> mysql_bman --firstname.lastname@example.org --type=full --mode=physical --policy=weekly
Do a binary-log backup omitting the password in the target and store it in the daily policy folder:shell> mysql_bman --email@example.com:3307 --type=binlog --policy=daily
Do a MySQL configuration backup and store it in the weekly policy folder:shell> mysql_bman --firstname.lastname@example.org:3306 --type=config --policy=weekly
Do a structure backup and store it in the monthly policy folder and name the file with the instance name:shell> mysql_bman --email@example.com:3306 --type=structure --policy=monthly --instance-name=prod-db
Do a weekly structure backup and archive it to an other backup location:shell> mysql_bman --firstname.lastname@example.org:3306 --type=structure --policy=weekly --archive --archivedir=/mnt/tape
Do a schema backup omitting the mysql schema:shell> mysql_bman --email@example.com:3306 --type=schema --schema=-mysql --policy=daily --archive --archivedir=/mnt/tape
Do a schema backup only of foodmart and world and write it to their own files. Omit compressing these backups because they are located for example on deduplicated drives:shell> mysql_bman --firstname.lastname@example.org:3306 --type=schema --schema=+foodmart,+world --per-schema --policy=daily --no-compress
Creation of a backup catalog (assuming you have created already a catalog database with the default name "bman_catalog"):shell> mysql_bman --email@example.com:3306 --create
Backups against catalog:shell> mysql_bman --firstname.lastname@example.org:3306 --email@example.com:3306 --instance-name=test --type=full --policy=daily
Privilege backup:shell> mysql_bman --firstname.lastname@example.org:3306 --type=privilege --policy=daily --mode=logical
As suggested by morgo I did a little test for the same query and the same data-set mentioned in Impact of column types on MySQL JOIN performance but looking into an other dimension: the time (aka MySQL versions).The answer
To make it short. As a good consultant the answer must be: "It depends!" :-)The test
The query was again the following:SELECT * FROM a JOIN b ON b.a_id = a.id WHERE a.id BETWEEN 10000 AND 15000 ;
The Query Execution Plan was the same for all tested releases.
The relevant MySQL variables where used as follows where possible. Should I have considered join buffer, or any other of those local per session buffers (read_buffer_size, read_rnd_buffer_size, join_buffer_size)?innodb_buffer_pool_size = 768M innodb_buffer_pool_instances = 1 innodb_file_per_table = 1
The results mysql-4.0.30mysql-4.1.25mysql-5.0.96mysql-5.1.73mysql-5.5.35mysql-5.6.15mysql-5.7.3AVG40.8638.683.714.694.647.226.05MEDIAN41.0738.133.694.464.656.326.05STDEV1.512.260.060.340.032.210.03MIN39.2736.993.674.404.596.266.02MAX44.1144.453.865.234.6713.166.10COUNT10.0010.0010.0010.0010.0010.0010.00
galera-5.5.33-23.7.6 / 2.7AVG4.31MEDIAN3.98STDEV1.18MIN3.76MAX8.54COUNT30.00
The Graph Conclusion
Do not trust benchmarks. They are mostly worthless for your specific workload and pure marketing buzz... Including the one above! ;-)
Database vendors (Oracle/MySQL, Percona, MariaDB) are primarily focussing on throughput and features. In general this is at the costs of single query performance.
MySQL users like Facebook, LinkedIn, Google, Wikpedia, Booking.com, Yahoo! etc. are more interested in throughput than single query performance (so I assume). But most of the MySQL users (95%) do not have a troughput problem but a single query performance problem (I assume here that this is true also for Oracle, MS-SQL Server, DB2, PostgreSQL, etc.).
So database vendors are not primarily producing for the masses but for some specific users/customers (which possibly pay a hell of money for this).
Back to the data:
My first hypothesis: "The old times were always better" is definitely not true. MySQL 4.0 and 4.1 sucked with this specific query. But since MySQL 5.0 the rough trend is: single query performance becomes worse over time (newer versions). I assume this also true for other databases...
Some claims like: "We have the fastest MySQL" or "We have hired the whole optimizer team" does not necessary reflect in better single query performance. At least not for this specific query.
So in short: If you upgrade or side-grade (MySQL <-> Percona <-> MariaDB), test always very carefully! It is not predictable where the traps are. Newer MySQL release can increase performance of your application or not. Do not trust marketing buzz!Artefacts
Some artefacts we have already found during this tiny test:
- In MySQL 5.0 an optimization was introduced (not in the Optimizer!?!) to speed up this specific query dramatically.
- MariaDB 5.2 and 5.3 were bad for this specific query.
- I have no clue why Galera Cluster has shown the best results for 5.5. It is no intention or manipulation! It is poor luck. But I like it! :-)
- MySQL 5.6 seems to have some problems with this query. To much improvement done by Oracle/MySQL?
- Percona 5.6 sometimes behaves much better with this query than normal MySQL but from time to time something kicks in which makes Percona dramatically slower. Thus the bad results. I have no clue why. I first though about an external influence. But I was capable to reproduce this behaviour (once). So I assume it must be something Percona internally (AHI for example?).
Do not shoot the messenger!
If you want to reproduce the results most information about are already published. If something is missing please let me know.
Please let me know when you do not agree with the results. So I can expand my universe a bit...
It was fun doing this tests today! And MyEnv was a great assistance doing this kind of tests!
If you want us to do such test for you, please let us know. Our consulting team would be happy to assist you with upgrading or side-grading problems.
In our MySQL trainings and consulting engagements we tell our customers always to use the smallest possible data type to get better query performance. Especially for the JOIN columns. This advice is supported as well by the MySQL documentation in the chapter Optimizing Data Types:
Use the most efficient (smallest) data types possible. MySQL has many specialized types that save disk space and memory. For example, use the smaller integer types if possible to get smaller tables. MEDIUMINT is often a better choice than INT because a MEDIUMINT column uses 25% less space.
I remember somewhere the JOIN columns where explicitly mentioned but I cannot find it any more.Test set-up
To get numbers we have created a little test set-up:CREATE TABLE `a` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT , `data` varchar(64) DEFAULT NULL , `ts` timestamp NOT NULL , PRIMARY KEY (`id`) ) ENGINE=InnoDB CHARSET=latin1
CREATE TABLE `b` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT , `data` varchar(64) DEFAULT NULL , `ts` timestamp NOT NULL , `a_id` int(10) unsigned DEFAULT NULL , PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1
1048576 rows 16777216 rows
The following query was used for the test:EXPLAIN SELECT * FROM a JOIN b ON b.a_id = a.id WHERE a.id BETWEEN 10000 AND 15000; +----+-------------+-------+--------+---------------+---------+---------+-------------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+--------+---------------+---------+---------+-------------+----------+-------------+ | 1 | SIMPLE | b | ALL | NULL | NULL | NULL | NULL | 16322446 | Using where | | 1 | SIMPLE | a | eq_ref | PRIMARY | PRIMARY | 4 | test.b.a_id | 1 | NULL | +----+-------------+-------+--------+---------------+---------+---------+-------------+----------+-------------+
And yes: I know this query could be more optimal by setting an index on b.a_id.Results
The whole workload was executed completely in memory and thus CPU bound (we did not want to measure the speed of our I/O system).SEJOIN columnbytesquery timeGainSpaceCharacter setInnoDBMEDIUMINT35.28 s96%4% faster75%InnoDBINT45.48 s100%100%100%InnoDBBIGINT85.65 s107%7% slower200%InnoDBNUMERIC(7, 2)~46.77 s124%24% slower~100%InnoDBVARCHAR(7)7-86.44 s118%18% slower~200%latin1InnoDBVARCHAR(16)7-86.44 s118%18% slower~200%latin1InnoDBVARCHAR(32)7-86.42 s118%18% slower~200%latin1InnoDBVARCHAR(128)7-86.46 s118%18% slower~200%latin1InnoDBVARCHAR(256)8-96.17 s114%14% slower~225%latin1InnoDBVARCHAR(16)7-86.96 s127%27% slower~200%utf8InnoDBVARCHAR(128)7-86.82 s124%24% slower~200%utf8InnoDBCHAR(16)166.85 s125%25% slower400%latin1InnoDBCHAR(128)1289.68 s177%77% slower3200%latin1InnoDBTEXT8-910.7 s195%95% slower~225%latin1MyISAMINT43.16 s58%42% fasterTokuDBINT44.52 s82%18% faster
Some comments to the tests:
- MySQL 5.6.13 was used for most of the tests.
- TokuDB v7.1.0 was tested with MySQL 5.5.30.
- As results the optimistic cases were taken. In reality the results can be slightly worse.
- We did not take into consideration that bigger data types will eventually cause more I/O which is very slow!
Great News: Galera Cluster v3.1 GA for MySQL 5.6 was released at Percona Live London (PLUK) 2013. The information is still a bit hidden...
You can find it here:
- The Plug-in: https://launchpad.net/galera/3.x/25.3.1
- The MySQL: https://launchpad.net/codership-mysql/5.6/5.6.14-25.1
Or directly on our download page.
Careful: Online-Upgrade from 5.5 to 5.6 will not work yet. We have to find a work-around...
We had a Galera Cluster support case recently. The customer was drenched in tears because his Galera Cluster did not work any more and he could not make it work any more.
Upsss! What has happened?
A bit of the background of this case: The customer wanted to do a rolling-restart of the Galera Cluster under load because of an Operating System upgrade which requires a reboot of the system.
Lets have a look at the MySQL error log to see what was going on. Customer restarted server with NodeC:12:20:42 NodeC: normal shutdown --> Group 2/2 12:20:46 NodeC: shutdown complete 12:22:09 NodeC: started 12:22:15 NodeC: start replication 12:22:16 NodeC: CLOSED -> OPEN 12:22:16 all : Group 2/3 component all PRIMARY 12:22:17 NodeC: Gap in state sequence. Need state transfer. 12:22:18 all : Node 1 (NodeC) requested state transfer from '*any*'. Selected 0 (NodeB)(SYNCED) as donor. 12:22:18 NodeB: Shifting SYNCED -> DONOR/DESYNCED (TO: 660966498) 12:22:19 NodeC: Shifting PRIMARY -> JOINER (TO: 660966498) 12:22:19 NodeC: Receiving IST: 14761 writesets, seqnos 660951493-660966254 12:22:21 NodeC: 0 (NodeB): State transfer to 1 (NodeC) complete.
Everything went fine so far NodeC came up again and did an IST as expected. But then the first operational error happened: The customer did not wait to reboot NodeB until NodeC was completely recovered. It seems like NodeC took some time for the IST recovery. This should be checked on all nodes with SHOW GLOBAL STATUS LIKE 'wsrep%';...12:22:21 NodeC: Member 0 (NodeB) synced with group. 12:22:21 NodeB: Shifting JOINED -> SYNCED (TO: 660966845) 12:22:21 NodeB: Synchronized with group, ready for connections --> NodeC seems not to be ready yet! 12:23:21 NodeB: Normal shutdown 12:23:21 all : Group 1/2 12:23:21 NodeC: Aborted (core dumped)
And now Murphy was acting already the first time: We hit a situation in the Galera Cluster which is not covered as expected. Now we have 2 nodes out of 3 not working. As a result the Cluster gets a quorum loss (non-Primary, more than 50% of nodes disappeared) and does not reply to any SQL queries any more. This is a bug because both nodes left the cluster gracefully. The third node should have stayed primary:12:23:21 NodeB: Received SELF-LEAVE. Closing connection. 12:23:23 NodeB: Shifting CLOSED -> DESTROYED (TO: 660973981) 12:23:25 NodeB: Shutdown complete 12:23:29 NodeC: mysqld_safe WSREP: sleeping 15 seconds before restart 12:23:37 NodeA: Received NON-PRIMARY. 12:23:44 NodeC: mysqld_safe mysqld restarted 12:23:48 NodeC: Shifting CLOSED -> OPEN 12:23:48 NodeC: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 2 12:23:48 NodeC: Received NON-PRIMARY. 12:23:48 NodeA: New COMPONENT: primary = no, bootstrap = no, my_idx = 1, memb_num = 2 12:23:48 NodeA: Received NON-PRIMARY. 12:24:30 NodeB: mysqld_safe Starting mysqld daemon 12:24:36 NodeB: Start replication 12:24:37 NodeB: Received NON-PRIMARY.
As a result the customer decided to shutdown the whole cluster. Which was not necessary but is a acceptable approach:12:27:55 NodeB: /usr/sbin/mysqld: Normal shutdown 12:27:58 NodeB: /usr/sbin/mysqld: Shutdown complete 12:28:14 NodeA: /usr/sbin/mysqld: Normal shutdown 12:28:19 NodeA: /usr/sbin/mysqld: Shutdown complete 12:31:45 NodeC: /usr/sbin/mysqld: Normal shutdown 12:31:49 NodeC: /usr/sbin/mysqld: Shutdown complete
We experience a complete cluster outage now. An then the next operational error happened: The customer has chosen the node (NodeC) with the worst (= oldest) data as the starting node for the new Cluster:12:31:55 NodeC: Starting mysqld daemon 12:31:58 NodeC: PRIMARY, 1/1 12:31:58 NodeC: /usr/sbin/mysqld: ready for connections. 12:33:29 NodeB: mysqld_safe Starting mysqld daemon 12:33:33 NodeB: PRIMARY, 1/2
An alternative approach would have been to run the command SET GLOBAL wsrep_provider_options='pc.bootstrap=yes'; on the node (NodeA) with the most recent data...
After connecting NodeB (with the newer state) requested an state transfer from the older NodeC:
And now Mister Murphy is acting a second time: We hit another situation: The newer node requests an IST from the older node which has progressed in the meanwhile to an even newer state. So the newer joiner node receives data from the older donor node which causes an AUTO_INCREMENT Primary Key violation. As a consequence the node crashes:12:33:36 NodeB: receiving IST failed, node restart required: Failed to apply app buffer: äJR#, seqno: 660974010, status: WSREP_FATAL 12:33:36 NodeB: Closed send monitor. 12:33:37 NodeB: Closing slave action queue. 12:33:37 NodeB: Aborted (core dumped) 12:33:37 NodeC: PRIMARY 1/1 12:33:44 NodeC: Shifting DONOR/DESYNCED -> JOINED (TO: 660983204) 12:33:59 NodeB: mysqld_safe mysqld restarted 12:34:04 NodeB: Shifting CLOSED -> OPEN 12:34:07 NodeB: Aborted (core dumped) ... Loop
This situation keeps the node NodeB now in a crashing loop. Restarted by the mysqld_safe process requesting an IST. This is another bug which is fixed in a newer Galera MySQL (5.5.33). And now the next operational error happened: Instead of killing NodeB and forcing an SST by deleting the grastat.dat file They started the third node as well...12:37:12 NodeA: mysqld_safe Starting mysqld daemon ... --> code dumped ... Loop
NodeB and NodeA both have the same problem now...
As a result: Node NodeA and NodeB are now looping in a crash. But at least the node NodeC was up and running all the time.Learnings
- Most important: Have an ntpd service running on all Cluster nodes to not mess up the times on different nodes while investigating in errors. This makes problem solving much easier...
- In case of split-brain or quorum loss choose the node with the most recent data as your initial Cluster node.
- If you have a Cluster in split-brain you do not have to restart it. You can bring the node out of split-brain with pc.bootstrap=yes variable if you found out which node is the most recent one.
- Analyse error log files carefully to understand what went wrong. Forcing an SST only takes a few seconds.
- Upgrade your software regularly to not hit old known bugs. The rule Do not touch a running system! does not apply here because we are already touching the running system! So regular upgrade from time to time can be very helpful!
- Be familiar with operational issues of your Cluster software. A Cluster does not only mean high-availability. It means also you have to train your staff to handle it.
- It is always valuable to have support for your business critical systems.
In MySQL we have the typical behaviour that we open and close connections very often and rapidly. So we have very short-living connections to the server. This can lead in extreme cases to the situation that the maximum number of TCP ports are exhausted.
The maximum number of TCP ports we can find with:# cat /proc/sys/net/ipv4/ip_local_port_range 32768 61000
In this example we can have in maximum (61000 - 32768 = 28232) connections concurrently open.
When a TCP connections closes the port cannot be reused immediately afterwards because the Operating System has to wait for the duration of the TIME_WAIT interval (maximum segment lifetime, MSL). This we can see with the command:# netstat -nat Active Internet connections (servers and established) Proto Recv-Q Send-Q Local Address Foreign Address State tcp 0 0 0.0.0.0:10050 0.0.0.0:* LISTEN tcp 0 0 0.0.0.0:10051 0.0.0.0:* LISTEN tcp 0 0 127.0.0.1:10051 127.0.0.1:60756 TIME_WAIT tcp 0 0 127.0.0.1:10050 127.0.0.1:50191 TIME_WAIT tcp 0 0 127.0.0.1:10050 127.0.0.1:52186 ESTABLISHED tcp 0 0 127.0.0.1:10051 127.0.0.1:34445 TIME_WAIT
The reason for waiting is that packets may arrive out of order or be retransmitted after the connection has been closed. CLOSE_WAIT indicates that the other side of the connection has closed the connection. TIME_WAIT indicates that this side has closed the connection. The connection is being kept around so that any delayed packets can be matched to the connection and handled appropriately.
The Maximum Segment Lifetime can be found as follows:# cat /proc/sys/net/ipv4/tcp_fin_timeout 60
This basically means your system cannot guarantee more than ((61000 - 32768) / 60 = 470) ports at any given time.Solutions
There are several strategies out of this problem:
- Open less frequently connections to your MySQL database. Put more payload into one connection. Often Connection Pooling is used to achieve this.
- Increasing the port range. Setting the range to 15000 61000 is pretty common these days (extreme tuning: 1024 - 65535).
- Increase the availability by decreasing the FIN timeout.
Those values can be changed online with:# echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout # echo 15000 65000 > /proc/sys/net/ipv4/ip_local_port_range
Or permanently by adding it to /etc/sysctl.conf
An other possibility to change this behaviour is to use tcp_tw_recycle and tcp_tw_reuse. By default they are disabled:# cat /proc/sys/net/ipv4/tcp_tw_recycle 0 # cat /proc/sys/net/ipv4/tcp_tw_reuse 0
These parameters allow fast cycling of sockets in TIME_WAIT state and re-using them. But before you do this change make sure that this does not conflict with the protocols that you would use for the application that needs these ports.
The tcp_tw_recycle could cause some problems when using load balancers:tcp_tw_reuse Allow to reuse TIME_WAIT sockets for new connections when it is safe from protocol viewpoint. Default value is 0.
It should not be changed without advice/request of technical experts. tcp_tw_recycle Enable fast recycling TIME_WAIT sockets. Default value is 0. It should not be changed without advice/request of technical experts.
It took me quite a while to find out how the beast Galera Arbitrator (garbd) works. To safe your time here a short summary:How to start Galera Arbitrator (garbd) shell> ./garbd --address gcomm://192.168.13.1,192.168.13.2 --group "Our Galera Cluster" --log /tmp/garbd.log --daemon
How to stop Galera Arbitrator (gardb) shell> killall garbd
How to start Galera Arbitrator (garbd) with a configuration file shell>./garbd --cfg /tmp/garb.cnf --daemon
The configuration file looks as follows:# # /etc/mysql/garb.cnf # address = gcomm://127.0.0.1:5671,127.0.0.1:5672,127.0.0.1:5673 group = Our Galera Cluster options = gmcast.listen_addr=tcp://127.0.0.1:5674 log = /tmp/garbd.log
A service start/stop script can be found at: galera-src/garb/files/agrb.sh and galera-src/garb/files/garb.cnf