You are here
FromDual TechFeed (en)
One of the most expensive database operations is performing Data Definition Language (DDL, e.g. CREATE, DROP, ALTER, etc.) statements, specially, the ALTER statements because MySQL blocks the entire table for both reads and writes while modifying the table.
For the huge tables, this might take hours to get the table changed which affects the application, so that, a good planning is required for such operations in order to avoid doing these changes during the peak times. For those people who have 24/7 services or limited maintenance window, DDL on huge tables is a really nightmare.
Percona developed a very good tool called pt-online-schema-change (version 2.2.6 at the time of writing this article) to perform such operations online without blocking/affecting the application and read/write operations to the table being changed is available.
Also MySQL made some enhancements for DDL statements and introduced the Online DDL feature in MySQL 5.6.
In this article, I will talk about an overview of both ways (Online DDL & pt-online-schema-change) alongside with an example and which one of them should be used in different scenarios.pt-online-schema-change Overview
This tool is developed by Percona to alter tables without locking them during the ALTER operation.
Simply, this tool creates a new empty table like the original table with the needed structure change, copy the data from the original table in small chunks to the new table, drop the original table and then rename the new table to the original name. During the copy process all new changes to the original table are being applied to the new one because a trigger is created on the original table which ensure that all new changes will be applied on the new table.
For more information about pt-online-schema-change tool, check out the manual documentation.Example
Altering a table called "test.test1" by adding an index (name_idx) on column "name":[root@gcservera ~]# pt-online-schema-change --execute --alter "add index name_idx (name)" D=test,t=test1,h=localhost Operation, tries, wait: copy_rows, 10, 0.25 create_triggers, 10, 1 drop_triggers, 10, 1 swap_tables, 10, 1 update_foreign_keys, 10, 1 Altering `test`.`test1`... Creating new table... Created new table test._test1_new OK. Altering new table... Altered `test`.`_test1_new` OK. 2014-02-09T15:33:27 Creating triggers... 2014-02-09T15:33:27 Created triggers OK. 2014-02-09T15:33:27 Copying approximately 1 rows... 2014-02-09T15:33:27 Copied rows OK. 2014-02-09T15:33:27 Swapping tables... 2014-02-09T15:33:27 Swapped original and new tables OK. 2014-02-09T15:33:27 Dropping old table... 2014-02-09T15:33:27 Dropped old table `test`.`_test1_old` OK. 2014-02-09T15:33:27 Dropping triggers... 2014-02-09T15:33:27 Dropped triggers OK. Successfully altered `test`.`test1`.
The output is perfectly describing all steps that the tool is doing in the background.Limitations
- A PRIMARY KEY or a unique index should be defined for the table before using this tool because it is required for the DELETE trigger.
- Not supported if the table has already triggers defined.
- The tool become complicate a little if the table has a foreign key constraint and an additional option --alter-foreign-keys-method should be used.
- Also because of the foreign keys, the object names might be changed (indexes names , .. etc).
- In Galera Cluster environment, altering MyISAM tables is not supported and the system variable "wsrep_OSU_method" must be set to "TOI" (total order isolation).
In MySQL 5.5 and 5.1 with the InnoDB plugin, a new feature known as Fast Index Creation was introduced to avoid copying the tables data - when adding or removing secondary indexes - using the optimized CREATE INDEX and DROP INDEX statements.
In MySQL 5.6, the Online DDL method was introduced to allow more changes to be made on the table while accessing and writing to the table being changed is available.
The Online DDL syntax is exactly the same like the normal alter statement after specifying two parameters:ALGORITHM:
- INPLACE: the table change will be made in-place without rebuilding the entire table (in most cases, no copying data to temporary table is required).
- COPY: copying data to a temporary table, rebuilding the table and reconstructing the secondary indexes will be made (equivalent to the traditional method).
- NONE: Read and write operations are allowed during the altering process.
- SHARED: Only read operations are allowed during the altering operations (DML is not allowed).
- EXCLUSIVE: The entire table will be locked for both reading and writing (neither select nor DML are allowed).
This topic is perfectly explained in the online manual documentation, you can check it out here for more information.Example
Altering a table called "test.test2" by adding an index (name_idx) on column "name":mysql> alter table test2 -> add index name_idx (name),algorithm=inplace, lock=none; Query OK, 0 rows affected (0.03 sec) Records: 0 Duplicates: 0 Warnings: 0 Limitations
- Works only with InnoDB (syntax wise it could be used with other storage engines "like MyISAM" but only "algorithm=copy" is allowed which is equivalent to the traditional method).
- Regardless of the locking used (none,shared or exclusive) a brief period at the beginning and at the end of the process is requiring an exclusive lock on the table.
- foreign_key_checks should be disabled when adding/dropping foreign keys to avoid table copying behavior.
- Still some alter operations require data copy to a temp table or table locking in order to make the change (the old behavior), for more details on which table change require table-copying or table locking, check out this manual page.
The following is a comparison results between Online DDL and pt-online-schema-change for some alter operations applied on a table contains 1,078,880 rows:
Online DDLpt-online-schema-changeChange OperationRow(s) affectedIs table locked?Time (sec)Row(s) affectedIs table locked?Time (sec)Add Index0No3.76All rowsNo38.12Drop Index0No0.34All rowsNo36.04Add Column0No27.61All rowsNo37.21Rename Column0No0.06All rowsNo34.16Rename Column + change its data typeAll rowsYes30.21All rowsNo34.23Drop Column0No22.41All rowsNo31.57Change table ENGINEAll rowsYes25.30All rowsNo35.54Which method should be used?
Now the question is, which method should we use to perform alter table statements?
While pt-online-schema-change allows read and write operations to the table being altered, it still copies the tables data to the temporary table in the background which adds overhead on the MySQL server. So basically, we should use pt-online-schema-change only if the Online DDL will not work efficiently. In other words, if the Online DDL will require copying data to a temporary table (algorithm=copy) and the table will be locked for long time (lock=exclusive) then we should use pt-online-schema-change tool.
As suggested by morgo I did a little test for the same query and the same data-set mentioned in Impact of column types on MySQL JOIN performance but looking into an other dimension: the time (aka MySQL versions).The answer
To make it short. As a good consultant the answer must be: "It depends!" :-)The test
The query was again the following:SELECT * FROM a JOIN b ON b.a_id = a.id WHERE a.id BETWEEN 10000 AND 15000 ;
The Query Execution Plan was the same for all tested releases.
The relevant MySQL variables where used as follows where possible. Should I have considered join buffer, or any other of those local per session buffers (read_buffer_size, read_rnd_buffer_size, join_buffer_size)?innodb_buffer_pool_size = 768M innodb_buffer_pool_instances = 1 innodb_file_per_table = 1
The results mysql-4.0.30mysql-4.1.25mysql-5.0.96mysql-5.1.73mysql-5.5.35mysql-5.6.15mysql-5.7.3AVG40.8638.683.714.694.647.226.05MEDIAN41.0738.133.694.464.656.326.05STDEV1.512.260.060.340.032.210.03MIN39.2736.993.674.404.596.266.02MAX44.1144.453.865.234.6713.166.10COUNT10.0010.0010.0010.0010.0010.0010.00
galera-5.5.33-23.7.6 / 2.7AVG4.31MEDIAN3.98STDEV1.18MIN3.76MAX8.54COUNT30.00
The Graph Conclusion
Do not trust benchmarks. They are mostly worthless for your specific workload and pure marketing buzz... Including the one above! ;-)
Database vendors (Oracle/MySQL, Percona, MariaDB) are primarily focussing on throughput and features. In general this is at the costs of single query performance.
MySQL users like Facebook, LinkedIn, Google, Wikpedia, Booking.com, Yahoo! etc. are more interested in throughput than single query performance (so I assume). But most of the MySQL users (95%) do not have a troughput problem but a single query performance problem (I assume here that this is true also for Oracle, MS-SQL Server, DB2, PostgreSQL, etc.).
So database vendors are not primarily producing for the masses but for some specific users/customers (which possibly pay a hell of money for this).
Back to the data:
My first hypothesis: "The old times were always better" is definitely not true. MySQL 4.0 and 4.1 sucked with this specific query. But since MySQL 5.0 the rough trend is: single query performance becomes worse over time (newer versions). I assume this also true for other databases...
Some claims like: "We have the fastest MySQL" or "We have hired the whole optimizer team" does not necessary reflect in better single query performance. At least not for this specific query.
So in short: If you upgrade or side-grade (MySQL <-> Percona <-> MariaDB), test always very carefully! It is not predictable where the traps are. Newer MySQL release can increase performance of your application or not. Do not trust marketing buzz!Artefacts
Some artefacts we have already found during this tiny test:
- In MySQL 5.0 an optimization was introduced (not in the Optimizer!?!) to speed up this specific query dramatically.
- MariaDB 5.2 and 5.3 were bad for this specific query.
- I have no clue why Galera Cluster has shown the best results for 5.5. It is no intention or manipulation! It is poor luck. But I like it! :-)
- MySQL 5.6 seems to have some problems with this query. To much improvement done by Oracle/MySQL?
- Percona 5.6 sometimes behaves much better with this query than normal MySQL but from time to time something kicks in which makes Percona dramatically slower. Thus the bad results. I have no clue why. I first though about an external influence. But I was capable to reproduce this behaviour (once). So I assume it must be something Percona internally (AHI for example?).
Do not shoot the messenger!
If you want to reproduce the results most information about are already published. If something is missing please let me know.
Please let me know when you do not agree with the results. So I can expand my universe a bit...
It was fun doing this tests today! And MyEnv was a great assistance doing this kind of tests!
If you want us to do such test for you, please let us know. Our consulting team would be happy to assist you with upgrading or side-grading problems.
In our MySQL trainings and consulting engagements we tell our customers always to use the smallest possible data type to get better query performance. Especially for the JOIN columns. This advice is supported as well by the MySQL documentation in the chapter Optimizing Data Types:
Use the most efficient (smallest) data types possible. MySQL has many specialized types that save disk space and memory. For example, use the smaller integer types if possible to get smaller tables. MEDIUMINT is often a better choice than INT because a MEDIUMINT column uses 25% less space.
I remember somewhere the JOIN columns where explicitly mentioned but I cannot find it any more.Test set-up
To get numbers we have created a little test set-up:CREATE TABLE `a` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT , `data` varchar(64) DEFAULT NULL , `ts` timestamp NOT NULL , PRIMARY KEY (`id`) ) ENGINE=InnoDB CHARSET=latin1
CREATE TABLE `b` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT , `data` varchar(64) DEFAULT NULL , `ts` timestamp NOT NULL , `a_id` int(10) unsigned DEFAULT NULL , PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1
1048576 rows 16777216 rows
The following query was used for the test:EXPLAIN SELECT * FROM a JOIN b ON b.a_id = a.id WHERE a.id BETWEEN 10000 AND 15000; +----+-------------+-------+--------+---------------+---------+---------+-------------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+--------+---------------+---------+---------+-------------+----------+-------------+ | 1 | SIMPLE | b | ALL | NULL | NULL | NULL | NULL | 16322446 | Using where | | 1 | SIMPLE | a | eq_ref | PRIMARY | PRIMARY | 4 | test.b.a_id | 1 | NULL | +----+-------------+-------+--------+---------------+---------+---------+-------------+----------+-------------+
And yes: I know this query could be more optimal by setting an index on b.a_id.Results
The whole workload was executed completely in memory and thus CPU bound (we did not want to measure the speed of our I/O system).SEJOIN columnbytesquery timeGainSpaceCharacter setInnoDBMEDIUMINT35.28 s96%4% faster75%InnoDBINT45.48 s100%100%100%InnoDBBIGINT85.65 s107%7% slower200%InnoDBNUMERIC(7, 2)~46.77 s124%24% slower~100%InnoDBVARCHAR(7)7-86.44 s118%18% slower~200%latin1InnoDBVARCHAR(16)7-86.44 s118%18% slower~200%latin1InnoDBVARCHAR(32)7-86.42 s118%18% slower~200%latin1InnoDBVARCHAR(128)7-86.46 s118%18% slower~200%latin1InnoDBVARCHAR(256)8-96.17 s114%14% slower~225%latin1InnoDBVARCHAR(16)7-86.96 s127%27% slower~200%utf8InnoDBVARCHAR(128)7-86.82 s124%24% slower~200%utf8InnoDBCHAR(16)166.85 s125%25% slower400%latin1InnoDBCHAR(128)1289.68 s177%77% slower3200%latin1InnoDBTEXT8-910.7 s195%95% slower~225%latin1MyISAMINT43.16 s58%42% fasterTokuDBINT44.52 s82%18% faster
Some comments to the tests:
- MySQL 5.6.13 was used for most of the tests.
- TokuDB v7.1.0 was tested with MySQL 5.5.30.
- As results the optimistic cases were taken. In reality the results can be slightly worse.
- We did not take into consideration that bigger data types will eventually cause more I/O which is very slow!
- What is MyEnv?
- MySQL Workbench configuration.
MySQL Workbench is a very good and free GUI tool provided by Oracle to manage MySQL administration and development tasks. Opening many MySQL connections (same or different instances, remote or local MySQL servers) at the same time is one of its main features. While it's working fine to perform SQL statements on the different connections opened for multiple instances, but some people are asking if it is available as well to start and stop multiple MySQL instances using MySQL Workbench? if yes, how to configure it to perform such task? and also does that make any conflict with MyEnv tool - if it's installed - or not?
Yes, MySQL Workbench could be configured to start and stop multiple MySQL instances (local or remote) and it does not make any conflict with MyEnv tool.
In this article, I will describe how to configure MySQL Workbench to start and stop multiple MySQL instances.Prerequisites
System information and installed packages:
- Operating System: Ubuntu 12.04 (64 bit) .
- MySQL Server: Any mysql version (I used MySQL 5.5 tarballs).
- Number of MySQL Instances: Two instances are installed (mysql1 & mysql2).
- MySQL Workbench: Version 6.0 .
- MyEnv: Version 1.1 (Optional).
MyEnv is a set of scripts to run comfortably multiple MySQL, Percona Server or MariaDB database instances on the same server. You can even run multiple database instances with different binary versions.
I will not talk more about MyEnv features and its benefits rather, I'd like to mention that if you're using MyEnv and want to use MySQL Workbench at the same time, you will not face any conflict between them both and you can manage your MySQL instances by either MyEnv or MySQL Workbench.
For more information about MyEnv tool and the download link , you can check it out on our website myenv.MySQL Workbench configuration Prepare MySQL start and stop scripts
While MyEnv does not require you to create start or stop server scripts for tarballs binaries or installation from source - it manages this part by itself - but MySQL Workbench does.
The main point is that MySQL Workbench just execute start and stop commands as they would be execute in the system shell.
Assuming the installation paths for the two MySQL instances are ("/opt/mysql1" and "/opt/mysql2"), then server start and stop scripts could be created as follows:
- Use the mysql.server scripts after modifying the "basedir" and "datadir" variables:
moll@ubuntu:~$ sudo cp /opt/mysql1/support-files/mysql.server /etc/init.d/mysqld1 moll@ubuntu:~$ sudo vi /etc/init.d/mysqld1 . . basedir=/opt/mysql1 datadir=/opt/mysql1/data . . moll@ubuntu:~$ sudo cp /opt/mysql2/support-files/mysql.server /etc/init.d/mysqld2 moll@ubuntu:~$ sudo vi /etc/init.d/mysqld2 . . basedir=/opt/mysql2 datadir=/opt/mysql2/data . . moll@ubuntu:~$
- Test those scripts first to make sure they're working fine:
moll@ubuntu:~$ sudo /etc/init.d/mysqld2 stop Shutting down MySQL .. SUCCESS! moll@ubuntu:~$ sudo /etc/init.d/mysqld1 stop Shutting down MySQL .... SUCCESS! moll@ubuntu:~$ sudo /etc/init.d/mysqld1 start Starting MySQL ..... SUCCESS! moll@ubuntu:~$ sudo /etc/init.d/mysqld2 start Starting MySQL ..... SUCCESS! moll@ubuntu:~$ sudo /etc/init.d/mysqld1 status * MySQL running (22533) moll@ubuntu:~$ sudo /etc/init.d/mysqld2 status * MySQL running (23405) moll@ubuntu:~$
Now we are ready to add them to the MySQL Workbench.Add MySQL connections in MySQL Workbench
- Choose a connection name for the 1st instance "mysql1" and specify the connection string:
If you didn't adjust the "Configure Server Management" - at the left button of the previous screen - you can open MySQL connections and perform SQL queries normally to this instance but you can neither edit the instance configuration parameters nor start/stop it.
BTW, you can adjust it at anytime later and that what I did already for this example.
- Adjust the instance configurations:
Use this window to modify System type,Configuration file path, start and stop MySQL commands to match the instance configurations (here you can use the start and stop commands for the script which was previously created).
- Do the same steps for the 2nd instance "mysql2".
At this point, everything should be OK as start/stop commands were previously tested outside of the MySQL Workbench, but it wont work as expected!
Start/Stop Server button in MySQL Workbench depends on the output of the status command, and the button label will be changed to execute the appropriate command accordingly (for ex. if the instance is stopped, the label will be changed to "Start Server" and the start command will be executed if the button clicked and vise versa).
So it didn't work as expected because the default status command used which is "ps -C mysqld -o pid=" (check the "Configure Server Management" image) and that checks for a mysql daemon running (in this case the other instance was still running), so the "Stop Server" will work as expected but the button label wont be changed to "Start Server" unless all running instances are stopped:
We can make a workaround to fix this function by doing the following steps:
- Change the output message of the status command - which is being printed when the instance is not running - in /etc/init.d/mysqld1 and /etc/init.d/mysqld2 from "MySQL is not running" to "MySQL is stopped" so we can have different keywords in the output messages between running and not running status (the running status output message is "MySQL running").
- Change the check status command for each instance (in "Configure Server Management" window) from "ps -C mysqld -o pid=" to "sudo /etc/init.d/mysqld1 status|grep running" for the 1st instance and to "sudo /etc/init.d/mysqld2 status|grep running" for the 2nd instance.
Now everything will be working fine ...
Note: It doesn't matter how MySQL was installed (RPM, tarballs or from source), it's the same concept, you just need working start/stop commands and status output message contains different keywords when MySQL is started or stopped.