To UNION or not to UNION...
Recently a forum question
[ 1
] got my attention:
*Is there any performance issue with Union?
I used union all sometime back and it was performance issue just to make an opinion that we should used union in query.*
The question itself was not too interesting because the answer is easy: It depends. But I wanted to see if there was an improvement in this common problem over time in MySQL.
Test set-up
So I prepared a little test to simulate some of the possible scenarios:
CREATE TABLE `u` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`a` int(10) unsigned DEFAULT NULL,
`b` int(10) unsigned DEFAULT NULL,
`c` int(10) unsigned DEFAULT NULL,
`d` int(10) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `a` (`a`),
KEY `b` (`b`),
KEY `c` (`c`),
KEY `d` (`d`)
) ENGINE=InnoDB
;
INSERT INTO u SELECT NULL, ROUND(RAND()*10, 0), ROUND(RAND()*10, 0), ROUND(RAND()*1000000, 0), ROUND(RAND()*1000000, 0);
INSERT INTO u SELECT NULL, ROUND(RAND()*10, 0), ROUND(RAND()*10, 0), ROUND(RAND()*1000000, 0), ROUND(RAND()*1000000, 0) FROM u;
... 1 mio rows
ANALYZE TABLE u;
With this table we can simulate the OR problem with low and high selectivity.
Running the tests
We did the tests with MySQL (5.0 - 5.7), Percona Server (5.6) and MariaDB (5.5, 10.0) for the following queries:
EXPLAIN SELECT * FROM u WHERE a = 5 OR b = 5;
EXPLAIN SELECT * FROM u WHERE a = 5 OR c = 500001;
EXPLAIN SELECT * FROM u WHERE c = 500001 OR d = 500001;
We are interested in what the optimizer is doing and what the performance of the queries is. The following results came out:
| Query 1 | Query 2 | Query 3 | |||||||
|---|---|---|---|---|---|---|---|---|---|
| Database version | rows | avg. time | QEP | rows | avg. time | QEP | rows | avg. time | QEP |
| MySQL 5.0.92 | 194402 | 390 ms | 1 | 104876 | 230 ms | 2 | 6 | < 10 ms | 3 |
| MySQL 5.1.66 | 194402 | 410 ms | 1 | 104876 | 240 ms | 2 | 6 | < 10 ms | 3 |
| MySQL 5.5.24 | 194402 | 420 ms | 1 | 104876 | 370 ms | 1 | 6 | < 10 ms | 3 |
| MariaDB 5.5.32 | 194402 | 460 ms | 1 | 104876 | 420 ms | 1 | 6 | < 10 ms | 3 |
| MySQL 5.6.12 | 194402 | 440 ms | 2 | 104876 | 240 ms | 2 | 6 | < 10 ms | 3 |
| Percona 5.6.12-60.40 | 194402 | 450 ms | 2 | 104876 | 240 ms | 2 | 6 | < 10 ms | 3 |
| MySQL 5.7.1 | 194402 | 420 ms | 2 | 104876 | 220 ms | 2 | 6 | < 10 ms | 3 |
| MariaDB 10.0.3 | 194402 | 450 ms | 1 | 104876 | 400 ms | 1 | 6 | < 10 ms | 3 |
Different Query Execution Plans (QEP)
QEP 1:
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
| 1 | SIMPLE | u | ALL | a,b | NULL | NULL | NULL | 1049134 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
QEP 2:
+----+-------------+-------+-------------+---------------+------+---------+------+--------+-------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------------+---------------+------+---------+------+--------+-------------------------------+
| 1 | SIMPLE | u | index_merge | a,c | a,c | 5,5 | NULL | nnnnnn | Using union(a,c); Using where |
+----+-------------+-------+-------------+---------------+------+---------+------+--------+-------------------------------+
QEP 3:
+----+-------------+-------+-------------+---------------+------+---------+------+------+-------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------------+---------------+------+---------+------+------+-------------------------------+
| 1 | SIMPLE | u | index_merge | c,d | c,d | 5,5 | NULL | n | Using union(c,d); Using where |
+----+-------------+-------+-------------+---------------+------+---------+------+------+-------------------------------+
Conclusion
- Single query performance went down from 5 - 50% (in one case increased by 5%) over time (MySQL releases). But we can see some impacts on optimizer improvements.
- Newer MySQL releases are not necessarily faster for single-query performance than older ones. Most of the MySQL users are not running more than 1 or 2 concurrent queries. For them scalability improvements are not really an issue.
- There seems to be some changes in the Optimizer some for good, some for bad, depending on the release or branch/fork you are using. So test carefully when you change the release or branch/fork.
- And: Do not believe the whole marketing yelling but do your own testing…

