To UNION or not to UNION...

Recently a forum question
[ 1
] got my attention:

*Is there any performance issue with Union?

I used union all sometime back and it was performance issue just to make an opinion that we should used union in query.*

The question itself was not too interesting because the answer is easy: It depends. But I wanted to see if there was an improvement in this common problem over time in MySQL.

Test set-up

So I prepared a little test to simulate some of the possible scenarios:

CREATE TABLE `u` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `a` int(10) unsigned DEFAULT NULL,
  `b` int(10) unsigned DEFAULT NULL,
  `c` int(10) unsigned DEFAULT NULL,
  `d` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `a` (`a`),
  KEY `b` (`b`),
  KEY `c` (`c`),
  KEY `d` (`d`)
) ENGINE=InnoDB
;

INSERT INTO u SELECT NULL, ROUND(RAND()*10, 0), ROUND(RAND()*10, 0), ROUND(RAND()*1000000, 0), ROUND(RAND()*1000000, 0);
INSERT INTO u SELECT NULL, ROUND(RAND()*10, 0), ROUND(RAND()*10, 0), ROUND(RAND()*1000000, 0), ROUND(RAND()*1000000, 0) FROM u;
... 1 mio rows

ANALYZE TABLE u;

With this table we can simulate the OR problem with low and high selectivity.

Running the tests

We did the tests with MySQL (5.0 - 5.7), Percona Server (5.6) and MariaDB (5.5, 10.0) for the following queries:

EXPLAIN SELECT * FROM u WHERE a = 5 OR b = 5;
EXPLAIN SELECT * FROM u WHERE a = 5 OR c = 500001;
EXPLAIN SELECT * FROM u WHERE c = 500001 OR d = 500001;

We are interested in what the optimizer is doing and what the performance of the queries is. The following results came out:

Query 1Query 2Query 3
Database versionrowsavg. timeQEProwsavg. timeQEProwsavg. timeQEP
MySQL 5.0.92194402390 ms1104876230 ms26< 10 ms3
MySQL 5.1.66194402410 ms1104876240 ms26< 10 ms3
MySQL 5.5.24194402420 ms1104876370 ms16< 10 ms3
MariaDB 5.5.32194402460 ms1104876420 ms16< 10 ms3
MySQL 5.6.12194402440 ms2104876240 ms26< 10 ms3
Percona 5.6.12-60.40194402450 ms2104876240 ms26< 10 ms3
MySQL 5.7.1194402420 ms2104876220 ms26< 10 ms3
MariaDB 10.0.3194402450 ms1104876400 ms16< 10 ms3

Different Query Execution Plans (QEP)

QEP 1:

+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
|  1 | SIMPLE      | u     | ALL  | a,b           | NULL | NULL    | NULL | 1049134 | Using where | 
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+

QEP 2:

+----+-------------+-------+-------------+---------------+------+---------+------+--------+-------------------------------+
| id | select_type | table | type        | possible_keys | key  | key_len | ref  | rows   | Extra                         |
+----+-------------+-------+-------------+---------------+------+---------+------+--------+-------------------------------+
|  1 | SIMPLE      | u     | index_merge | a,c           | a,c  | 5,5     | NULL | nnnnnn | Using union(a,c); Using where | 
+----+-------------+-------+-------------+---------------+------+---------+------+--------+-------------------------------+

QEP 3:

+----+-------------+-------+-------------+---------------+------+---------+------+------+-------------------------------+
| id | select_type | table | type        | possible_keys | key  | key_len | ref  | rows | Extra                         |
+----+-------------+-------+-------------+---------------+------+---------+------+------+-------------------------------+
|  1 | SIMPLE      | u     | index_merge | c,d           | c,d  | 5,5     | NULL |    n | Using union(c,d); Using where | 
+----+-------------+-------+-------------+---------------+------+---------+------+------+-------------------------------+

Conclusion

  • Single query performance went down from 5 - 50% (in one case increased by 5%) over time (MySQL releases). But we can see some impacts on optimizer improvements.
  • Newer MySQL releases are not necessarily faster for single-query performance than older ones. Most of the MySQL users are not running more than 1 or 2 concurrent queries. For them scalability improvements are not really an issue.
  • There seems to be some changes in the Optimizer some for good, some for bad, depending on the release or branch/fork you are using. So test carefully when you change the release or branch/fork.
  • And: Do not believe the whole marketing yelling but do your own testing…
tags: