You are here

MySQL Federated Storage Engine

What is a Federated Table?

A Federated Table is a table which points to a table in an other MySQL database instance (mostly on an other server). It can be seen as a view to this remote database table. Other RDBMS have similar concepts for example database links.

What can I do with a Federated Table?

To show what you can do with a federated table let us assume the following constellation: Two MySQL databases on two different servers. The first one called provider (it provides the data) the second one called requester (it requests the data). For a better understanding see the following example:

CREATE TABLE provider (
    a  INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
  , b  VARCHAR(32) NULL
  , INDEX b_i (b)
) ENGINE = MyISAM;

INSERT INTO provider
VALUES (NULL, 'Apfel'), (NULL, 'Birne'), (NULL, 'Pflaume')
     , (NULL, 'Banane'), (NULL, 'Kirsche'), (NULL, 'Quitte');

SELECT * FROM provider;

CREATE TABLE requester (
    a  INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
  , b  VARCHAR(32) NULL
  , INDEX b_i (b)
) ENGINE = FEDERATED
CONNECTION='mysql://root@provider:3306/test/provider';

SELECT * FROM requester;

And now let us do something with it:

SELECT b AS obst
  FROM requester
 WHERE a IN (1, 2, 3)
 ORDER BY b;

+---------+
| obst    |
+---------+
| Apfel   |
| Birne   |
| Pflaume |
+---------+

Even a join with an other local table is possible:

CREATE TABLE sales (
    id      INT      UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
  , item_id INT      UNSIGNED NOT NULL
  , cnt     SMALLINT UNSIGNED NOT NULL
  , prc     DECIMAL(5,2)
  , INDEX item_id_i (item_id)
) ENGINE = InnoDB;

INSERT INTO sales
VALUES (NULL, 1, 3, 1.25), (NULL, 2, 2, 4.37), (NULL, 2, 1, 0.65)
     , (NULL, 3, 5, 4.05), (NULL, 1, 9, 9.85), (NULL, 2, 2, 0.85);

SELECT b AS obst, SUM(cnt) AS sum, SUM(prc) as price
  FROM sales s
  JOIN requester r ON s.item_id = r.a
 GROUP BY b
 ORDER BY SUM(prc) ASC;

+---------+------+-------+
| obst    | sum  | price |
+---------+------+-------+
| Pflaume |    5 |  4.05 |
| Birne   |    5 |  5.87 |
| Apfel   |   12 | 11.10 |
+---------+------+-------+

Federated Table pitfalls

But it would be to nice if there were not some pitfalls... When we enable the general query log (log = 1) we can see what happens on the provider:

First we do just a little query:

SELECT b
  FROM requester
 WHERE a IN (1, 2, 3)
 ORDER BY b;

What we can see is that funny output. It does not matter if we run it with ORDER BY or not. Less nice is, that all columns are retrieved and sent to the requester as we also can see later:

SHOW TABLE STATUS LIKE 'provider'
SELECT `a`, `b` FROM `provider` WHERE  ( (`a` = '1') ) AND ( (`a` <= '1') )
SELECT `a`, `b` FROM `provider` WHERE  ( (`a` = '2') ) AND ( (`a` <= '2') )
SELECT `a`, `b` FROM `provider` WHERE  ( (`a` = '3') ) AND ( (`a` <= '3') )

Or a little simpler query:

SELECT b
  FROM requester
 WHERE a = 2;

In earlier MySQL 5.0 release even the filter clause (WHERE) was not sent to the provider!

SHOW TABLE STATUS LIKE 'provider'
SELECT `a`, `b` FROM `provider` WHERE  (`a` = '2')

Even if the column a is never touched it is retrieved and sent:

SELECT b
  FROM requester
 WHERE b = 'Pflaume';

SELECT `a`, `b` FROM `provider` WHERE  (`b` = 'Pflaume');

We see now, that Federated Tables are actually (5.0.26) not usable with huge amount of data like for example in data warehouses

Workaround for not used attributes

There is a trick to work around the problem that all attributes are sent to the requester. But be careful: This is not official documented and thus not official supported!

CREATE VIEW provider_v AS
SELECT b
  FROM provider;

CREATE TABLE requester_short (
    b  VARCHAR(32) NULL
  , INDEX b_i (b)
) ENGINE = FEDERATED
CONNECTION='mysql://root@provider:3306/test/provider_v';

SELECT *
  FROM requester_short
 WHERE b = 'Pflaume';

SELECT `b` FROM `provider_v` WHERE  (`b` = 'Pflaume');

Complex aggregation queries

What happens in this case is not yet clear to me:

SELECT b AS obst, SUM(cnt) AS sum, SUM(prc) as price
  FROM sales s
  JOIN requester r ON s.item_id = r.a
 GROUP BY b
 ORDER BY SUM(prc) ASC;

SHOW TABLE STATUS LIKE 'provider'
SELECT `a`, `b` FROM `provider` WHERE  (`a` = '1')
SELECT `a`, `b` FROM `provider` WHERE  (`a` = '2')
SELECT `a`, `b` FROM `provider` WHERE  (`a` = '3')
SELECT `a`, `b` FROM `provider` WHERE  (`a` = '1')
SELECT `a`, `b` FROM `provider` WHERE  (`a` = '2')

But at least the result seems to be correct!

Performance issues

Now let us see what about the performance on much data:

SELECT COUNT(*) FROM provider;
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+
1 row in set (0.01 sec)

What happens in the background can be seen here. Now it is also clear why it is that slow:

SELECT COUNT(*) FROM requester;
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+
1 row in set (2.41 sec)

SHOW TABLE STATUS LIKE 'provider';
SELECT `a`, `b` FROM `provider`;

If we force a full table scan the results are as follows:

SELECT COUNT(*) FROM (SELECT * FROM provider) x;
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+
1 row in set (2.00 sec)

A UNIX socket seems to be a fast network :-)

SELECT COUNT(*) FROM requester;
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+
1 row in set (2.45 sec)

SELECT COUNT(*) FROM (SELECT * FROM requester) x;
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+
1 row in set (3.74 sec)

In the following example we can see the performance impact of the not optimized attribute passing:

SELECT COUNT(*) FROM (SELECT b FROM provider  WHERE b = 'Pflaume') x;
+----------+
| count(*) |
+----------+
|   131072 |
+----------+
1 row in set (0.26 sec)

In the second example we can see the effect of our VIEW trick:

SELECT COUNT(*) FROM (SELECT b FROM requester WHERE b = 'Pflaume') x;
+----------+
| count(*) |
+----------+
|   131072 |
+----------+
1 row in set (0.92 sec)

SELECT COUNT(*) FROM (SELECT b FROM requester_short WHERE b = 'Pflaume') x;
+----------+
| count(*) |
+----------+
|   131072 |
+----------+
1 row in set (0.49 sec)

An other bad example is the LIMIT clause:

SELECT * FROM provider LIMIT 3;
+----+---------+
| a  | b       |
+----+---------+
|  1 | Apfel   |
|  2 | Birne   |
|  3 | Pflaume |
+----+---------+
3 rows in set (0.00 sec)

It always does a full table scan and send everything over the network:

SELECT * FROM provider_user LIMIT 3;
+----+---------+
| a  | b       |
+----+---------+
|  1 | Apfel   |
|  2 | Birne   |
|  3 | Pflaume |
+----+---------+
3 rows in set (1.86 sec)

SELECT `a`, `b` FROM `provider`

Security issues with Federated Tables

If the table requester is created in the same schema as normal tables we have a documented security issue: Any user who is allowed to call the SHOW CREATE TABLE command can see the password from the remote database. This might be not wanted in any case:

SHOW CREATE TABLE requester\G
*************************** 1. row ***************************
       Table: requester
Create Table: CREATE TABLE `requester` (
  `a` int(10) unsigned NOT NULL auto_increment,
  `b` varchar(32) default NULL,
  PRIMARY KEY  (`a`),
  KEY `b_i` (`b`)
) ENGINE=FEDERATED DEFAULT CHARSET=latin1
CONNECTION='mysql://root:secret@master:3320/test/provider'
1 row in set (0.00 sec)

To avoid this you can simply move this table into a schema where the users do not have access to and create some VIEWS into the users schema:

CREATE SCHEMA users;

USE users;

CREATE VIEW requester_v AS
SELECT * FROM xyz.requester;

CREATE USER dummy@'localhost' IDENTIFIED BY 'dummy';
GRANT SELECT ON users.requester_v TO dummy@localhost;


-- switch to user dummy

USE xyz;
ERROR 1044 (42000): Access denied for user 'dummy'@'localhost' to database 'xyz'

SHOW CREATE TABLE xyz.requester;
ERROR 1142 (42000): SELECT command denied to user 'dummy'@'localhost'
for table 'requester'

SELECT * FROM requester_v LIMIT 3;
+----+---------+
| a  | b       |
+----+---------+
|  1 | Apfel   |
|  2 | Birne   |
|  3 | Pflaume |
+----+---------+
3 rows in set (1.92 sec)