Skip navigation.
Home

Languages

RSS Feed FromDual on Linked In Recommend us on Xing Join us on Facebook Google+ Flickr Contact us

FromDual TechFeed (en)

Recover lost .frm files for InnoDB tables

Shinguz - Tue, 2011-12-20 22:51

Recently I found in a forum the following request for help:

My MySQL instance crashed because of free disk space fault. I saw in /var/lib/mysql all the files: ibdata1, ib_logfile* and all the folders containing frm files. Well, when i solved the problem and run successfully the instance, some databases disappeared. One of those is the most important, and i don't know how many tables had and their structures. Is there any way for recover the entire lost database (structure and data) only having the ibdata1 file?

First of all the observation sounds a bit strange because files do not just disappear. So I fear that its not just the .frm files which are lost. But let's think positive and assume just the .frm files have gone...

To recover the tables is a bit tricky because the .frm files contains the information about the table structure for MySQL.

If you have any old backup or only a structure dump it would be very helpful..

In InnoDB there is the table structure stored as well. You can get it out with the InnoDB Table Monitor as follows:

mysql> CREATE SCHEMA recovery; mysql> use recovery; mysql> CREATE TABLE innodb_table_monitor (id INT) ENGINE = InnoDB;

MySQL will write the output into its error log:

TABLE: name test/test, id 16, flags 1, columns 4, indexes 1, appr.rows 3 COLUMNS: id: DATA_INT DATA_BINARY_TYPE len 4; DB_ROW_ID: DATA_SYS prtype 256 len 6; DB_TRX_ID: DATA_SYS prtype 257 len 6; DB_ROLL_PTR: DATA_SYS prtype 258 len 7; INDEX: name GEN_CLUST_INDEX, id 18, fields 0/4, uniq 1, type 1 root page 312, appr.key vals 3, leaf pages 1, size pages 1 FIELDS: DB_ROW_ID DB_TRX_ID DB_ROLL_PTR id

With these information and some experience you can guestimate the original table structure:

Schema and table name: test.test

id: DATA_INT DATA_BINARY_TYPE len 4; DB_ROW_ID: DATA_SYS prtype 256 len 6; DB_TRX_ID: DATA_SYS prtype 257 len 6; DB_ROLL_PTR: DATA_SYS prtype 258 len 7;

The table has only 1 column called id which is an 4 byte int, the other columns are InnoDB internal stuff (19 byte!).

INDEX: name GEN_CLUST_INDEX, id 18, fields 0/4, uniq 1, type 1

The table has only one generated clustered index (no explicit index!).

So we can guess:

mysql> CREATE TABLE test.test ( id INT UNSIGNED NOT NULL DEFAULT 0 ) ENGINE = InnoDB CHARSET=utf8;

This table has to be created on a second system now. From there we see with the InnoDB table monitor:

TABLE: name test/test, id 0 1269, columns 4, indexes 1, appr.rows 0 COLUMNS: id: DATA_INT DATA_UNSIGNED DATA_BINARY_TYPE DATA_NOT_NULL len 4; DB_ROW_ID: DATA_SYS prtype 256 len 6; DB_TRX_ID: DATA_SYS prtype 257 len 6; DB_ROLL_PTR: DATA_SYS prtype 258 len 7; INDEX: name GEN_CLUST_INDEX, id 0 909, fields 0/4, uniq 1, type 1 root page 3, appr.key vals 0, leaf pages 1, size pages 1 FIELDS: DB_ROW_ID DB_TRX_ID DB_ROLL_PTR id

This is not 100% correct yet.

id seems to be SIGNED and not UNSIGNED and NULL seems to be allowed. So next try:

mysql> CREATE TABLE test.test ( id INT SIGNED NULL ) ENGINE = InnoDB CHARSET=utf8;TABLE: name test/test, id 0 1271, columns 4, indexes 1, appr.rows 0 COLUMNS: id: DATA_INT DATA_BINARY_TYPE len 4; DB_ROW_ID: DATA_SYS prtype 256 len 6; DB_TRX_ID: DATA_SYS prtype 257 len 6; DB_ROLL_PTR: DATA_SYS prtype 258 len 7; INDEX: name GEN_CLUST_INDEX, id 0 911, fields 0/4, uniq 1, type 1 root page 3, appr.key vals 0, leaf pages 1, size pages 1 FIELDS: DB_ROW_ID DB_TRX_ID DB_ROLL_PTR id

So this looks pretty much like it should. Do not be confused because of some other details. The original table was created on a MySQL 5.6.4 and the .frm recovery is done on a 5.1.55.

Now copy the .frm file to the original database and look if you can access your data. If it does you can do this table by table for all you zillions of tables...
When you are done. Take a backup and ideally do a proper install of your database!

Just a little detail: I created the original table like this:

mysql> CREATE TABLE test.test (id INT) ENGINE = InnoDB; mysql> SHOW CREATE TABLE test.test\G CREATE TABLE `test` ( `id` int(11) DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1

MySQL seems to figure out itself what is the correct character set...

Rolling upgrade of Galera 1.0 to 1.1

Shinguz - Wed, 2011-12-14 15:38

A few days ago Codership announced their ">new version Galera v1.1 - synchronous Replication Cluster for MySQL. Before we look at the new feature of Rolling Online Schema Upgrade (OSU) we have a look at how to upgrade to the new Galera release.

A rolling upgrade of your synchronous Galera Replication Cluster from version 1.0 to 1.1 is quite easy when you stay at the same MySQL version (5.5).

To not lose the availability of your database service during the upgrade you should have at least 3 Galera nodes in your Cluster.

For further details please also look at MySQL/Galera cluster upgrade.

Hint: If you can do without rolling upgrade, you better avoid it and take your Galera Cluster down.

Check the version

Check first the version you are currently running on:

SHOW GLOBAL VARIABLES LIKE 'version'; +---------------+-----------------------+ | Variable_name | Value | +---------------+-----------------------+ | version | 5.5.15-wsrep_21.1-log | +---------------+-----------------------+

We can see that we are using MySQL 5.5.15 with the wsrep API Version 21 and the wsrep patch 21.1.

SHOW GLOBAL STATUS LIKE 'wsrep_provider_version'; +------------------------+-------------+ | Variable_name | Value | +------------------------+-------------+ | wsrep_provider_version | 21.1.0(r86) | +------------------------+-------------+

Here we see that we are using the Galera Replicator (plug-in) 1.0 (revision 86) based on the wsrep API Version 21.

Some rules to Galera versioning. We have 4 different version numbers we should care about:

  • MySQL version (5.5.15)
  • Wsrep API version (21, 22, ...)
    The wsrep API versions will always be single monotonically increasing numbers: 21, 22, ... That indicates API compatibility between MySQL and Galera.
  • Wsrep Patch version (21.1)
    The wsrep patch versions has the form 21.1 where the 21 represents the API version and the 1 represents bug-fix of that API version.
  • Galera Replicator (= provider, plug-in) (1.0(r86))
    Galera versions will be in the form <major>.<minor> with minor meaning: bug-fixes and small features and major: major features, which involve a lot of code change.

Galera 22.1.1 is backward compatible with Galera 21.1.0. I was told that Galera should be at least ONE version backward compatible. So 1.0 should be for 0.8 and 1.1 for 1.0 and 1.2 for 1.1 and 2.0 for 1.2 etc.

Preparation

Download the packages for your preferred installation method from here:

In my case there was only a binary tar ball provided for Codership-MySQL but not for the Galera Plug-in v1.1. So I extracted it from the Debian package as follows:

dpkg-deb -x galera-22.1.1-amd64.deb /tmp/oli/ cp /tmp/oli/usr/bin/garbd /home/mysql/product/mysql-5.5.17-wsrep-22.3/bin cp /tmp/oli/usr/lib/galera/libgalera_smm.so /home/mysql/product/mysql-5.5.17-wsrep-22.3/lib/plugin/

For RPM's it should work in a similar way:

rpm2cpio package.rpm | cpio -idmvPrecautions

Make sure, that during upgrade from 5.1 to 5.5 no DDL's are allowed!

Upgrade

Then upgrade your Galera Cluster as follows:

  • Shift load away from this node.
  • Shutdown node (/etc/init.d/mysql stop)
  • Uninstall or remove the old Galera plug-in.
  • Uninstall or remove the old Codership-MySQL Binaries
  • Install the new Codership-MySQL binaries with the wsrep API version 22
  • Install the new Galera plug-in v1.1
  • Check if wsrep_provider in my.cnf is pointing to the correct new location.
  • Start node (/etc/init.d/mysql start)
  • Check if node came up properly:
    SHOW GLOBAL STATUS LIKE 'wsrep%'; +----------------------------+--------------------------------------+ | Variable_name | Value | +----------------------------+--------------------------------------+ | wsrep_local_state | 4 | | wsrep_local_state_comment | Synced (6) | | wsrep_cluster_size | 3 | | wsrep_cluster_status | Primary | | wsrep_connected | ON | | wsrep_local_index | 1 | | wsrep_provider_version | 22.1.1(r95) | | wsrep_ready | ON | +----------------------------+--------------------------------------+
  • If this is the case shift load back to this node.
    If you have already troubles up to here we recommend to solve the problems first and NOT to continue with the upgrade procedure. Otherwise you risk the loss of you complete service.
  • If your reached this step you can upgrade the next node in your Galera Cluster.

When you have upgraded all your nodes in the Galera Cluster you should notice, that the Protocol version will switch automatically from 1 to 2:

111212 17:32:24 [Note] WSREP: Quorum results: version = 1, component = PRIMARY, conf_id = 6, ... 111212 17:34:33 [Note] WSREP: Quorum results: version = 2, component = PRIMARY, conf_id = 7,Configuration for Rolling restart

In Galera Cluster configurations you see often that the Cluster is still set to its initial start configuration which is inappropriate for a rolling restart operation:

Galera configuration for an initial Cluster start

Galera node 1: wsrep_cluster_address: gcomm://
Galera node 2: wsrep_cluster_address: gcomm://192.168.1.101:4567
Galera node 3: wsrep_cluster_address: gcomm://192.168.1.102:4567

In this case Node 2 and 3 are OK for a rolling restart but Galera Node 1 will fail to restart.

Galera configuration for normal operations and a Cluster rolling restart

This is the way we recommend to have a Galera configuration for normal operations and a rolling restart:

Galera node 1: wsrep_cluster_address: gcomm://192.168.1.103:4567
Galera node 2: wsrep_cluster_address: gcomm://192.168.1.101:4567
Galera node 3: wsrep_cluster_address: gcomm://192.168.1.102:4567

Every Galera node points to its "left" neigbour.

Upgrade from MySQL 5.1/Galera 0.8 to MySQL 5.5/Galera 1.1

Upgrading from MySQL 5.1/Galera 0.8 to MySQL 5.5/Galera 1.1 has to be done in 2 steps because Codership only provides backwards-compatibility for one minor version jumps (0.8 -> 1.0 -> 1.1 -> 1.2 -> 2.0).

We have 2 possibilities now:

5.1/0.8 -> 5.1/1.0 -> 5.5/1.1

or

5.1/0.8 -> 5.5/1.0 -> 5.5/1.1

Which one you choose is up to you.

A rolling upgrade on a running system is impossible without a snapshot state transfer (SST) at the moment. So be prepared it takes a while and causes some load on the systems.

In our case we chose the way via 5.5/1.0 (2nd way).

To upgrade from 5.1/0.8 to 5.5/1.0 proceed as follows:

  • Shift load away from this node to the other 2 nodes.
  • Shutdown this node (/etc/init.d/mysql stop)
  • Set wsrep_provider = none in my.cnf
  • Uninstall or remove the old Galera plug-in.
  • Uninstall or remove the old Codership-MySQL Binaries
  • Install the new Codership-MySQL binaries
  • Install the new Galera plug-in
  • Start this node (/etc/init.d/mysql start)
  • Then you will get some error messages:

    111214 11:44:59 [ERROR] Missing system table mysql.proxies_priv; please run mysql_upgrade to create it 111214 11:44:59 [ERROR] Native table 'performance_schema'.'events_waits_current' has the wrong structure ... 111214 11:44:59 [ERROR] Native table 'performance_schema'.'file_instances' has the wrong structure 111214 11:44:59 [Note] Event Scheduler: Loaded 0 events 111214 11:44:59 [Note] WSREP: wsrep_load(): loading provider library 'none'
  • Run mysql_upgrade (see MySQL upgrade instructions and consider that a MySQL binary upgrade is not officially supported/recommended (this is not a problem, because SST with mysqldump will do a logical restore anyway)).
  • Set wsrep_provider in my.cnf to the new plug-in location.
  • Prepare SST upgrade script on (all) the donor(s) node(s).

    cp wsrep_sst_mysqldump wsrep_sst_mysqldump_upgrade
  • Change the script wsrep_sst_mysqldump_upgrade that it dumps all databases except the mysql database:

    diff wsrep_sst_mysqldump wsrep_sst_mysqldump_upgrade 59c59 < --skip-comments --flush-privileges --all-databases" --- > --skip-comments --flush-privileges --databases test foodmart"
  • Caution: Be careful with Stored Procedures, Stored Functions, Triggers and Events! This upgrade procedure will NOT work completely if you use some of those MySQL features. This upgrade procedure will further not work completely if you have differences in the mysql schema of your Galera nodes (for what ever reason).
  • Set wsrep_sst_method = mysqldump_upgrade in my.cnf
  • Start this node (/etc/init.d/mysql start). Keep in mind that one of the remaining Galera nodes will act as a SST donor and during the synchronization he is not available for queries!
  • Check if node came properly up: SHOW GLOBAL STATUS LIKE 'wsrep%';
  • If this is the case shift load back to this node.
    If you have already troubles up to here we recommend to solve the problems first and NOT to continue with the upgrade procedure. Otherwise you risk the loss of you complete service.
  • Set wsrep_sst_method back to its original value (mysqldump).
  • If your reached this step you can upgrade the next node in your Galera Cluster.

If you finally manged to upgrade from MySQL 5.1/Galera 0.8 to MySQL 5.5/Galera 1.0 you can follow the procedure mentioned above to upgrade to Galera 1.1

Findings

To identify our different Galera Clusters we name them:

wsrep_cluster_name = "Galera-0.8 wsrep-21"

In this upgrade scenario this naming convention is very non-optimal because the name of our Galera Cluster should change as well. But the value of wsrep_cluster_name should be the same on all Galera nodes otherwise a node is not capable to join the Cluster (this is to make sure that a Galera node is not connecting by accident to a/the wrong Galera Cluster).
To change the wsrep_cluster_name parameter you have to bring down the whole Galera Cluster. This is not possible at the moment during a rolling restart.
Hopefully this constraint is released in a later Galera version.

Migrating from MySQL Master-Master Replication to Galera Multi-Master Replication

Shinguz - Fri, 2011-12-02 14:48
Introduction

Galera is a synchronous Multi-Master Replication for MySQL. It is therefore in competition with several other MySQL architectures:

Very often they can be easily replaced by Galera's synchronous Multi-Master Replication for MySQL.

All those products have some advantages and disadvantages. Very often MySQL Master-Master Replication is used in the field because of its simplicity to set-up. But after a while one faces its disadvantages which is mainly data inconsistency between the 2 Masters. This is not only the fault of MySQL Replication but MySQL Replication makes it easy to get such data inconsistencies.

In the following article we look at how you can replace a MySQL Master-Master Replication by Galera Multi-Master Replication with the possibility to fall back if you do not like the solution or if you run into troubles.

Starting point

Some MySQL users have a typical Master-Master Replication set-up like a) active-passive or b) active-active for High Availability (HA) reasons. Either in the same data center or even in remote data centers:

a) active-passive b) active-active Adding Galera synchronous Replication

As a first step you can add a Galera Replication Cluster as a simple Slave:

In this set-up you have to consider, that ALL nodes which are participating in replication (Master 1, Master 2 and Galera 1) have set the following parameters:

# # my.cnf # [mysqld] default_storage_engine = InnoDB log_slave_updates = 1 log_bin = bin-log server_id = <n> binlog_format = ROW

It is very important that the server_id is unique on all MySQL nodes BUT they have to be EQUAL for all Galera nodes. Example:

  • Master 1: server_id = 1
  • Master 2: server_id = 2
  • Galera 1: server_id = 3
  • Galera 2: server_id = 3
  • Galera 3: server_id = 3

This is to avoid conflicts during replication.

Galera is set-up as described in Installing MySQL/Galera. Please make sure, that you do no have any MyISAM tables anymore. Galera cannot cope with any other Storage Engine than InnoDB at the moment.

The following query helps you to find out if you are using any other Storage Engine than InnoDB:

SELECT table_schema, engine, COUNT(*) FROM information_schema.tables WHERE table_schema NOT IN ('information_schema', 'mysql', 'performance_schema') GROUP BY table_schema, engine;

Then you do a normal dump from the Master 2 as follows:

mysqldump --user=root --password --master-data --single-transaction \ --databases foodmart test > full_dump.sql

When you dump the database avoid dumping the mysql schema otherwise you will destroy your Galera Node 2 and 3. Then restore the dump on ONE node of the Galera Cluster (preferably on node 1) after setting it to its master:

CHANGE MASTER TO master_host='master1', master_port=3306 , master_user='replication', master_password='secret'; mysql --user=root --password < full_dump.sql

Then you can attach the Galera node to the Master 2.

Now all data arriving to your MySQL Master(s) will be automatically replicated to the Galera Cluster as well.

Adding the Galera Cluster into the ring

In a second step you can add the Galera Cluster into the Replication ring by pointing Master 1 to the Galera Node 1:

Application Load Balancing for Galera

To have true High Availability (HA) it makes sense to put some Load Balancer in front of your Galera Cluster. This can be done either through:

Now your Galera Replication Cluster is ready to put some load on it:

If you are more familiar with Galera you can move the Virtual IP (VIP) from MySQL Master-Master Replication to the Galera Replication Cluster:

And if you are happy with the synchronous replication and its scaling performance you can finally drop your old MySQL Master-Master set-up and bypass the VIP during next downtime of your application.

Shortcuts

A shortcut in this way would be when you directly replace Master 2 by a Galera node:

Then you need one server less and you can directly use the MySQL Master node as a base for starting with Galera. You just have to replace the MySQL Binaries by the MySQL-Galera Binaries and then add 2 other Galera nodes in the set-up.

Important notes

Currently Galera works only with InnoDB tables. So you have to make sure that you convert all your non-InnoDB tables to InnoDB tables (except the ones in the mysql Schema). Otherwise you will run into problems.

The described set-up works starting with Galera v1.1 and wsrep v22.3.

The memory of the Galera node getting the import has grown by 1.5 Gbyte in one of our tests. So be prepared that the system has enough memory! In our first tests the system heavily started to swap, which caused high I/O load. In this situation Galera behaved erroneous...

If the Master is under very high load Galera Slave can not catch up with the load and starts lagging... This is not a problem if you run the load only on the Galera Cluster!

To avoid a Split-Brain situations all Cluster Solutions need at least 3 nodes. This is the same with Galera. When you move from MySQL Master-Master replication you need one Server more than before to avoid this problem. Theoretically Galera can be run in a 2-node set-up but this is strongly NOT recommended to do.

One way out of this situation is to use the garbd who acts as an arbitrator in such a scenario. This is called a 2 1/2 node set-up.

And now have fun with your synchronous Multi-Master Galera Replication for MySQL...

How MySQL behaves with many schemata, tables and partitions

Shinguz - Wed, 2011-11-30 15:35
Introduction

Recently a customer claimed that his queries were slow some times and sometimes they were fast.

First idea: Flipping query execution plan caused by InnoDB could be skipped because it affected mainly MyISAM tables.
Second idea: Caching effects by either the file system cache caching MyISAM data or the MyISAM key buffer caching MyISAM indexes were examined: File system cache was huge and MyISAM key buffer was only used up to 25%.

I was a bit puzzled...

Then we checked the table_open_cache and the table_definition_cache which were still set to their default values. Their according status information Opened_tables and Opened_table_definitions were clearly indicating that those caches are much too small...

The typical reaction is: increase open_files_limit (after increasing the operating system user limits) to a higher value. Unfortunately most of the Linux distributions have a default values for open files of 1024 which is far to low for typical database systems.

Too many open files

But customer claimed that he tried this already (set open_files_limit to 50'000) and got error messages in the MySQL error log:

# perror 24 OS error code 24: Too many open files

So we were even more puzzled.

After some further investigation we found that the customer has up to 600 schemata and in each schema he had 30 to 100 tables and some of those tables have even monthly partitions up to 4 years (roughly 50 partitions).
A partition internally is handled similar to a table. So we have something between 18'000 and 3'000'000 tables/partitions in total. We wrote already about the problems with MySQL databases with many tables in Configuration of MySQL for Shared Hosting but now it is even worse with the partitions. I remembered that having too many partitions with MySQL is not a good idea so we investigated a bit deeper in this area:

We first looked at the amount of file handles MySQL has open:

lsof -p <pid> | wc -l

This value clearly moved towards our new open_files_limit of 150'000 (with table_open_cache and table_definition_cache set to 32k each) and then we got Too many open files errors. So we set table_open_cache and table_definition_cache both back to 2048 and go a stable number of file descriptors between 100'000 and 110'000.

This gives us an idea how to extrapolate those values when we want to have bigger caches. But we do not know how a Linux system or MySQL will behave with much higher values...

Partition table test

But why do we get such high values?

As an example we took the following table with 14 partitions:

CREATE TABLE ptn_test ( id INT UNSIGNED NOT NULL AUTO_INCREMENT , data VARCHAR(64) , ts TIMESTAMP , PRIMARY KEY (id, ts) , INDEX (ts) ) ENGINE = MyISAM PARTITION BY RANGE ( UNIX_TIMESTAMP(ts) ) ( PARTITION p_2010 VALUES LESS THAN ( UNIX_TIMESTAMP('2011-01-01 00:00:00') ) , PARTITION p_2011_01 VALUES LESS THAN ( UNIX_TIMESTAMP('2011-02-01 00:00:00') ) , PARTITION p_2011_02 VALUES LESS THAN ( UNIX_TIMESTAMP('2011-03-01 00:00:00') ) , PARTITION p_2011_03 VALUES LESS THAN ( UNIX_TIMESTAMP('2011-04-01 00:00:00') ) , PARTITION p_2011_04 VALUES LESS THAN ( UNIX_TIMESTAMP('2011-05-01 00:00:00') ) , PARTITION p_2011_05 VALUES LESS THAN ( UNIX_TIMESTAMP('2011-06-01 00:00:00') ) , PARTITION p_2011_06 VALUES LESS THAN ( UNIX_TIMESTAMP('2011-07-01 00:00:00') ) , PARTITION p_2011_07 VALUES LESS THAN ( UNIX_TIMESTAMP('2011-08-01 00:00:00') ) , PARTITION p_2011_08 VALUES LESS THAN ( UNIX_TIMESTAMP('2011-09-01 00:00:00') ) , PARTITION p_2011_09 VALUES LESS THAN ( UNIX_TIMESTAMP('2011-10-01 00:00:00') ) , PARTITION p_2011_10 VALUES LESS THAN ( UNIX_TIMESTAMP('2011-11-01 00:00:00') ) , PARTITION p_2011_11 VALUES LESS THAN ( UNIX_TIMESTAMP('2011-12-01 00:00:00') ) , PARTITION p_2011_12 VALUES LESS THAN ( UNIX_TIMESTAMP('2012-01-01 00:00:00') ) , PARTITION p_max VALUES LESS THAN (MAXVALUE) );

and inserted some rows in each partition:

INSERT INTO ptn_test VALUES (NULL, 'Bla', '2010-12-01 00:00:42') , (NULL, 'Bla', '2011-01-01 00:00:42') , (NULL, 'Bla', '2011-02-01 00:00:42') , (NULL, 'Bla', '2011-03-01 00:00:42') , (NULL, 'Bla', '2011-04-01 00:00:42') , (NULL, 'Bla', '2011-05-01 00:00:42') , (NULL, 'Bla', '2011-06-01 00:00:42') , (NULL, 'Bla', '2011-07-01 00:00:42') , (NULL, 'Bla', '2011-08-01 00:00:42') , (NULL, 'Bla', '2011-09-01 00:00:42') , (NULL, 'Bla', '2011-10-01 00:00:42') , (NULL, 'Bla', '2011-11-01 00:00:42') , (NULL, 'Bla', '2011-12-01 00:00:42') , (NULL, 'Bla', '2012-01-01 00:00:42');

Just for running a simple EXPLAIN:

EXPLAIN PARTITIONS SELECT * FROM ptn_test WHERE ts = '2011-11-01 00:00:42'; +----+-------------+----------+------------+--------+---------------+------+---------+------+------+-------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+----------+------------+--------+---------------+------+---------+------+------+-------+ | 1 | SIMPLE | ptn_test | p_2011_11 | system | ts | NULL | NULL | NULL | 1 | | +----+-------------+----------+------------+--------+---------------+------+---------+------+------+-------+

MySQL already opens 28 file descriptors and uses one table_open_cache and one table_definition_cache entry. When doing the same query in a second session on a second schema MySQL already has opened 56 file descriptors and 2 table_open_cache and 2 table_definition_cache entries. What the MySQL manual states about you can find here: How MySQL Opens and Closes Tables.

SHOW GLOBAL STATUS LIKE 'open_%s'; +--------------------------+-------+ | Variable_name | Value | +--------------------------+-------+ | Open_files | 56 | | Open_table_definitions | 2 | | Open_tables | 2 | +--------------------------+-------+

By the way we found that the value of Open_files is pretty close to the result of the following Linux command:

lsof -p <pid> | grep '/var/lib/mysql' | grep -e 'MYD' -e 'MYI' -e 'ib' | wc -l

So Open_files is a good indication of how far we are away of open_files_limit. MySQL needs some more file descriptors for error log file, binary log file etc. But this is typically less than 10.

Now we have already found why we got the Too many open files error message.

But it does not explain yet why our MyISAM key buffer was so badly used.

MyISAM key buffer is wiped out

I have not found it mentioned in the documentation explicitly (please correct me when I am wrong) but it looks like when MySQL has a pressure on the table_open_cache and removes entries from it, it wipes out blocks from the MyISAM key buffer of those tables as well.

A little experiment for this:

  • Clean up table_open_cache and table_definition_cache and MyISAM key buffer:
FLUSH TABLES; FLUSH STATUS; SET GLOBAL key_buffer_size=1024*1024; SHOW GLOBAL STATUS LIKE 'open_%s'; +--------------------------+-------+ | Variable_name | Value | +--------------------------+-------+ | Open_files | 0 | | Open_table_definitions | 0 | | Open_tables | 0 | | Opened_files | 0 | | Opened_table_definitions | 0 | | Opened_tables | 0 | +--------------------------+-------+ SHOW GLOBAL STATUS LIKE 'key_blocks_un%'; +-------------------+-------+ | Variable_name | Value | +-------------------+-------+ | Key_blocks_unused | 837 | +-------------------+-------+
  • Run a query from 2 sessions in 2 different schemata:
EXPLAIN SELECT COUNT(*) FROM (SELECT * FROM ptn_test WHERE ts >= '2011-11-01 00:00:00' and Ts < '2011-12-01 00:00:00' ) AS x;
  • Show the status information again:
SHOW GLOBAL STATUS LIKE 'open_%s'; +--------------------------+-------+ | Variable_name | Value | +--------------------------+-------+ | Open_files | 56 | | Open_table_definitions | 2 | | Open_tables | 2 | | Opened_files | 62 | | Opened_table_definitions | 2 | | Opened_tables | 2 | +--------------------------+-------+ SHOW GLOBAL STATUS LIKE 'key_blocks_un%'; +-------------------+-------+ | Variable_name | Value | +-------------------+-------+ | Key_blocks_unused | 504 | +-------------------+-------+

We have 56 file descriptors (2 tables x 14 partitions x (data + index = 2)), 2 table_open_cache and table_definition_cache entries and 333 MyISAM key buffer blocks used.

  • Connect from a 3rd connection without the -A option:
SHOW GLOBAL STATUS LIKE 'open_%s'; +--------------------------+-------+ | Variable_name | Value | +--------------------------+-------+ | Open_files | 30 | | Open_table_definitions | 21 | | Open_tables | 16 | | Opened_files | 127 | | Opened_table_definitions | 21 | | Opened_tables | 21 | +--------------------------+-------+ SHOW GLOBAL STATUS LIKE 'key_blocks_un%'; +-------------------+-------+ | Variable_name | Value | +-------------------+-------+ | Key_blocks_unused | 669 | +-------------------+-------+

26 file descriptors where released. We currently have 16 entries in the table_open_cache (its size was limited to 16) but in total 19 tables were opened. Why 65 files were opened for this operation is unclear.
The current amount of MyISAM key blocks is 168 which is roughly the half of the value before. So we can assume that one of our 2 partitioned table was closed and its key buffer entries were wiped out.

  • Run the query again
EXPLAIN SELECT COUNT(*) FROM (SELECT * FROM ptn_test WHERE ts >= '2011-11-01 00:00:00' and Ts < '2011-12-01 00:00:00' ) AS x; SHOW GLOBAL STATUS LIKE 'open_%s'; +--------------------------+-------+ | Variable_name | Value | +--------------------------+-------+ | Open_files | 58 | | Open_table_definitions | 21 | | Open_tables | 16 | | Opened_files | 156 | | Opened_table_definitions | 21 | | Opened_tables | 22 | +--------------------------+-------+ SHOW GLOBAL STATUS LIKE 'key_blocks_un%'; +-------------------+-------+ | Variable_name | Value | +-------------------+-------+ | Key_blocks_unused | 504 | +-------------------+-------+

28 file descriptors were used again with 29 in total and one more table was opened. And our key buffer is back to 333 blocks used.

So it looks like when we have really pressure on the table_open_cache this also has an impact on the MyISAM key buffer.

How does it behave with InnoDB?

Nowadays InnoDB is much more often used than MyISAM. So let us have a look how is the impact on InnoDB tables:

  • Starting point values:
SHOW GLOBAL STATUS LIKE 'open_%s'; SHOW GLOBAL STATUS LIKE 'innodb_buffer_pool_pages_%'; +--------------------------+-------+ | Variable_name | Value | +--------------------------+-------+ | Open_files | 0 | | Open_table_definitions | 0 | | Open_tables | 0 | | Opened_files | 0 | | Opened_table_definitions | 0 | | Opened_tables | 0 | +--------------------------+-------+ +----------------------------------+-------+ | Variable_name | Value | +----------------------------------+-------+ | Innodb_buffer_pool_pages_data | 1012 | | Innodb_buffer_pool_pages_free | 1002 | | Innodb_buffer_pool_pages_misc | 34 | | Innodb_buffer_pool_pages_total | 2048 | +----------------------------------+-------+
  • Run the query in 2 different connections on 2 different tables:
SHOW GLOBAL STATUS LIKE 'open_%s'; SHOW GLOBAL STATUS LIKE 'innodb_buffer_pool_pages_%'; +--------------------------+-------+ | Variable_name | Value | +--------------------------+-------+ | Open_files | 0 | | Open_table_definitions | 2 | | Open_tables | 2 | | Opened_files | 6 | | Opened_table_definitions | 2 | | Opened_tables | 2 | +--------------------------+-------+ +----------------------------------+-------+ | Variable_name | Value | +----------------------------------+-------+ | Innodb_buffer_pool_pages_data | 1012 | | Innodb_buffer_pool_pages_free | 1002 | | Innodb_buffer_pool_pages_misc | 34 | | Innodb_buffer_pool_pages_total | 2048 | +----------------------------------+-------+
  • Use a new connection:
SHOW GLOBAL STATUS LIKE 'open_%s'; SHOW GLOBAL STATUS LIKE 'innodb_buffer_pool_pages_%'; +--------------------------+-------+ | Variable_name | Value | +--------------------------+-------+ | Open_files | 2 | | Open_table_definitions | 21 | | Open_tables | 16 | | Opened_files | 71 | | Opened_table_definitions | 21 | | Opened_tables | 21 | +--------------------------+-------+ +----------------------------------+-------+ | Variable_name | Value | +----------------------------------+-------+ | Innodb_buffer_pool_pages_data | 1012 | | Innodb_buffer_pool_pages_free | 1002 | | Innodb_buffer_pool_pages_misc | 34 | | Innodb_buffer_pool_pages_total | 2048 | +----------------------------------+-------+

So the phenomena of wiping out data seems definitely not to happen with InnoDB tables. Further InnoDB seems to use much less file descriptors than MyISAM.

Conclusion
  • MyISAM uses a huge number of file descriptors. This comes especially true when using partitions.
  • This needs a significant increase of open_files_limit. Its impact is unknown.
  • Shortage of entries in the table_open_cache leads to wipe out tables from the table_open_cache.
  • This leads to wipe out of MyISAM key buffer blocks of the according tables which leads to slower queries when the table is requested next time.
  • With InnoDB this behavior is much more relaxed and problems should appear much later than with MyISAM.
  • Be careful using a significant number of (partitioned) tables with MyISAM. It can have a serious impact on the performance of your system.

Building Galera Replication from Scratch

Shinguz - Sun, 2011-11-13 12:29
Introduction

MySQL/Galera synchronous Multi-Master Replication consists of 2 parts:

If you do not want to download the prepared binaries you can build it on you own.

First you have to download the native MySQL sources, then patch it with the Galera wsrep patches and compile it. In a second step you have to build the Galera Plugin.

This is especially useful because in the standard Galera binary tar balls the garbd (and possibly other tools) is not provided.

The following steps describe how to do it:

Prepare a patched MySQL Download MySQL Sources

Download the normal MySQL source code:

wget http://mirror.switch.ch/ftp/mirror/mysql/Downloads/MySQL-5.5/mysql-5.5.15.tar.gzDownload wsrep Patch

Download the wsrep Patch for MySQL:

wget http://launchpad.net/codership-mysql/5.5/5.5.15-21.2/+download/mysql-5.5.15-wsrep_21.2.patchPatch MySQL

Patch MySQL as follows:

cd /tmp tar xf /download/mysql-5.5.15.tar.gz cd mysql-5.5.15 patch -p1 < /download/mysql-5.5.15-wsrep_21.2.patch

If you want avoid this step you can also download the already patched codership-mysql directly as follows:

bzr branch lp:codership-mysql

or

bzr branch lp:codership-mysql/5.5

If you want to do create your own wsrep patch:

bzr branch lp:codership-mysql/5.5 cd 5.5 bzr diff -p1 -v --diff-options " --exclude=.bzrignore " -r tag:mysql-5.5.15..branch:lp:codership-mysql/5.5 \ > mysql-5.5.15-wsrep_21.2.patchCompile MySQL with the wsrep patch

To compile the patched MySQL do the following:

chmod a+x BUILD/compile-amd64-wsrep BUILD/compile-amd64-wsrep --prefix=/home/mysql/product/mysql-5.5.15-wsrep-21.2 make install

Up to here this was the first step to get a prepared MySQL working with wsrep. Now we have to make the Galera Plugin...

Prepeare the Galera Replication Plugin Download Galera Replication Plugin

The source of the Galera Replication Plugin you can get like this:

wget http://launchpad.net/galera/1.x/21.1.0/+download/galera-21.1.0-src.tar.gz

or if you want to get the most recent source take if directly from launchpad:

bzr branch lp:galera/1.xCompile the Galera Replication Plugin cd /tmp/ tar xf /download/galera-21.1.0-src.tar.gz cd galera-21.1.0-src/ scons cp garb/garbd /home/mysql/product/mysql-5.5.15-wsrep-21.2/bin/ cp libgalera_smm.so /home/mysql/product/mysql-5.5.15-wsrep-21.2/lib/plugin/

This is the whole magic. The most difficult thing is to put everything in the right order and to get the right packages in place (for Ubuntu/Debian: libboost-dev, libboost-program-options-dev (>= v1.41), scons, libssl-dev, check, g++ (>= 4.4.0), etc.).

Syndicate content