MySQL Tech-Feed (en)
Daniel Nichter: Encore: What Drizzle needs
Error injection tests for InnoDB would be nice
As I wrote in a previous blog post, it is time to add error injection tests to InnoDB.
MySQL Performance Blog: MySQL Configuration Wizard Updated
We’ve released an updated version of the MySQL Configuration Wizard we announced at the end of last year. If you don’t remember that announcement, here’s the short version: this is a tool to help you generate my.cnf files based on your server’s hardware and other characteristics.
We’ve gotten really good feedback on this tool, including this nice mention on Stack Exchange:
Percona just built a tool to do just that called the Configuration Wizard. I tested it out once just to see what it would return and the results were pretty darn close to what we were using on our servers, whose cnf’s were put together by highly trained mysql certified dba’s.
So what’s changed in the new version of the Configuration Wizard? Quite a few things. We’ve rolled out the first iteration of the account and profile features. Now you get a homepage with your configuration files, so you can manage them and return to them anytime you like.
From this page (click on the image for a fullsize view) you can do things like sharing configuration files and emailing them to yourself. The new release also adds features like downloading the configuration files so you don’t have to copy-paste them.
If you share a configuration file, then the URL can be loaded by anyone, even if they’re not logged in. It’s kind of like sending someone a link to a pastebin or something like that. Screenshot:
Another new feature is something I’ve wanted for a long time: the ability to generate a more strict, safer configuration file. There’s a new page in the Wizard that lets you set a lot of sanity/safety options to prevent common problems MySQL users run into because of too-permissive MySQL behaviors. These are the kinds of things that Drizzle fixes — and should be fixed by default in MySQL — but never will be because they might break applications that rely on the default behaviors. If you’re building an application from the ground up, now you can prevent bad things from getting a nose under the tent. Here’s a screenshot:
In addition to these things, we have added a number of other features you might not notice, which I won’t spend much time on — they’re things like an integrated feedback form at the left of the page and so on.
What’s next? Well, next I think we’re going to turn our attention to adding new tools, rather than improving this one. I have a list of tools that people have requested or suggested: a SQL formatter, a visual EXPLAIN tool, a configuration advisor, a query analysis tool, a way to register a server’s essential characteristics and then get advice when there’s a new release that might be beneficial for you, and so on. I have selected the next priorities, but I don’t want to spoil the surprise or promise something if it turns out to be harder than I think it will be. What ideas do you have? Let me know by leaving your feedback in the comments.
We hope this suite of free browser-based tools helps you become a more productive MySQL user and administrator!
MySQL Performance Blog: Schedule for MySQL Conference 2012 is Published
I am pleased to announce the schedule for Percona Live: MySQL Conference And Expo 2012 is now published. This is truly great selection of talks with something for MySQL Developers, DBAs, Managers, people just starting to use MySQL as well as looking for advanced topics. We have talks about running MySQL on extremely large scale in a Web as well as running MySQL In the Enterprise Environments. Some speakers have spent over decade pushing MySQL to its limits, others have in depth experience working on MySQL Code.
We have many talks which are covering Oracle MySQL, and forks such as MariaDB, Drizzle and Percona Server are well covered too. You will also have a chance to learn about commercial MySQL alternatives such as Clustrix and SchoonerSQL from our sponsors.
At the same time this is the conference for MySQL Community. We’re talking about other database systems only as it comes to migration to MySQL and about NoSQL technologies such as Memcached,Redis,Sphinx which are commonly used to supplement MySQL.
The space was very tight this year and competition was very tough. We had over 300 proposals for approximately 60 slots. As results committee had to make a lot of very tough choices and many great talks could not be accommodated.
We have a great Conference Committee this year who has done a great job getting the schedule together. I can’t thank them enough !
See you in April in Santa Clara for MySQL Conference and lets make this event an amazing success !
MySQL Performance Blog: linux.conf.au 2012 roundup
I spent last week at linux.conf.au in Ballarat, Victoria (that’s the Victoria in Australia, not wherever else there may be one) which is only a pleasant two hour drive from my home town of Melbourne (Australia, not Florida). I sent an email internally to our experts detailing bits of the conference that may interest them – and I thought that it may also interest our wider readers who are interested in all levels of the software stack.
For those that don’t know: linux.conf.au is one of (if not the) most awesome technical conference in the free software space. It consistently attracts a wide variety of very knowledgable speakers and a large number of attendees.
Every year it is put together by a (different) set of volunteers, and this means it also tours around the country (and sometimes even New Zealand). This year it was in Ballarat – a regional city a couple of hours drive out of Melbourne. One of the great things about LCA is that you are not always at the same hotel, in the same city stuck with the same two restaurants.
This year had a bit of an increased focus on privacy, security and basic freedoms and human rights. This is no doubt a reaction to the increased attacks on freedom of speech and the internet that have been going on in recent months.
That being said, there were a huge number of great talks on a variety of topics – everything from filesystem performance to open hardware, to repurposing existing hardware to upcoming challenges for the kernel to howto be a better sysadmin. In fact… for those who weren’t there and spend any of their life helping people admin machines – go and watch those talks.
linux.conf.au (for me) is one of the cannot-miss events in the year. It’s an opportunity to learn things that directly apply to my work, may apply in the future and most certainly will never apply but are rather cool anyway.
All the video from the conference are already up! This is an amazing effort from the (volunteer) AV team. I’ve included links to a selection of talks below that I especially think are worth watching:
Watch no matter what:
- Keynote – Karen Sandler
This keynote was amazing. Go watch it. The organisers did a truly excellent job at selecting keynotes this year. - Keynote – Jacob Appelbaum
This is best described as a tour of internet freedom, the attacks on it and a tour of the modern surveillance state. - UEFI and Linux: The future is here and it’s awful
You will be depressed at some point in this talk – the news is not great for the future of even being able to easily boot free software on machines. - Paul Fenwick’s Keynote
A good quick introduction to hacking your brain. I’m sure many of you (like me) are interested in ways to hack our brains and our bodies to better serve us. This talk is merely an introduction. I also suggest you check out Anki if you want to improve your ability to remember things. - Torturing OpenSSL
This was certainly one of the most amazing talks I saw. A whole new interesting way of attacking SSL. Vary CPU voltage, extract private SSL keys! Wheee! - The Kernel Report – Jonathan Corbet
You can skip this only if you read every single mail on LKML, run your own analysis on the kernel source tree and publish an (at least) weekly publication on Linux.
This is one of the few (err… only) talks that is repeatedly accepted into linux.conf.au. Why? Because Jon manages to compress a whole year of activity in the Linux world int oa single session that is incredibly informative. - I Can’t Believe This Is Butter! A tour of BTRFS
This is going to be the default filesystem in a number of places over the next few years, time to start learning! While it’s unlikely to be suitable for database workloads anywhere in the near future, I suspect we’ll see BTRFS as the root filesystem and XFS as the filesystem for the database server in the not too distant future. - Mistakes Were Made
This session explores a number of rather indispensable things for those in operations – but also leaks over into development. Learning from our mistakes can only make us better at doing our jobs. - Hack Everything: repurposing everyday devices
This was great fun – reuse existing hardware to do awesome things!
Talks that could be quite interesting for you, depending on your interests:
- XFS: Recent and Future Adventures
This talk covered a lot of the recent improvements in XFS. Especially interesting for those with metadata heavy workloads. - Bloat: How and Why UNIX Grew Up (and Out) – Rusty Russell,Matt Evans
This is a pretty neat examination of where bloat comes from taking V6 UNIX utilities as the baseline. It would be interesting to have a similar study on the database servers we all love and use. - Migrating to PHP 5.4
Probably quite useful if you’re dealing with PHP (although I wasn’t in this session). - Finding vulnerabilities in PHP code (via static code analysis)
This one is for our web developers. It’s only a 20minute talk, so you should really all go and watch it. Various static analysis tools are part of the normal toolkit for C and C++ development these days, it’s great to see people working on them for other languages.
Talks I shall be watching the videos of as I was in another talk at the time:
- Smashing a square peg into a round hole
- Codec2: Open Source Speech Codec
- Desktop Home Hacks
- This Old Code, or Renovating Dusty Old Open Source For Fun and Profit
This is especially interesting as I both know Greg and have talked to him about this work *and* because of my experience with the Drizzle project – we certainly have encountered more than our fair share of dusty old code during our time working on Drizzle. - Mentoring: we’re Doing It Wrong
My Talk:
- Multi-tenancy, multi-master, sharding, scaling and analytics with Drizzle
An update as to what we’ve been working on getting going in Drizzle.
All the videos are going up at:
Henrik Ingo: Making rpm builds a first class citizen: How?
In my previous post I explained why I believe the production of RPM and DEB packages should be more integrated with the rest of your development process. Now it's time to look into how you can put the RPM build scripts inside your main source code repository, and in particular how I did that to produce RPM packages for Drizzle.
Henrik Ingo: Making rpm builds a first class citizen: Why?
Last weekend I released rpm files for the latest Drizzle Fremont beta (announcement). As part of that work I've also integrated the spec file and other files used by the rpmbuild into the main Drizzle bzr repository (but not yet merged into trunk). In this post I want to explain why I think this is a good thing, and in a follow up post I'll go into what I needed to do to make it work.
(And speaking of stuff you can download, phpMyAdmin 3.5.0-alpha1 now supports Drizzle!)
Official Drizzle Blog: Release: Drizzle 2012.01.30 (Fremont beta3) RPM binaries for RHEL / CentOS 6
Hot on the heels of this week's Drizzle 2012.01.30 source release, we are now also releasing beta quality binary RPMs for your downloading and testing pleasure! While we are revitalizing the Drizzle yum repository, you can download the rpm-bundle from our Launchpad project page: https://launchpad.net/pkg-drizzle/+download
As the Fremont development cycle is ending it's beta cycle and heading for release candidates, we are starting to publish also binary packages for downloads. Today we are releasing RPM packages for RHEL/CentOS 6.0. Fedora 16 packaging has unfortunately been postponed. We will eventually support also Fedora 16 RPMs, but in the mean time you can possibly spot some Drizzle developers on twitter or irc making snarky comments about systemd ...
In the following week(s) we will also publish Debian and Ubuntu packages.
Note that these packages are a first release of their kind and still a bit experimental. What this means is that the underlying Drizzle release itself is stabilizing quite nicely, but the packaging process is currently not completely integrated nor automated.
Please report errors at the main Drizzle bug tracker, and direct your general feedback to the drizzle-discuss mailing list.
Details:
- These packages were built from the Drizzle bzr trunk at tag 2012.01.30.
- In addition a branch that supports "make rpm" was merged from lp:~hingo/drizzle/drizzle-integrate-packaging-rpm. This is based on the same RPM spec as was used for Drizzle 7, but the intent is to integrate it with the main Drizzle source code and then Jenkins builds. The current release was however still a manual process.
- When building this way, an additional bzr tag --force 2012.01.30 was needed to maintain the same version number.
- Along the RPMs we have uploaded a source tar file fake-drizzle7-2012.01.30.tar.gz. This corresponds to the real Drizzle 2012.01.30 release, plus the above changes.
- The RPM files are made available as a tar bundle. The process to publish into the Drizzle yum repository is currently broken but will resume in the near future.
- A related issue is that the packages are not signed.
- Known issue: If you install the plugin rpms, they will be automatically loaded at startup. However, some of them (in particular, many auth plugins, the policy plugins and possibly the slave plugin) don't actually work without further configuration and will thus prevent the server from starting. If you encounter this plugin, you should either complete the configuration (in /etc/drizzle/conf.d/[plugin].conf) or, if you don't want to use it, uninstall the plugin.
Official Drizzle Blog: Drizzle source tarball 2012.01.30 (Fremont beta3) has been released
Drizzle source tarball 2012.01.30 has been released. This is the third beta in our Fremont series.
In this release:
* continuing refactoring, restructuring, and code quality improvements
* many more documentation improvements
* additional bugs fixed
* improvements in test suite
The Drizzle download file can be found here.
Mark Atwood: On my way to LCA2012
I'm starting the first leg of a literal `round-the-world business trip.
I'm flying from my home in Seattle, via LAX, to Melbourne Australia. There I will meet up with my friend Stewart Smith, the Director of Engineering at Percona. He is fellow survivor of MySQL & Sun, and a fellow contributor to Drizzle.
Other good friends of mine who are converging on Australia for LCA 2012 are Sarah Novotny, Monty Tayler, and Jacob Applebaum.
Why am I going to Australia? Geeks into open source who are "in the know" know the Linux.conf.au conference to be one of the best open source conferences in the world. This year it will be held in Ballarat, which is not far from Melbourne. This will be my 4th LCA, having attended past ones in Brisbane, Wellington, and Tazmania
At that conference, I will be speaking in the SysAdmin Miniconf, to demo OpenShift, Red Hat's cloud PaaS. (Sign up with the promo code LCA2012.)
This is only the first leg of this trip. After LCA, I will be heading to Bangalore India to present at JUDCon:India.
And after that, I will keep heading west to Europe, to present at FOSDEM in Brussels.
And then who knows where I will go next?
Who wants to write a storage engine?
LevelDB might be a great fit for MongoDB. MongoDB doesn't need multi-statement transactions. Both are limited by 1-writer or N-reader concurrency, but writes to database files are much faster with LevelDB because it doesn't do update in place. So LevelDB doesn't lose as much performance for IO-bound workloads by doing 1-writer or N-readers and my guess is that this could make MongoDB much better at supporting IO-bound workloads.
David Shrewsbury: 2011 Year In Review
- With the help of Patrick Crews and Andrew Hutchings, and building on the work of Jay Pipes, I managed to implement a pretty solid replication solution in Drizzle and made the slave plugin available.
- Implemented multi-source replication (more accurate name than multi-master) in Drizzle.
- I ventured into new territory by giving my first-ever presentation at the MySQL User Conference in Santa Clara, CA.
- Rackspace made life interesting and forced me find another position. I joined a local startup company as their first real hire. Two of the founders are former coworkers and current friends of mine, and the timing worked out well for all parties.
- For my new job, I had to learn Python, GIT, and how to work with AWS systems. Python rocks. I feel like I can fly now.
- Lost 20 lbs! Woohoo!
I don't know what 2012 has in store for me, but hopefully it is just as exciting. I hope everyone has a Happy New Year.
Patrick Crews: dbqp being renamed
One of the best things that can happen to a piece of software is for people to actually use it.
I’ve been fortunate enough to have received feedback on the tool from several members of both the Percona and Drizzle teams. The most common and strongly emphasized comments were in regards to what a terrible, terrible name dbqp really is in terms of saying, seeing, and typing it ; )
As that isn’t something that can be disputed (it’s really annoying to use in conversations *and* to type several dozen times a day), the project has been renamed to kewpie. For those that follow such things, I did present on another tool with that name at the last MySQL Conference, but *that* tool is a nice-to-have, while the test-runner sees daily use. Better to save the good names for software that actually stands a chance of being used, I say : )
While there are probably 1*10^6 other things I need to do (Stewart is a merciless slave driver as a boss, btw…heheh), the fact that we are merging the tool into the various Percona branches meant it should be done sooner rather than later. The tool is currently in our 5.1 branch and I have merge requests up for both Drizzle and Xtrabackup (dbqp was living there too).
I have several other interesting things going on with the tests and tool, which I’ll be blogging about over at MySQL Performance Blog. Later this week, I’ll be talking about what we’ve been doing to work on this bug ; )
Also, the Percona Live MySQL Conference in DC is just around the corner. There are going to be some great speakers and attendees
Recover lost .frm files for InnoDB tables
Recently I found in a forum the following request for help:
My MySQL instance crashed because of free disk space fault. I saw in /var/lib/mysql all the files: ibdata1, ib_logfile* and all the folders containing frm files. Well, when i solved the problem and run successfully the instance, some databases disappeared. One of those is the most important, and i don't know how many tables had and their structures. Is there any way for recover the entire lost database (structure and data) only having the ibdata1 file?
First of all the observation sounds a bit strange because files do not just disappear. So I fear that its not just the .frm files which are lost. But let's think positive and assume just the .frm files have gone...
To recover the tables is a bit tricky because the .frm files contains the information about the table structure for MySQL.
If you have any old backup or only a structure dump it would be very helpful..
In InnoDB there is the table structure stored as well. You can get it out with the InnoDB Table Monitor as follows:
mysql> CREATE SCHEMA recovery; mysql> use recovery; mysql> CREATE TABLE innodb_table_monitor (id INT) ENGINE = InnoDB;MySQL will write the output into its error log:
TABLE: name test/test, id 16, flags 1, columns 4, indexes 1, appr.rows 3 COLUMNS: id: DATA_INT DATA_BINARY_TYPE len 4; DB_ROW_ID: DATA_SYS prtype 256 len 6; DB_TRX_ID: DATA_SYS prtype 257 len 6; DB_ROLL_PTR: DATA_SYS prtype 258 len 7; INDEX: name GEN_CLUST_INDEX, id 18, fields 0/4, uniq 1, type 1 root page 312, appr.key vals 3, leaf pages 1, size pages 1 FIELDS: DB_ROW_ID DB_TRX_ID DB_ROLL_PTR idWith these information and some experience you can guestimate the original table structure:
Schema and table name: test.test
id: DATA_INT DATA_BINARY_TYPE len 4; DB_ROW_ID: DATA_SYS prtype 256 len 6; DB_TRX_ID: DATA_SYS prtype 257 len 6; DB_ROLL_PTR: DATA_SYS prtype 258 len 7;The table has only 1 column called id which is an 4 byte int, the other columns are InnoDB internal stuff (19 byte!).
INDEX: name GEN_CLUST_INDEX, id 18, fields 0/4, uniq 1, type 1The table has only one generated clustered index (no explicit index!).
So we can guess:
mysql> CREATE TABLE test.test ( id INT UNSIGNED NOT NULL DEFAULT 0 ) ENGINE = InnoDB CHARSET=utf8;This table has to be created on a second system now. From there we see with the InnoDB table monitor:
TABLE: name test/test, id 0 1269, columns 4, indexes 1, appr.rows 0 COLUMNS: id: DATA_INT DATA_UNSIGNED DATA_BINARY_TYPE DATA_NOT_NULL len 4; DB_ROW_ID: DATA_SYS prtype 256 len 6; DB_TRX_ID: DATA_SYS prtype 257 len 6; DB_ROLL_PTR: DATA_SYS prtype 258 len 7; INDEX: name GEN_CLUST_INDEX, id 0 909, fields 0/4, uniq 1, type 1 root page 3, appr.key vals 0, leaf pages 1, size pages 1 FIELDS: DB_ROW_ID DB_TRX_ID DB_ROLL_PTR idThis is not 100% correct yet.
id seems to be SIGNED and not UNSIGNED and NULL seems to be allowed. So next try:
mysql> CREATE TABLE test.test ( id INT SIGNED NULL ) ENGINE = InnoDB CHARSET=utf8;TABLE: name test/test, id 0 1271, columns 4, indexes 1, appr.rows 0 COLUMNS: id: DATA_INT DATA_BINARY_TYPE len 4; DB_ROW_ID: DATA_SYS prtype 256 len 6; DB_TRX_ID: DATA_SYS prtype 257 len 6; DB_ROLL_PTR: DATA_SYS prtype 258 len 7; INDEX: name GEN_CLUST_INDEX, id 0 911, fields 0/4, uniq 1, type 1 root page 3, appr.key vals 0, leaf pages 1, size pages 1 FIELDS: DB_ROW_ID DB_TRX_ID DB_ROLL_PTR idSo this looks pretty much like it should. Do not be confused because of some other details. The original table was created on a MySQL 5.6.4 and the .frm recovery is done on a 5.1.55.
Now copy the .frm file to the original database and look if you can access your data. If it does you can do this table by table for all you zillions of tables...
When you are done. Take a backup and ideally do a proper install of your database!
Just a little detail: I created the original table like this:
mysql> CREATE TABLE test.test (id INT) ENGINE = InnoDB; mysql> SHOW CREATE TABLE test.test\G CREATE TABLE `test` ( `id` int(11) DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1MySQL seems to figure out itself what is the correct character set...
Rolling upgrade of Galera 1.0 to 1.1
A few days ago Codership announced their ">new version Galera v1.1 - synchronous Replication Cluster for MySQL. Before we look at the new feature of Rolling Online Schema Upgrade (OSU) we have a look at how to upgrade to the new Galera release.
A rolling upgrade of your synchronous Galera Replication Cluster from version 1.0 to 1.1 is quite easy when you stay at the same MySQL version (5.5).
To not lose the availability of your database service during the upgrade you should have at least 3 Galera nodes in your Cluster.
For further details please also look at MySQL/Galera cluster upgrade.
Hint: If you can do without rolling upgrade, you better avoid it and take your Galera Cluster down.
Check the versionCheck first the version you are currently running on:
SHOW GLOBAL VARIABLES LIKE 'version'; +---------------+-----------------------+ | Variable_name | Value | +---------------+-----------------------+ | version | 5.5.15-wsrep_21.1-log | +---------------+-----------------------+We can see that we are using MySQL 5.5.15 with the wsrep API Version 21 and the wsrep patch 21.1.
SHOW GLOBAL STATUS LIKE 'wsrep_provider_version'; +------------------------+-------------+ | Variable_name | Value | +------------------------+-------------+ | wsrep_provider_version | 21.1.0(r86) | +------------------------+-------------+Here we see that we are using the Galera Replicator (plug-in) 1.0 (revision 86) based on the wsrep API Version 21.
Some rules to Galera versioning. We have 4 different version numbers we should care about:
- MySQL version (5.5.15)
- Wsrep API version (21, 22, ...)
The wsrep API versions will always be single monotonically increasing numbers: 21, 22, ... That indicates API compatibility between MySQL and Galera. - Wsrep Patch version (21.1)
The wsrep patch versions has the form 21.1 where the 21 represents the API version and the 1 represents bug-fix of that API version. - Galera Replicator (= provider, plug-in) (1.0(r86))
Galera versions will be in the form <major>.<minor> with minor meaning: bug-fixes and small features and major: major features, which involve a lot of code change.
Galera 22.1.1 is backward compatible with Galera 21.1.0. I was told that Galera should be at least ONE version backward compatible. So 1.0 should be for 0.8 and 1.1 for 1.0 and 1.2 for 1.1 and 2.0 for 1.2 etc.
PreparationDownload the packages for your preferred installation method from here:
In my case there was only a binary tar ball provided for Codership-MySQL but not for the Galera Plug-in v1.1. So I extracted it from the Debian package as follows:
dpkg-deb -x galera-22.1.1-amd64.deb /tmp/oli/ cp /tmp/oli/usr/bin/garbd /home/mysql/product/mysql-5.5.17-wsrep-22.3/bin cp /tmp/oli/usr/lib/galera/libgalera_smm.so /home/mysql/product/mysql-5.5.17-wsrep-22.3/lib/plugin/For RPM's it should work in a similar way:
rpm2cpio package.rpm | cpio -idmvPrecautionsMake sure, that during upgrade from 5.1 to 5.5 no DDL's are allowed!
UpgradeThen upgrade your Galera Cluster as follows:
- Shift load away from this node.
- Shutdown node (/etc/init.d/mysql stop)
- Uninstall or remove the old Galera plug-in.
- Uninstall or remove the old Codership-MySQL Binaries
- Install the new Codership-MySQL binaries with the wsrep API version 22
- Install the new Galera plug-in v1.1
- Check if wsrep_provider in my.cnf is pointing to the correct new location.
- Start node (/etc/init.d/mysql start)
- Check if node came up properly:
SHOW GLOBAL STATUS LIKE 'wsrep%'; +----------------------------+--------------------------------------+ | Variable_name | Value | +----------------------------+--------------------------------------+ | wsrep_local_state | 4 | | wsrep_local_state_comment | Synced (6) | | wsrep_cluster_size | 3 | | wsrep_cluster_status | Primary | | wsrep_connected | ON | | wsrep_local_index | 1 | | wsrep_provider_version | 22.1.1(r95) | | wsrep_ready | ON | +----------------------------+--------------------------------------+ - If this is the case shift load back to this node.
If you have already troubles up to here we recommend to solve the problems first and NOT to continue with the upgrade procedure. Otherwise you risk the loss of you complete service. - If your reached this step you can upgrade the next node in your Galera Cluster.
When you have upgraded all your nodes in the Galera Cluster you should notice, that the Protocol version will switch automatically from 1 to 2:
111212 17:32:24 [Note] WSREP: Quorum results: version = 1, component = PRIMARY, conf_id = 6, ... 111212 17:34:33 [Note] WSREP: Quorum results: version = 2, component = PRIMARY, conf_id = 7,Configuration for Rolling restartIn Galera Cluster configurations you see often that the Cluster is still set to its initial start configuration which is inappropriate for a rolling restart operation:
Galera configuration for an initial Cluster startGalera node 1: wsrep_cluster_address: gcomm://
Galera node 2: wsrep_cluster_address: gcomm://192.168.1.101:4567
Galera node 3: wsrep_cluster_address: gcomm://192.168.1.102:4567
In this case Node 2 and 3 are OK for a rolling restart but Galera Node 1 will fail to restart.
Galera configuration for normal operations and a Cluster rolling restartThis is the way we recommend to have a Galera configuration for normal operations and a rolling restart:
Galera node 1: wsrep_cluster_address: gcomm://192.168.1.103:4567
Galera node 2: wsrep_cluster_address: gcomm://192.168.1.101:4567
Galera node 3: wsrep_cluster_address: gcomm://192.168.1.102:4567
Every Galera node points to its "left" neigbour.
Upgrade from MySQL 5.1/Galera 0.8 to MySQL 5.5/Galera 1.1Upgrading from MySQL 5.1/Galera 0.8 to MySQL 5.5/Galera 1.1 has to be done in 2 steps because Codership only provides backwards-compatibility for one minor version jumps (0.8 -> 1.0 -> 1.1 -> 1.2 -> 2.0).
We have 2 possibilities now:
5.1/0.8 -> 5.1/1.0 -> 5.5/1.1
or
5.1/0.8 -> 5.5/1.0 -> 5.5/1.1
Which one you choose is up to you.
A rolling upgrade on a running system is impossible without a snapshot state transfer (SST) at the moment. So be prepared it takes a while and causes some load on the systems.
In our case we chose the way via 5.5/1.0 (2nd way).
To upgrade from 5.1/0.8 to 5.5/1.0 proceed as follows:
- Shift load away from this node to the other 2 nodes.
- Shutdown this node (/etc/init.d/mysql stop)
- Set wsrep_provider = none in my.cnf
- Uninstall or remove the old Galera plug-in.
- Uninstall or remove the old Codership-MySQL Binaries
- Install the new Codership-MySQL binaries
- Install the new Galera plug-in
- Start this node (/etc/init.d/mysql start)
- Then you will get some error messages:
111214 11:44:59 [ERROR] Missing system table mysql.proxies_priv; please run mysql_upgrade to create it 111214 11:44:59 [ERROR] Native table 'performance_schema'.'events_waits_current' has the wrong structure ... 111214 11:44:59 [ERROR] Native table 'performance_schema'.'file_instances' has the wrong structure 111214 11:44:59 [Note] Event Scheduler: Loaded 0 events 111214 11:44:59 [Note] WSREP: wsrep_load(): loading provider library 'none' - Run mysql_upgrade (see MySQL upgrade instructions and consider that a MySQL binary upgrade is not officially supported/recommended (this is not a problem, because SST with mysqldump will do a logical restore anyway)).
- Set wsrep_provider in my.cnf to the new plug-in location.
- Prepare SST upgrade script on (all) the donor(s) node(s).
cp wsrep_sst_mysqldump wsrep_sst_mysqldump_upgrade - Change the script wsrep_sst_mysqldump_upgrade that it dumps all databases except the mysql database:
diff wsrep_sst_mysqldump wsrep_sst_mysqldump_upgrade 59c59 < --skip-comments --flush-privileges --all-databases" --- > --skip-comments --flush-privileges --databases test foodmart" - Caution: Be careful with Stored Procedures, Stored Functions, Triggers and Events! This upgrade procedure will NOT work completely if you use some of those MySQL features. This upgrade procedure will further not work completely if you have differences in the mysql schema of your Galera nodes (for what ever reason).
- Set wsrep_sst_method = mysqldump_upgrade in my.cnf
- Start this node (/etc/init.d/mysql start). Keep in mind that one of the remaining Galera nodes will act as a SST donor and during the synchronization he is not available for queries!
- Check if node came properly up: SHOW GLOBAL STATUS LIKE 'wsrep%';
- If this is the case shift load back to this node.
If you have already troubles up to here we recommend to solve the problems first and NOT to continue with the upgrade procedure. Otherwise you risk the loss of you complete service. - Set wsrep_sst_method back to its original value (mysqldump).
- If your reached this step you can upgrade the next node in your Galera Cluster.
If you finally manged to upgrade from MySQL 5.1/Galera 0.8 to MySQL 5.5/Galera 1.0 you can follow the procedure mentioned above to upgrade to Galera 1.1
FindingsTo identify our different Galera Clusters we name them:
wsrep_cluster_name = "Galera-0.8 wsrep-21"In this upgrade scenario this naming convention is very non-optimal because the name of our Galera Cluster should change as well. But the value of wsrep_cluster_name should be the same on all Galera nodes otherwise a node is not capable to join the Cluster (this is to make sure that a Galera node is not connecting by accident to a/the wrong Galera Cluster).
To change the wsrep_cluster_name parameter you have to bring down the whole Galera Cluster. This is not possible at the moment during a rolling restart.
Hopefully this constraint is released in a later Galera version.
Stewart Smith: Puppet snippet for setting up a machine to build Drizzle
You could use this in a Vagrant setup if you like (I’ve done so for testing).
Step 1) Set the following in your Vagrantfile:
Vagrant::Config.run do |config| config.vm.box = "lucid32" config.vm.box_url = "http://files.vagrantup.com/lucid32.box" config.vm.provision :puppet endStep 2) Get puppet-apt helper.
I used https://github.com/evolvingweb/puppet-apt and put it in a manifests/ directory like so:
$ mkdir manifests $ cd manifests $ git clone git://github.com/evolvingweb/puppet-apt.gitStep 3) Write your puppet manifest:
import "puppet-apt/manifests/init.pp" import "puppet-apt/manifests/ppa.pp" class drizzlebuild { apt::ppa { "ppa:drizzle-developers/ppa": } package { "drizzle-dev": ensure => latest, } } include drizzlebuildStep 4) “vagrant up” and you’re done! Feel free to build Drizzle inside this VM.
I’m sure there may be some more proper way to do it all, but that was a pretty neat first intro to me to Puppet and friends :)
Daniel Nichter: Relation of Drizzle modules and plugins
Marketing a bug in 3 easy steps
- File a request for crash recovery tests and wait a few months
- File a request for error injection tests during InnoDB DDL and wait a few months
- Lose a table during alter table because untested error handling is incorrect and blog about it
Migrating from MySQL Master-Master Replication to Galera Multi-Master Replication
Galera is a synchronous Multi-Master Replication for MySQL. It is therefore in competition with several other MySQL architectures:
- Master-Master Replication with MySQL
- MySQL Cluster
- The non-open source product called Schooner
Very often they can be easily replaced by Galera's synchronous Multi-Master Replication for MySQL.
All those products have some advantages and disadvantages. Very often MySQL Master-Master Replication is used in the field because of its simplicity to set-up. But after a while one faces its disadvantages which is mainly data inconsistency between the 2 Masters. This is not only the fault of MySQL Replication but MySQL Replication makes it easy to get such data inconsistencies.
In the following article we look at how you can replace a MySQL Master-Master Replication by Galera Multi-Master Replication with the possibility to fall back if you do not like the solution or if you run into troubles.
Starting pointSome MySQL users have a typical Master-Master Replication set-up like a) active-passive or b) active-active for High Availability (HA) reasons. Either in the same data center or even in remote data centers:
a) active-passive b) active-active Adding Galera synchronous ReplicationAs a first step you can add a Galera Replication Cluster as a simple Slave:
In this set-up you have to consider, that ALL nodes which are participating in replication (Master 1, Master 2 and Galera 1) have set the following parameters:
# # my.cnf # [mysqld] default_storage_engine = InnoDB log_slave_updates = 1 log_bin = bin-log server_id = <n> binlog_format = ROWIt is very important that the server_id is unique on all MySQL nodes BUT they have to be EQUAL for all Galera nodes. Example:
- Master 1: server_id = 1
- Master 2: server_id = 2
- Galera 1: server_id = 3
- Galera 2: server_id = 3
- Galera 3: server_id = 3
This is to avoid conflicts during replication.
Galera is set-up as described in Installing MySQL/Galera. Please make sure, that you do no have any MyISAM tables anymore. Galera cannot cope with any other Storage Engine than InnoDB at the moment.
The following query helps you to find out if you are using any other Storage Engine than InnoDB:
Then you do a normal dump from the Master 2 as follows:
mysqldump --user=root --password --master-data --single-transaction \ --databases foodmart test > full_dump.sqlWhen you dump the database avoid dumping the mysql schema otherwise you will destroy your Galera Node 2 and 3. Then restore the dump on ONE node of the Galera Cluster (preferably on node 1) after setting it to its master:
CHANGE MASTER TO master_host='master1', master_port=3306 , master_user='replication', master_password='secret'; mysql --user=root --password < full_dump.sqlThen you can attach the Galera node to the Master 2.
Now all data arriving to your MySQL Master(s) will be automatically replicated to the Galera Cluster as well.
Adding the Galera Cluster into the ringIn a second step you can add the Galera Cluster into the Replication ring by pointing Master 1 to the Galera Node 1:
Application Load Balancing for GaleraTo have true High Availability (HA) it makes sense to put some Load Balancer in front of your Galera Cluster. This can be done either through:
- a Hardware Load Balancer
- a Software Load Balancer (Linux Virtual Server (LVS), Galera Load Balancer, Pen
- MySQL Proxy, Connector/J or PHP Mysqlnd
- or mechanisms implemented in your application.
Now your Galera Replication Cluster is ready to put some load on it:
If you are more familiar with Galera you can move the Virtual IP (VIP) from MySQL Master-Master Replication to the Galera Replication Cluster:
And if you are happy with the synchronous replication and its scaling performance you can finally drop your old MySQL Master-Master set-up and bypass the VIP during next downtime of your application.
ShortcutsA shortcut in this way would be when you directly replace Master 2 by a Galera node:
Then you need one server less and you can directly use the MySQL Master node as a base for starting with Galera. You just have to replace the MySQL Binaries by the MySQL-Galera Binaries and then add 2 other Galera nodes in the set-up.
Important notesCurrently Galera works only with InnoDB tables. So you have to make sure that you convert all your non-InnoDB tables to InnoDB tables (except the ones in the mysql Schema). Otherwise you will run into problems.
The described set-up works starting with Galera v1.1 and wsrep v22.3.
The memory of the Galera node getting the import has grown by 1.5 Gbyte in one of our tests. So be prepared that the system has enough memory! In our first tests the system heavily started to swap, which caused high I/O load. In this situation Galera behaved erroneous...
If the Master is under very high load Galera Slave can not catch up with the load and starts lagging... This is not a problem if you run the load only on the Galera Cluster!
To avoid a Split-Brain situations all Cluster Solutions need at least 3 nodes. This is the same with Galera. When you move from MySQL Master-Master replication you need one Server more than before to avoid this problem. Theoretically Galera can be run in a 2-node set-up but this is strongly NOT recommended to do.
One way out of this situation is to use the garbd who acts as an arbitrator in such a scenario. This is called a 2 1/2 node set-up.
And now have fun with your synchronous Multi-Master Galera Replication for MySQL...
How MySQL behaves with many schemata, tables and partitions
Recently a customer claimed that his queries were slow some times and sometimes they were fast.
First idea: Flipping query execution plan caused by InnoDB could be skipped because it affected mainly MyISAM tables.
Second idea: Caching effects by either the file system cache caching MyISAM data or the MyISAM key buffer caching MyISAM indexes were examined: File system cache was huge and MyISAM key buffer was only used up to 25%.
I was a bit puzzled...
Then we checked the table_open_cache and the table_definition_cache which were still set to their default values. Their according status information Opened_tables and Opened_table_definitions were clearly indicating that those caches are much too small...
The typical reaction is: increase open_files_limit (after increasing the operating system user limits) to a higher value. Unfortunately most of the Linux distributions have a default values for open files of 1024 which is far to low for typical database systems.
Too many open filesBut customer claimed that he tried this already (set open_files_limit to 50'000) and got error messages in the MySQL error log:
# perror 24 OS error code 24: Too many open filesSo we were even more puzzled.
After some further investigation we found that the customer has up to 600 schemata and in each schema he had 30 to 100 tables and some of those tables have even monthly partitions up to 4 years (roughly 50 partitions).
A partition internally is handled similar to a table. So we have something between 18'000 and 3'000'000 tables/partitions in total. We wrote already about the problems with MySQL databases with many tables in Configuration of MySQL for Shared Hosting but now it is even worse with the partitions. I remembered that having too many partitions with MySQL is not a good idea so we investigated a bit deeper in this area:
We first looked at the amount of file handles MySQL has open:
lsof -p <pid> | wc -lThis value clearly moved towards our new open_files_limit of 150'000 (with table_open_cache and table_definition_cache set to 32k each) and then we got Too many open files errors. So we set table_open_cache and table_definition_cache both back to 2048 and go a stable number of file descriptors between 100'000 and 110'000.
This gives us an idea how to extrapolate those values when we want to have bigger caches. But we do not know how a Linux system or MySQL will behave with much higher values...
Partition table testBut why do we get such high values?
As an example we took the following table with 14 partitions:
CREATE TABLE ptn_test ( id INT UNSIGNED NOT NULL AUTO_INCREMENT , data VARCHAR(64) , ts TIMESTAMP , PRIMARY KEY (id, ts) , INDEX (ts) ) ENGINE = MyISAM PARTITION BY RANGE ( UNIX_TIMESTAMP(ts) ) ( PARTITION p_2010 VALUES LESS THAN ( UNIX_TIMESTAMP('2011-01-01 00:00:00') ) , PARTITION p_2011_01 VALUES LESS THAN ( UNIX_TIMESTAMP('2011-02-01 00:00:00') ) , PARTITION p_2011_02 VALUES LESS THAN ( UNIX_TIMESTAMP('2011-03-01 00:00:00') ) , PARTITION p_2011_03 VALUES LESS THAN ( UNIX_TIMESTAMP('2011-04-01 00:00:00') ) , PARTITION p_2011_04 VALUES LESS THAN ( UNIX_TIMESTAMP('2011-05-01 00:00:00') ) , PARTITION p_2011_05 VALUES LESS THAN ( UNIX_TIMESTAMP('2011-06-01 00:00:00') ) , PARTITION p_2011_06 VALUES LESS THAN ( UNIX_TIMESTAMP('2011-07-01 00:00:00') ) , PARTITION p_2011_07 VALUES LESS THAN ( UNIX_TIMESTAMP('2011-08-01 00:00:00') ) , PARTITION p_2011_08 VALUES LESS THAN ( UNIX_TIMESTAMP('2011-09-01 00:00:00') ) , PARTITION p_2011_09 VALUES LESS THAN ( UNIX_TIMESTAMP('2011-10-01 00:00:00') ) , PARTITION p_2011_10 VALUES LESS THAN ( UNIX_TIMESTAMP('2011-11-01 00:00:00') ) , PARTITION p_2011_11 VALUES LESS THAN ( UNIX_TIMESTAMP('2011-12-01 00:00:00') ) , PARTITION p_2011_12 VALUES LESS THAN ( UNIX_TIMESTAMP('2012-01-01 00:00:00') ) , PARTITION p_max VALUES LESS THAN (MAXVALUE) );and inserted some rows in each partition:
INSERT INTO ptn_test VALUES (NULL, 'Bla', '2010-12-01 00:00:42') , (NULL, 'Bla', '2011-01-01 00:00:42') , (NULL, 'Bla', '2011-02-01 00:00:42') , (NULL, 'Bla', '2011-03-01 00:00:42') , (NULL, 'Bla', '2011-04-01 00:00:42') , (NULL, 'Bla', '2011-05-01 00:00:42') , (NULL, 'Bla', '2011-06-01 00:00:42') , (NULL, 'Bla', '2011-07-01 00:00:42') , (NULL, 'Bla', '2011-08-01 00:00:42') , (NULL, 'Bla', '2011-09-01 00:00:42') , (NULL, 'Bla', '2011-10-01 00:00:42') , (NULL, 'Bla', '2011-11-01 00:00:42') , (NULL, 'Bla', '2011-12-01 00:00:42') , (NULL, 'Bla', '2012-01-01 00:00:42');Just for running a simple EXPLAIN:
EXPLAIN PARTITIONS SELECT * FROM ptn_test WHERE ts = '2011-11-01 00:00:42'; +----+-------------+----------+------------+--------+---------------+------+---------+------+------+-------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+----------+------------+--------+---------------+------+---------+------+------+-------+ | 1 | SIMPLE | ptn_test | p_2011_11 | system | ts | NULL | NULL | NULL | 1 | | +----+-------------+----------+------------+--------+---------------+------+---------+------+------+-------+MySQL already opens 28 file descriptors and uses one table_open_cache and one table_definition_cache entry. When doing the same query in a second session on a second schema MySQL already has opened 56 file descriptors and 2 table_open_cache and 2 table_definition_cache entries. What the MySQL manual states about you can find here: How MySQL Opens and Closes Tables.
SHOW GLOBAL STATUS LIKE 'open_%s'; +--------------------------+-------+ | Variable_name | Value | +--------------------------+-------+ | Open_files | 56 | | Open_table_definitions | 2 | | Open_tables | 2 | +--------------------------+-------+By the way we found that the value of Open_files is pretty close to the result of the following Linux command:
lsof -p <pid> | grep '/var/lib/mysql' | grep -e 'MYD' -e 'MYI' -e 'ib' | wc -lSo Open_files is a good indication of how far we are away of open_files_limit. MySQL needs some more file descriptors for error log file, binary log file etc. But this is typically less than 10.
Now we have already found why we got the Too many open files error message.
But it does not explain yet why our MyISAM key buffer was so badly used.
MyISAM key buffer is wiped outI have not found it mentioned in the documentation explicitly (please correct me when I am wrong) but it looks like when MySQL has a pressure on the table_open_cache and removes entries from it, it wipes out blocks from the MyISAM key buffer of those tables as well.
A little experiment for this:
- Clean up table_open_cache and table_definition_cache and MyISAM key buffer:
- Run a query from 2 sessions in 2 different schemata:
- Show the status information again:
We have 56 file descriptors (2 tables x 14 partitions x (data + index = 2)), 2 table_open_cache and table_definition_cache entries and 333 MyISAM key buffer blocks used.
- Connect from a 3rd connection without the -A option:
26 file descriptors where released. We currently have 16 entries in the table_open_cache (its size was limited to 16) but in total 19 tables were opened. Why 65 files were opened for this operation is unclear.
The current amount of MyISAM key blocks is 168 which is roughly the half of the value before. So we can assume that one of our 2 partitioned table was closed and its key buffer entries were wiped out.
- Run the query again
28 file descriptors were used again with 29 in total and one more table was opened. And our key buffer is back to 333 blocks used.
So it looks like when we have really pressure on the table_open_cache this also has an impact on the MyISAM key buffer.
How does it behave with InnoDB?Nowadays InnoDB is much more often used than MyISAM. So let us have a look how is the impact on InnoDB tables:
- Starting point values:
- Run the query in 2 different connections on 2 different tables:
- Use a new connection:
So the phenomena of wiping out data seems definitely not to happen with InnoDB tables. Further InnoDB seems to use much less file descriptors than MyISAM.
Conclusion- MyISAM uses a huge number of file descriptors. This comes especially true when using partitions.
- This needs a significant increase of open_files_limit. Its impact is unknown.
- Shortage of entries in the table_open_cache leads to wipe out tables from the table_open_cache.
- This leads to wipe out of MyISAM key buffer blocks of the according tables which leads to slower queries when the table is requested next time.
- With InnoDB this behavior is much more relaxed and problems should appear much later than with MyISAM.
- Be careful using a significant number of (partitioned) tables with MyISAM. It can have a serious impact on the performance of your system.
