You are here

Logging Galera Cluster conflicts

We typically suggest our customers to use our MySQL/Galera Cluster my.cnf configuration template to avoid MySQL configuration and performance problems.

And we are paranoid as well. Thus we enable all useful logging:

wsrep_log_conflicts = 1

But this has also some consequences of more visibility...

If you monitor carefully your Galera Cluster for example with the FromDual Performance Monitor for MySQL and MariaDB, you might probably see some strange values increasing from time to time:

mysql< SHOW GLOBAL STATUS LIKE 'wsrep_local_%r_s';
+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| wsrep_local_cert_failures | 42    |
| wsrep_local_bf_aborts     | 13    |
+---------------------------+-------+

Those values are indicators that some transactions (Galera write sets) did to not succeed and were aborted by Galera. In this case the paranoid logging helps to find, what exactly was aborted and possibly helps to find out, if this can or should be fixed:

150410  1:44:18 [Note] WSREP: cluster conflict due to certification failure for threads:
150410  1:44:18 [Note] WSREP: Victim thread:
   THD: 151856, mode: local, state: executing, conflict: cert failure, seqno: 30399304
   SQL: UPDATE login SET lTsexpire = UNIX_TIMESTAMP(NOW()) + lTimeout WHERE lSessionId = 'va3ta7besku82k56ncv3bnhlj5'

*** Priority TRANSACTION:
TRANSACTION 464359568, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
1 lock struct(s), heap size 360, 0 row lock(s)
MySQL thread id 4, OS thread handle 0x7f1c0916c700, query id 8190690 Update_rows_log_event::find_row(30399302)

*** Victim TRANSACTION:
TRANSACTION 464359562, ACTIVE 0 sec
mysql tables in use 1, locked 1
2 lock struct(s), heap size 360, 1 row lock(s), undo log entries 1
MySQL thread id 151856, OS thread handle 0x7f1c09091700, query id 8190614 172.20.100.11 sam_angiz query end
UPDATE login SET lTsexpire = UNIX_TIMESTAMP(now()) + lTimeout WHERE lSessionId = 'va3ta7besku82k56ncv3bnhlj5'
*** WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 835205 page no 3 n bits 72 index `PRIMARY` of table `fromdual`.`login` trx table locks 1 total table locks 2  trx id 464359562 lock_mode X locks rec but not gap lock hold time 0 wait time before grant 0
150410  1:44:18 [Note] WSREP: cluster conflict due to high priority abort for threads:
150410  1:44:18 [Note] WSREP: Winning thread:
   THD: 4, mode: applier, state: executing, conflict: no conflict, seqno: 30399302
   SQL: (null)
150410  1:44:18 [Note] WSREP: Victim thread:
   THD: 151856, mode: local, state: committing, conflict: no conflict, seqno: -1
   SQL: UPDATE login SET lTsexpire = UNIX_TIMESTAMP(now()) + lTimeout WHERE lSessionId = 'va3ta7besku82k56ncv3bnhlj5'

In the above Galera conflict 2 login transactions where running at the same time. They both come with the same Session ID and want to update the expiry timestamp. Now how to solve or fix this:

  • First check, if this table has a Primary Key (tables without a PK causes full table scans which can last for long time, increasing the chance for conflicts).
  • Second check, if there is a (UNIQUE?) index on lSessionId. A missing index leads to full table scans which increases the chance for conflicts.
  • Third check WHY 2 logins from the same Session ID can arrive at the same time (within 1 second) on 2 different Galera nodes (Ajax requests, etc...). Try to avoid such situations.

Comments

We finally got the results. The table does NOT have a Primary Key and/or an index on lSessionId:

CREATE TABLE IF NOT EXISTS `genLogins` (
  `lSessionId` varchar(128) NOT NULL,
  `lUid` int(8) NOT NULL DEFAULT '0',
  `lLang` char(2) NOT NULL,
  `lTimeout` int(10) NOT NULL DEFAULT '0',
  `lReloadsess` enum('y','n') NOT NULL DEFAULT 'n',
  `lTsexpire` int(10) NOT NULL DEFAULT '0'
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

Please also read: Disadvantages of explicitly NOT using InnoDB Primary Keys and Limitations of MySQL.

The alter_engine.php script of our MySQL BasEnv (MyEnv) can help you finding such tables...

Shinguzcomment