
Which table is hit by an InnoDB page corruption?

InnoDB is known for its crash-recovery capabilities and is therefore called a crash-safe storage engine (in contrast to MyISAM). Nevertheless, under certain circumstances InnoDB pages can become corrupt during a crash, and a manual crash recovery is then needed.

In such cases Oracle/MySQL blames the operating system, the I/O system or the hardware. What we have seen is that such incidents occur more often on Windows systems and when databases run in a virtualized environment (VMware). Because of the small number of cases we are aware of, this may not be representative.

One of our customers ran systematic crash tests with InnoDB in a virtual machine and hit the following problem (from the MySQL error log):

InnoDB: Log scan progressed past the checkpoint lsn 0 548857890
InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percents: 0 1 2 3 4 5 6 7 8 9 10 11 12 13
InnoDB: Database page corruption on disk or a failed
InnoDB: file read of page 23361.
InnoDB: Page dump in ascii and hex (16384 bytes):
        ...
InnoDB: Page checksum 3306814791, prior-to-4.0.14-form checksum 2663819873
InnoDB: stored checksum 1444540605, prior-to-4.0.14-form stored checksum 0
InnoDB: Page lsn 0 550886572, low 4 bytes of lsn at page end 0
InnoDB: Page number (if stored to page already) 23361,
InnoDB: space id (if created with >= MySQL-4.1.1 and stored already) 17
InnoDB: Page may be an index page where index id is 0 61
InnoDB: Database page corruption on disk or a failed
InnoDB: file read of page 23361.
InnoDB: Ending processing because of a corrupt database page.

What you have to do in such a case is described in the MySQL documentation: Forcing InnoDB Recovery [ 1 ]. So we started the database with this parameter (beginning with the smallest value first):

# my.cnf

[mysqld]
innodb_force_recovery = 1
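If level 1 is not enough to bring the server up, the value can be raised one step at a time. A sketch of the escalation, based on the documented semantics of the levels (take a file-system level copy of the data directory first, since levels 4 and above can permanently lose data):

```
# my.cnf -- raise innodb_force_recovery only one step at a time
[mysqld]
innodb_force_recovery = 1   # levels 1-3 are considered relatively safe
#innodb_force_recovery = 4  # levels 4-6: risk of permanent data loss
```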

And it looks like we were very lucky: the database came up. The MySQL error log then looked as follows:

InnoDB: Log scan progressed past the checkpoint lsn 0 548857890
InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percents: 0 1 2 3 4 5 6 7 8 9 10 11 12 13
InnoDB: Database page corruption on disk or a failed
InnoDB: file read of page 23361.
InnoDB: You may have to recover from a backup.
InnoDB: Page dump in ascii and hex (16384 bytes):
        ...
InnoDB: Page checksum 3306814791, prior-to-4.0.14-form checksum 2663819873
InnoDB: stored checksum 1444540605, prior-to-4.0.14-form stored checksum 0
InnoDB: Page lsn 0 550886572, low 4 bytes of lsn at page end 0
InnoDB: Page number (if stored to page already) 23361,
InnoDB: space id (if created with >= MySQL-4.1.1 and stored already) 17
InnoDB: Page may be an index page where index id is 0 61
InnoDB: Database page corruption on disk or a failed
InnoDB: file read of page 23361.
14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38
39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88
89 90 91 92 93 94 95 96 97 98 99
InnoDB: Apply batch completed
InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percents: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42
43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67
68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92
93 94 95 96 97 98 99
InnoDB: Apply batch completed
InnoDB: Doing recovery: scanned up to log sequence number 0 554100736
InnoDB: Doing recovery: scanned up to log sequence number 0 555254799
InnoDB: 1 transaction(s) which must be rolled back or cleaned up
InnoDB: in total 1 row operations to undo
InnoDB: Trx id counter is 0 322048
InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percents: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42
43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67
68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92
93 94 95 96 97 98 99
InnoDB: Apply batch completed
InnoDB: Starting in background the rollback of uncommitted transactions
InnoDB: Rolling back trx with id 0 321635, 1 rows to undo
InnoDB: Started; log sequence number 0 555254799
InnoDB: !!! innodb_force_recovery is set to 1 !!!
InnoDB: Rolling back of trx id 0 321635 completed
InnoDB: Rollback of non-prepared transactions completed

Even after we set innodb_force_recovery back to 0, the problem did not reappear. So it looks like InnoDB was able to fix the problem by itself.

But we still wanted to know which table was affected by this corruption. Two possibilities came to mind:

The first is to run the following script on all *.ibd files (this requires innodb_file_per_table = ON). For this we need the original corrupted files again:

for i in *.ibd ; do
  echo "$i"
  innochecksum "$i"
done

data.ibd
page 23361 invalid (fails log sequence number check)
test1.ibd
test2.ibd
test3.ibd

This matches the message in the MySQL error log perfectly:

InnoDB: file read of page 23361.

Furthermore, the output of the following command also matches the MySQL error log message:

innochecksum -v -d -p 23361 data.ibd
file data.ibd = 398458880 bytes (24320 pages)...
checking pages in range 23361 to 23361
page 23361: log sequence number: first = 550886572; second = 0
page 23361 invalid (fails log sequence number check)

InnoDB: Page lsn 0 550886572, low 4 bytes of lsn at page end 0

So we found that the affected table is the one stored in data.ibd.
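For a closer look at the corrupt page itself, its byte offset in the .ibd file can be computed from the page number, since InnoDB uses a fixed page size of 16 KiB by default. A small sketch (the dd line is read-only and assumes the default page size):

```shell
# Byte offset of an InnoDB page = page number * page size (16 KiB default)
PAGE=23361
PAGE_SIZE=16384
OFFSET=$((PAGE * PAGE_SIZE))
echo "page $PAGE starts at byte offset $OFFSET"   # -> 382746624

# Read-only hex dump of just that one page for manual inspection:
# dd if=data.ibd bs=$PAGE_SIZE skip=$PAGE count=1 2>/dev/null | hexdump -C | head
```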

Another possibility to find the table is to use the InnoDB table monitor [ 2 ]. From the MySQL error log file we know:

InnoDB: Page may be an index page where index id is 0 61

When we enable the InnoDB table monitor, we find the following information in the MySQL error log:

TABLE: name test/data, id 0 29, columns 8, indexes 3, appr.rows 309437
  COLUMNS: sid: DATA_INT DATA_BINARY_TYPE DATA_NOT_NULL len 8; cid: DATA_INT
           DATA_BINARY_TYPE DATA_NOT_NULL len 8; data_payload: DATA_BLOB
           DATA_BINARY_TYPE len 10; timestamp: DATA_INT DATA_UNSIGNED
           DATA_BINARY_TYPE DATA_NOT_NULL len 4; lengthAccumulated: DATA_INT
           DATA_UNSIGNED DATA_BINARY_TYPE DATA_NOT_NULL len 8; DB_ROW_ID:
           DATA_SYS prtype 256 len 6; DB_TRX_ID: DATA_SYS prtype 257 len 6;
           DB_ROLL_PTR: DATA_SYS prtype 258 len 7;
  INDEX: name PRIMARY, id 0 61, fields 3/7, uniq 3, type 3
   root page 3, appr.key vals 309437, leaf pages 22102, size pages 22208
   FIELDS:  timestamp sid cid DB_TRX_ID DB_ROLL_PTR data_payload lengthAccumulated
  INDEX: name timestampLengthAccumulated_idx, id 0 62, fields 2/4, uniq 4, type 0
   root page 4, appr.key vals 309284, leaf pages 639, size pages 675
   FIELDS:  timestamp lengthAccumulated sid cid
  INDEX: name sidCid_idx, id 0 63, fields 2/3, uniq 3, type 0
   root page 5, appr.key vals 309739, leaf pages 487, size pages 545
   FIELDS:  sid cid timestamp

So it looks like the InnoDB PRIMARY index was affected. Since the PRIMARY index in InnoDB is the clustered index, this basically means the data itself!
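For completeness: the table monitor output above is obtained by creating a specially named table, as described in the documentation [ 2 ] (this mechanism exists in MySQL up to version 5.6; a sketch):

```sql
-- Creating this magic table switches the InnoDB table monitor on;
-- output is written to the error log roughly every 60 seconds.
CREATE TABLE test.innodb_table_monitor (a INT) ENGINE=InnoDB;

-- After collecting the output, drop the table to switch the monitor off:
DROP TABLE test.innodb_table_monitor;
```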

It looks like the InnoDB table is OK again in this situation. But if you do not trust it, you should dump the data, drop the table and reimport the data afterwards.

We got some additional information from our customer: this corruption no longer happens when the InnoDB doublewrite buffer (innodb_doublewrite) is disabled.
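For reference, the doublewrite buffer can be disabled at startup like this (a sketch; note that the doublewrite buffer is exactly the mechanism that protects against half-written pages, so disabling it normally reduces crash safety rather than improving it, and should only be considered on storage that guarantees atomic page writes):

```
# my.cnf
[mysqld]
innodb_doublewrite = 0   # or skip-innodb_doublewrite, depending on version
```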

Literature

  [ 1 ] Forcing InnoDB Recovery
  [ 2 ] InnoDB Table Monitor Output

Comments

Because I saw a similar situation again this week, I thought about what could cause such InnoDB block corruptions. I came up with the following evildoers:
  • Editing the InnoDB tablespace file with a normal text editor.
  • Virus checker
  • fsck / chkdsk
  • Active/active DRBD/SAN (2 mysqld processes accessing the same InnoDB tablespace files at the same time)
  • Human interaction (on purpose?)
  • Broken I/O controller or other issues with the I/O system
If you have more ideas, I would be happy to learn about them!