You are here

The MySQL CSV Storage Engine

A little summary about the MySQL CSV table engine:

  • The SQL-Demo script (930 byte) for the following article.
  • The CSV converter to convert normal CSV files into a CSV format which is for MySQL acceptable.

Caution: Use on your own risk!

MySQL CSV tables (internally also called TINA tables) are driven by the MySQL CSV storage engine. This feature was added in MySQL release 4.1.4. CSV tables store data in text files using the Character-Separated-Value format.

mysql> SELECT version();
+------------+
| version()  |
+------------+
| 5.0.16-max |
+------------+

CSV tables are an equivalent to Oracle external tables. They can be use to import data from your favourite spread sheet software (e.g. OpenOffice Calc), exchange data from or with other data sources or just migrating data from an other database system to MySQL.

To enable CSV tables, use the --with-csv-storage-engine option during configure when you build MySQL. If you got a already compiled MySQL binary you can find out if CSV tables are supported like this:

mysql> show engines;
+--------+---------+-----------------------------------+
| Engine | Support | Comment                           |
+--------+---------+-----------------------------------+
...
| CSV | YES | CSV storage engine                       |
...

When you create a CSV table, the server creates a table definition file in the database directory. The file begins with the table name and has an .frm extension. The storage engine also creates a data file. Its name begins with the table name and has a .CSV extension. The data file is a plain text file. When you store data into the table, the storage engine saves it into the data file in CSV format.

CSV tables do actually (5.0.12) neither support indexing nor transactions.

Example:

CREATE TABLE csv_test (id INT, value CHAR(20)) ENGINE = CSV;

INSERT INTO csv_test VALUES (1, 'Record 1'), (2, 'Record 2'), (3, 'Record 3');

SELECT * FROM csv_test;

Indexing

CSV tables do NOT support indexing. This is because the files can be traded in and out of the table directory without having to worry about rebuilding anything.

Creating an index gives the following result:

mysql> CREATE INDEX value_i ON csv_test (value);

ERROR 1069 (42000): Too many keys specified; max 0 keys allowed

Transactions

CSV tables do NOT support transactions.

mysql> BEGIN;
mysql> SELECT * FROM csv_test;
mysql> INSERT INTO csv_test VALUES (4, 'Record 4');
mysql> SELECT * FROM csv_test;
mysql> ROLLBACK;
Query OK, 0 rows affected, 1 warning (0.00 sec)

Warning (Code 1196): Some non-transactional changed tables couldn't be rolled back

mysql> SELECT * FROM csv_test;

Data exchange

As the table type already declares CSV tables are tables in the CSV format. This format is usually used to exchange data between different kind of applications e.g. spread sheets etc.

The following MySQL table

mysql> SELECT * FROM csv_test;
+------+------------+
| id   | value      |
+------+------------+
|    1 | record 1   |
|    2 | record 2   |
|    3 | record 3   |
+------+------------+
3 rows in set (0.00 sec)

looks in UNIX like this:

$ cat csv_test.CSV
"1","record 1"
"2","record 2"
"3","record 3"

So the data can be read by any other applications which are understanding the CSV format (e.g. OpenOffice Calc). To allow the application to access the data either the application has to be started under the MySQL owner (e.g. mysql) or the permissions of the file have to be changed correctly (e.g. chmod o+r csv_test.CSV).

Even writing data is possible under some circumstances. But unfortunately OpenOffice Calc writes numbers in the wrong format (missing double quotes):

$ cat oo_test.csv
1,"record 1"
2,"record 2"
3,"OO Calc"

If you palm MySQL such a file the system will dump with little amount of data or it will start wasting memory until the whole server crashes!!! So do such things NEVER on a productive system!

mysql> SELECT COUNT(*) FROM csv_test;
ERROR 2013 (HY000): Lost connection to MySQL server during query

mysql>
Number of processes running now: 0
051203 21:58:54 mysqld restarted

In this case neither REPAIR TABLE nor CHECK TABLE helps. Do either a TRUNCATE TABLE or fix it somehow else (e.g. vi).

After fixing the problem with an editor (e.g. vi) you can easily palm the CSV file to MySQL. The database does not immediately recognise the data but after flushing the table the data will be reread:

mysql> FLUSH TABLE csv_test;
mysql> SELECT * FROM csv_test;
+------+------------+
| id   | value      |
+------+------------+
|    1 | record 1   |
|    2 | record 2   |
|    3 | OO Calc    |
+------+------------+
3 rows in set (0.00 sec)

According to the documentation actually only UNIX OS files are supported. But you should easily be able to convert the files to the DOS format (dos2unix) either using one of your UNIX systems or installing CygWin (with dos2unix) on your Windows box.

csv_converter.pl

To avoid problems with your external application you should better define the character set of your table according to the import/export filters of you application. E.g.

CREATE TABLE csv_test (id INT, value CHAR(20)) ENGINE = CSV CHARACTER SET utf8;

Be careful using correct formats for the data which you are manipulating with your external application. Wrong data or formats can crash the whole database system (and also the server) or can corrupt the data.

mysql> UPDATE csv_test SET c='Shorter' WHERE c ='record 3 is much too long.';

Query OK, 0 rows affected, 1 warning (0.01 sec)
Rows matched: 0 Changed: 0 Warnings: 1

Warning (Code 1265): Data truncated for column 'c' at row 3

Data types

The most common data types seems to be supported and work:

CREATE TABLE csv_test (id INT, fix_string CHAR(20)
, var_string VARCHAR(180), v_date DATE, v_time TIME
, v_datetime DATETIME, v_timestamp TIMESTAMP, v_float FLOAT
, v_double DOUBLE)
ENGINE = CSV
CHARACTER SET utf8;

INSERT INTO csv_test (id, fix_string, var_string, v_date, v_time, v_datetime, v_timestamp, v_float, v_double)
VALUES (1, 'Record 1', 'Blablabla...'
, STR_TO_DATE('2005-05-13', '%Y-%m-%d')
, STR_TO_DATE('11:34:55', '%H:%i:%s')
, STR_TO_DATE('2005-07-30 15:17:33', '%Y-%m-%d %H:%i:%s'), NOW()
, 123.456, 1234567890.0123456789);

Remarks

After adding or deleting some records with MySQL the file looks like this:

$ cat csv_test.CSV
"1","record 1"
^@^@^@^@"2","record 2"
"3","OO Calc"

Bugs

The bugs described below disappeared between release 5.0.12 and 5.0.16. Actually there are no more known bugs with CSV tables!

The following sequence produces data corruptions which is NOT reproducible with e.g. InnoDB tables:

show warnings;
DROP TABLE csv_test3;
CREATE TABLE csv_test3 (id INT, v_date DATE, v_time TIME) ENGINE = CSV;
INSERT INTO csv_test3 (id, v_date, v_time)
VALUES (1, STR_TO_DATE('2005-05-13', '%Y-%m-%d'), STR_TO_DATE('11:34:55', '%H:%i:%s'));
SELECT * FROM csv_test3 WHERE id = 1;
UPDATE csv_test3 SET v_date = NULL, v_time = NULL WHERE id = 1;
SELECT * FROM csv_test3 WHERE id = 1;
UPDATE csv_test3 SET v_date = NULL, v_time = NULL WHERE id = 1;
SELECT * FROM csv_test3 WHERE id = 1;
SELECT * FROM csv_test3;

Also this sequences corrupts data:

show warnings;

CREATE TABLE csv_test ( c1 integer ) ENGINE = CSV;
INSERT INTO csv_test VALUES ( 1 );
INSERT INTO csv_test VALUES ( 2 );
INSERT INTO csv_test VALUES ( 3 );
SELECT * FROM csv_test;

DELETE FROM csv_test WHERE c1 = 2;
SELECT * FROM csv_test;

INSERT INTO csv_test VALUES ( 4 );
SELECT * FROM csv_test;

INSERT INTO csv_test VALUES ( 5 );
SELECT * FROM csv_test;

Conversion and Performance

For testing purposes a table was created: Size: 22.5 MB, 1'572'864 rows. A COUNT(*) on this table took 9.5 sec.
The conversion to InnoDB was done like this:

CREATE TABLE inno_test ENGINE = INNODB
AS SELECT * FROM csv_test;

and took about 282 sec. A COUNT(*) on this converted table took 22.6 sec.

Converting the table like this:

ALTER TABLE csv_test ENGINE = INNODB;

RENAME TABLE csv_test TO inno_test2;

took 206 sec.

A verification of the table types shows, that everything went well:

mysql> SELECT table_name, engine table_rows
FROM information_schema.tables
WHERE table_name LIKE '%test%';
+------------+------------+
| table_name | table_rows |
+------------+------------+
| inno_test  | InnoDB     |
| inno_test2 | InnoDB     |
+------------+------------+

Literature

[1] MySQL 5.0 Reference Manual: The CSV Storage Engine. [2] Sources of CSV table engine: sql/examples/ha_tina.* [3] MySQL Forums :: CSV Storage Engine. [4] Wikipedia: Comma SeparatedValues.