My thoughts on MySQL (updated)

MySQL is well-known as an easy-to-use, easy-to-set-up, and easy-to-run SQL database. It has a long history, having had its 20th birthday a few months ago, and has enjoyed widespread popularity as the go-to option for SQL databases, also being included in the XAMPP/LAMP stacks.

However, MySQL has its drawbacks, some of which I will cover here. This post is written from the perspective of someone working with MySQL 5.6, although I will acknowledge the changes made in 5.7.

NOTE: I decided this post was too snarky and didn't contain enough detail, so I've updated it as of December 2015.

MySQL replication is frustrating

MySQL replication is fragile. It's based around journaling - the binary log, in particular - and while this fits in with the architecture of the system in general, there are a few fairly large drawbacks:

For data to to be replicated from server A to server B, the actions of creating the data (INSERT statements or similar) must exist in server A's binary log. If you start slaving from a server that already has data, you'll have to get that data onto the slave first, outside replication - mysqldump, xtrabackup or percona's pt-table-sync tool are ways to do this. Note that if your servers are out of sync and you start replication, MySQL will not warn you in any way.

Some statements are non-deterministic (such as "INSERT ... SELECT" without an ORDER BY clause) and can result in different results on a slave and master. MySQL offers a Binary Log Format setting which can be used to make the binary log reflect the rows that were updated by each statement, rather than the statements themselves. There is also a "mixed" mode that will switch between row-based and statement-based formats in order to save space while also remaining deterministic. Unfortunately, the default setting before MySQL 5.7.7 is statement-based logging.

MySQL defaults are frustrating (but getting better)

MySQL defaults aren't great. By default, you can insert data into fields that you shouldn't be able to, and MySQL will do something to your data to make it fit, and then return a warning (which most people ignore). Examples of this include:

  • Strings too long for a text field constraint (e.g. VARCHAR(4)) are truncated.
  • Values out of range for a given field type/constraint are adjusted or given default values.

for example:

    mysql> create table blah ( id tinyint unsigned primary key, adate date, str varchar(4) );
    Query OK, 0 rows affected (0.27 sec)

    -- valid row
    mysql> insert into blah(id, adate, str) values(1, '2015-12-15', 'test');
    Query OK, 1 row affected (0.00 sec)

    -- varchar too long
    mysql> insert into blah(id, adate, str) values(2, '2015-12-15', 'hello');
    Query OK, 1 row affected, 1 warning (0.00 sec)

    -- date out of range (max date is 9999-12-31)
    mysql> insert into blah(id, adate, str) values(3, '10000-12-15', 'test');
    Query OK, 1 row affected, 1 warning (0.00 sec)

    -- tinyint out of range (max value is 255)
    mysql> insert into blah(id, adate, str) values(1024, '2015-12-15', 'test');
    Query OK, 1 row affected, 1 warning (0.00 sec)

    mysql> select * from blah;
    +-----+------------+------+
    | id  | adate      | str  |
    +-----+------------+------+
    |   1 | 2015-12-15 | test |
    |   2 | 2015-12-15 | hell |
    |   3 | 0000-00-00 | test |
    | 255 | 2015-12-15 | test |
    +-----+------------+------+
    4 rows in set (0.00 sec)

Note the warnings, which are easy to see in the commandline client, but easy to ignore if you're not checking for them in your code.

As I understand, MySQL default behaviour is overly permissive like this for backwards compatibility reasons, which is understandable. There is a very important variable named sql_mode that can be used to adjust MySQL's behaviour in these situations - to reject inserts instead of manipulating the data, by enabling "STRICT mode" This is disabled by default, but the MySQL developers are making it enabled by default in 5.7, which was released as General Availability about a month ago. I recommend any new MySQL DBA/developer to look into the effects of different sql_mode settings and plan accordingly.

Another annoying default, up until MySQL 5.6, was the default table engine being MyISAM. MyISAM is a simple table engine, with fast serial performance - however, it has many drawbacks that make it ill-suited for use as a database engine:

  • No support for transactions, which are a fairly important feature of SQL, and required for ACID compliance.
  • No support for foreign key constraints.
  • Poor crash recovery, due to lack of journaling.
  • No row-level locking - table locks only, which severely restricts concurrent performance.

The alternative to MyISAM is InnoDB, which supports all of the above features that MyISAM lacks. InnoDB also has more complete caching support - the InnoDB Buffer Pool supports caching of both data and indexes, whereas MyISAM key buffers are just for indexes. Up until MySQL 5.6, MyISAM had one advantage over InnoDB - full text indexes - but these are now available for InnoDB tables. Some maintain that MyISAM's simplicity provides a performance advantage over InnoDB, but in my opinion the drawbacks are far too severe to consider using MyISAM for anything other than throwaway data.

The MariaDB project has begun development of a new storage engine, Aria, based on MyISAM, with the goal of fixing all of MyISAM's shortcomings and becoming the default storage engine for MariaDB.

Fragmentation

Another irritation I have with MySQL is the fragmentation that has happened in the last 5-10 years. There are many forks, branches and offshoots of MySQL, such as Percona Server, MariaDB, Galera Cluster, and WebScaleSQL, each with their own extra features, with no clear winner in terms of coverage.

The best candidates to consider would probably be either [Percona Server][perconsaserver] or MariaDB. Percona also provide utilities for administrating MySQL, which can be used with other forks.

Undesirable Behaviour

A MySQL transaction can include a statement like this:

    INSERT INTO db1.table1 (id, name) SELECT id, name FROM db2.table1;

This outcome of this statement depends on the state of both db1.table1 and db2.table1. This gets handled fine during execution, but makes replaying a binary log on just db1 impossible, as even with binlog format set to MIXED, the operation is stored as a statement. I have seen some reports that this should not be possible in MySQL, but it works in MariaDB 10.0.22, using InnoDB tables.

An easy fix for this would be to convert the statement stored in the binary log to the ROW format, as MySQL already does when performing statements such as "CREATE TABLE ... SELECT".

Bugs

Bugs such as this one are annoying, easily triggered, and hard to fix. This one in particular ("LOST_EVENTS" appearing in your binary log after a mistyped GRANT statement) appears to still affect the latest version of Oracle MySQL (5.6.24, as of writing).

Conclusion

After revisiting this issue, I have come to the conclusion that although MySQL has its issues, it is still worth using, as long as you plan ahead and take care not to fall into any traps. MySQL can be used as a "plug and play" SQL server, but for reliable, scalable operation, you need to take care to set it up correctly.

Comments