MySQL recap and future storage engines

MySQL logo

I'm likely the last to notice, but there are a lot of exciting things going on in the MySQL space lately. I've been using MySQL since a few years now, and pretty much everyone agrees it is the de-facto database for web development. I've had Planet PHP for a while now in my feed reader, but recently I also started reading Planet MySQL, only to discover a wealth of useful information. This article is for the people who are still stuck in the past (MySQL 4.x), and need reasons to upgrade to simply want to know whats in the pipeline.
In case you are already reading Planet MySQL.. you might as well stop reading now.

MySQL has always been fairly limited in features, compared to for example Oracle, PostgreSQL or even Microsoft SQL Server, but they have been catching up.. MySQL 4.1 already brought us Prepared statements and Subqueries.

MySQL 5.0 added features people have been waiting for for eons, the most important being:

  1. Views which allow you to use a complex query as if it was a separate table. Depending on the type of query you are even able to update records from that table.. which will make mysql propagate the data back to the underlying 'real' tables.
  2. Stored procedure/function support. These allow you to store common operations you need to do on your data as a function.
  3. Triggers, which allow you to run additional queries when a specific event occurs.. (such as an update to a table). If your application is growing you'll notice that 'academic-style' normalization isn't always applicable in the real world, and you will have duplicate data in your database.
    A simple example of this is a blog, which shows the number of comments per article. When its small-scale its fine to always count the number of comments based on an article id, but when your blog goes through the roof, it might be better to store the 'comment-count' separate on a per-post basis. Triggers can be really useful to make sure this number is always correct and up-to-date. There are more use-cases, but this 90% of the times this is where I use triggers.
  4. Cluster storage engine.

MySQL 5.1, which is still under development will add features such as

  1. Partitioning, according to the manual "This capability enables distributing portions of individual tables across a filesystem, according to rules which can be set when the table is created. In effect, different portions of a table are stored as separate tables in different locations, but from the user point of view, the partitioned table is still a single table."
  2. An event scheduler, which essentially adds crontab-like capabilities to MySQL
  3. Several XML functions, allowing you to perform x-path expressions on XML data in your database.
  4. A Plugin API, which I'll cover later.

Storage engines

A storage engine is the underlying engine for a table in mysql. A simple example is that there are storage engines that store your data on a harddisk , and others that store your data in memory. You can always join multiple tables coming from different storage engines.. so there's no problem with mixing them up and picking different storage engines for different tables.

These are the most common storage engines.

  • MyISAM is probably used the most, as this is the default when you're creating a new table. It has all the basic features, and its ideal for lots of reads and relatively little writes.
  • InnoDB has a little broader featureset, such as foreign keys, row-based locking and transactions. Row based locking means that the entire table doesn't have to be locked for writes (effectively delaying every read when somebody is writing).
  • The MERGE storage engine allows you to combine multiple tables as if it was one.
  • The Memory storage engine keeps all data in memory, which is very fast, but also means you will lose the data when the server stops.
  • The Federated engine, allows you to connect to remote MySQL databases and access tables remotely as if it was local.
  • The Archive storage engine is a good pick for storing old data that isn't being used a lot. Its optimized for disk-usage and generally slow.
  • Last, but not least.. the NDB cluster engine is an option if you have huge amounts of data, concurrency and need really high availability.

Oracle and InnoBase

InnoBase has always been the company behind the powerful InnoDB storage engine. In October 2005 Acquired Innobase. Even though InnoDB is open source, and thus owned by the public, the future of the product itself is still for a big part in the hands one of MySQL biggest commercial competitors.

One of MySQL's responses was (unverified, no source for this) to quickly develop a developer-friendly Plugin API, and start working on a new storage engine, which should replace InnoDB in MySQL, named Falcon. At the same time more commercial and open source storage engines are popping up everywhere.. Some of the engines I came across reading Planet MySQL..

New (and sometimes crazy) storage engines

  • A Filesystem storage engine. Allowing you to access files from your system using SQL queries. Very cool for reading information out of the /proc filesystem, but there are loads of other uses. (very alpha) (Examples: [1], [2]).
  • IBM DB2 storage engine, but this is just for i5/OS mainframe.
  • There also has been a pretty cool idea to use RSS feeds from the web as a storage engine.. No code yet, but this could be a lot of fun to use..
  • SolidDB, which is a transaction based storage engine.. Has been around for a while now and could be a good alternative to InnoDB.
  • Amazon S3 storage engine; allowing you to use Amazon S3 as a mysql table.
  • PrimeBase XT, another transactional storage engine.
  • ScaleDB, which is not public yet.. Not sure exactly what the advantages and disadvantages are.. but the name should tell you something.
  • NitroEDB, a commercial storage engine.. they seem to mainly advertise with being very fast with high loads.. but no facts or comparisons.
  • ODBC engine, allowing you to connect to remote odbc-compatible databases and use them locally (no code (yet)).
  • I'm really excited about this Memcached storage engine. MemCached is a distributed cache, useful for applications that need to scale horizontally.. I'm not sure yet if this engine will allow me to do this, but hopefully I can use this engine to invalidate cache entries when data is updated using triggers. Not sure yet if it can do that, but its definitely worth a try.. I was hoping something like this would be available for MySQL ever since I read its possible with this PostgresSQL plugin.

A sales-person would probably summarize all this as "MySQL going Web 2.0".. In any case there's lots of exciting things happening.

Links..

MySQL community logo

If you weren't tired of clicking yet, here's some more interesting stuff I recently found.

  1. MySQL forge is a recently conceived project which should contain links and descriptions to mysql-related projects, code snippets, and lots of other tools. Seems to be an attempt to fuel the mysql-user community.
  2. As mentioned before, Planet MySQL is a very active blog aggregator. Lots of MySQL employees seem to be on here as well.

Sources used for this article. (I hope I didn't miss any..)

http://mysql-python.blogspot.com/2007/04/mysql-conference-2007.html
http://www.bytebot.net/blog/archives/2007/04/25/a-storage-engine-for-amazon-s3
http://www.dbms2.com/2007/04/26/mysql-ibm-iseries/.

Happy reading..