April 28, 2016

Writing SQL that works on PostgreSQL, MySQL and SQLite

I am one of those crazy people who attempts to write SQL that works on SQlite, MySQL and PostgreSQL. First I should explain why:

This is all for my project sabre/dav. sabre/dav is a server for CalDAV, CardDAV and WebDAV. One of the big design goals is that it this project has to be a library first, and should be easily integratable into existing applications.

To do this effectively, it’s important that it’s largely agnostic to the host platform, and one of the best ways (in my opinion) to achieve that is to have as little dependencies as possible. Adding dependencies such as Doctrine is a great idea for applications or more opinionated frameworks, but for sabre/dav lightweight is key, and I need people to be able to understand the extension points fairly easily, without requiring them to familiarize them with the details of the dependency graph.

So while you are completely free to choose to add Doctrine or Propel yourself, the core adapters (wich function both as a default implementation and as samples for writing your own), all only depend on an instance of PDO.

The nice thing is that ORMs such as Doctrine and Propel, you can get access to the underlying PDO connection object and pass that, thus reusing your existing configuration.

For the longest time we only supported SQLite and MySQL, but I’m now working on adding PostgreSQL support. So I figured, I might as well write down my notes.

But how feasable is it to write SQL that works everywhere?

Well, it turns out that this is actually not super easy. There is such as thing as Standard SQL, but all of these databases have many of their own extensions and deviations.

The most important thing is that this will likely only work well for you if you have a very simple schema and simple queries.

Well, this blog post is not intended as a full guide, I’m just listing the particular things I’ve ran into. If you have your own, you can edit this blog post on github, or leave a comment.

My approach

I try to keep my queries as simple as possible.
If I can rewrite a query to work on every database, that query will have the preference.
I avoid stored procedures, triggers, functions, views. I’m really just dealing with tables and indexes.
Even if that means that it’s not the most optimal query. So I’m ok with sacrificing some performance, if that means my queries can stay generic, within reason.
If there’s no possible way to do things in a generic way, I fall back on something like this:

<?php

if ($pdo->getAttribute(PDO::ATTR_DRIVER_NAME) === 'pgsql') {

    $query = "...";

} else {

    $query = "...';

}

$stmt = $pdo->prepare($query);

?>

DDL

First there is the “Data Definition Language” and “Data Manipulation Language” the former is used for queries starting with CREATE, ALTER, DROP, etc, and the latter SELECT, UPDATE, DELETE, INSERT.

There really is no sane way to generalize your CREATE TABLE queries, as the types and syntax are vastly different.

So for those we have a set of .sql files for every server.

Quoting

In MySQL and SQlite you can use either quotes ' or double quotes " to wrap a string.

In PostgreSQL, you always have to use single quotes '.

In MySQL and SQLite you use backticks for identifiers. PostgreSQL uses single quotes. SQlite can also use single quotes here if the result is unambigious, but I would strongly suggest to avoid that.

This means that this MySQL query:

SELECT * FROM `foo` WHERE `a` = "b"

is equivalent to this PostgreSQL query:

SELECT * FROM "foo" WHERE "a" = 'b'

Luckily you can often just write this query, which works for all databases:

SELECT * FROM foo WHERE a = 'b'

But keep in mind that when you create your tables, using double quotes will cause PostgreSQL to retain the upper/lower case characters. If you do not use quotes, it will normalize everything to lower case.

For compatibility I would therefore suggest to make sure that all your table and column names are in lower case.

REPLACE INTO

The REPLACE INTO is a useful extension that is supported by both SQLite and MySQL. The syntax is identical INSERT INTO, except that if it runs into a key conflict, it will overwrite the existing record instead of inserting a new one.

So REPLACE INTO basically either updates or inserts a new record.

This works on both SQLite and MySQL, but not PostgreSQL. Since version 9.5 PostgreSQL gained a new feature that allows you to achieve the same effect.

This statement from MySQL or SQLite:

REPLACE INTO blog (uuid, title) VALUES (:uuid, :title)

then might become something like this in PostgreSQL:

INSERT INTO blog (uuid, title) VALUES (:uuid, :title)
ON CONFLICT (uuid) DO UPDATE SET title = :title

So the major difference here is with PostgreSQL we specifically have to tell it which key conflict we’re handling (uuid) and what to do in that case (UPDATE).

In addition to REPLACE INTO, MySQL also has this syntax to do the same thing:

INSERT INTO blog (uuid, title) VALUES (:uuid, :title)
ON DUPLICATE KEY UPDATE title = :title

But as far as I know SQLite does not have a direct equivalent.

BLOB

SQLite and MySQL have a BLOB type. This type is used for storing data as-is. Whatever (binary) string you store, you will retrieve again and no conversion is attempted for different character sets.

PostgreSQL has two types that have a similar purpose: Large Objects and the bytea type.

The best way to describe large objects, is that they are stored ‘separate’ from the table, and instead of inserting the object itself, you store a reference to the object (in the form of an id).

bytea is more similar to BLOB, so I opted to use that. But there are some differences.

First, if you do a select such as this:

<?php

$stmt = $pdo->prepare('SELECT myblob FROM binaries WHERE id = :id');
$stmt->execute(['id' => $id]);

echo $stmt->fetchColumn();

?>

On MySQL and Sqlite this will just work. The myblob field is represented as a string.

On PostgreSQL, byta is represented as a PHP stream. So you might have to rewrite that last statement as:

<?php

echo stream_get_contents($stmt->fetchColumn());

?>

Or:

<?php

stream_copy_to_stream($stmt->fetchColumn(), STDOUT);

?>

Luckily in sabre/dav we pretty much support streams where we also support strings, so we were already agnositic to this, but some unittests had to be adjusted.

Inserting bytea is also a bit different. I’m not a fan of of using PDOStatement::bindValue and PDOStatement::bindParam, instead I prefer to just send all my bound parameters at once using execute:

<?php

$stmt = $pdo->prepare('INSERT INTO binaries (myblob) (:myblob)');
$stmt->execute([
    'myblob' => $blob
]);

?>

While that works for PostgreSQL for some strings, it will throw errors when you give it data that’s invalid in the current character set. It’s also dangerous, as PostgreSQL might try to transcode the data into a different character set.

If you truly need to store binary data (like I do) you must do this:

<?php

$stmt = $pdo->prepare('INSERT INTO binaries (myblob) (:myblob)');
$stmt->bindParam('myblob', $blob, PDO::PARAM_LOB);
$stmt->execute();

?>

Luckily this also just works in SQlite and MySQL.

String concatenation

Standard SQL has a string concatenation operator. It works like this:

SELECT 'foo' || 'bar'
// Output: foobar

This works in PostgreSQL and Sqlite. MySQL has a function for this:

SELECT CONCAT('foo', 'bar')

PostgreSQL also has this function, but SQLite does not. You can enable Standard SQL concatenation in MySQL by enabling it:

SET SESSION sql_mode = 'PIPES_AS_CONCAT'

I’m not sure why this isn’t the default.

Last insert ID

The PDO object has a lastInsertId() function. For SQLite and MySQL you can just call it as such:

<?php

$id = $pdo->lastInsertId();

?>

However, PostgreSQL requires an explicit sequence identifier. By default this follows the format tablename_idfield_seq, so we might specifiy as this:

<?php

$id = $pdo->lastInsertId('articles_id_seq');

?>

Luckily the parameter gets ignored by SQLite and MySQL, so we can just specify it all the time.

Type casting

If you have an INT field (or similar) and you access it in this way:

<?php

$result = $pdo->query('SELECT id FROM articles');
$id = $result->fetchColumn();

?>

With PostgreSQL $id will actually have the type php type integer. If you use MySQL or SQlite, everything gets cast to a php string, which is unfortunate.

The sane thing to do is to cast everything to int after the fact, so you can correctly do PHP 7 strict typing with these in the future.

Testing

I unittest my database code. Yep, you read that right! I’m one of those people. It’s been tremendously useful.

Since adding PostgreSQL I was able to come up with a nice structure. Every unittest that does something with PDO now generally looks like this:

<?php

abstract PDOTest extends \PHPUnit_Framework_TestCase {

    abstract function getPDO();

    /** all the unittests go here **/

}

?>

Then I create one subclass for PostgreSQL, Sqlite and MySQL that each only implement the getPDO() function.

This way all my tests are repeated for each driver.

I’ve also rigged up Travis CI to have a MySQL and a PostgreSQL database server running, so everything automatically gets checked every time.

If a developer is testing locally, we detect if a database server is running, and automatically just skip the tests if this was not the case. In most cases this means only the Sqlite tests get hit, which is fine.

Conclusions

Created a monster.
PostgreSQL is by far the sanest database, and I would recommend everyone to move from MySQL towards it.

Comments

asoki • Apr 28, 2016

hi Evert, thanks for the interesting article.
just one comment on the lastInsertId. At least on postgresql, lastInsertId will return a wrong id when the insert command creates more than one rows in the table. For example when some after trigger inserts an additional row you will get the id of the "last" inserted row.
you should use INSERT ... RETURNING ... instead.
- Evert • Apr 28, 2016
  
  Ah interesting, but I imagine that this is the same on MySQL, no? Regardless, it's a good thing to be aware of =)
  - asoki • Apr 28, 2016
    
    not tried on mysql, but i assume it should be the same as on pg
- Daniël van Eeden • May 02, 2016
  
  No RETURNING at all for MySQL. MariaDB supports RETURNING for single table DELETE.
  Feature request:
  https://jira.mariadb.org/br...
  Also note that RETURNING is not in the SQL standard afaik. To write portable SQL you should try to conform to the standard.
Rotimi Ade • May 02, 2016

Aura Sql Query ( https://packagist.org/packa... ) is a good package to use for writing portable SQL. I am currently using it in a lightweight orm package (leanorm https://packagist.org/packa... I am developing. Aura Sql Query can be used in conjunction with https://packagist.org/packa... (a package that extends PDO) and https://packagist.org/packa... (a packaged for retrieving schema meta data via PDO).
Aura Sql Query handles some of the issues you listed above such as lastInsertID, applying the right quotes in queries for each DBMS ...
It doesn't handle DDL queries though.
Quabla • May 06, 2016

You can set MySQL's sql_mode = 'PIPES_AS_CONCAT,ANSI_QUOTES' to get a more consistent handling for the quotes.
Marco Pivetta • May 09, 2016

PostgreSQL is by far the sanest database, and I would recommend everyone to
move from MySQL towards it.
I wish more people said that #feelingThePain
Lewis Cowles • Jun 16, 2016

To be honest the very effort you speak of is exactly why I started telling people not to try to support multiple data-tier back-ends in an app. It's far easier to use a service with a common protocol if multiple storage back-ends need to be left open as an option. It has the benefit of common syntax regardless of back-end; transitioning the back-end becomes almost trivial; and you can be lazy supporting one-database at a time, not diluting your time, the readability or ease of following code; or that of the team left to support.