Writing SQL that works on PostgreSQL, MySQL and SQLite
I am one of those crazy people who attempts to write SQL that works on SQlite, MySQL and PostgreSQL. First I should explain why:
This is all for my project sabre/dav. sabre/dav is a server for CalDAV, CardDAV and WebDAV. One of the big design goals is that it this project has to be a library first, and should be easily integratable into existing applications.
To do this effectively, it’s important that it’s largely agnostic to the host platform, and one of the best ways (in my opinion) to achieve that is to have as little dependencies as possible. Adding dependencies such as Doctrine is a great idea for applications or more opinionated frameworks, but for sabre/dav lightweight is key, and I need people to be able to understand the extension points fairly easily, without requiring them to familiarize them with the details of the dependency graph.
So while you are completely free to choose to add Doctrine or Propel yourself, the core adapters (wich function both as a default implementation and as samples for writing your own), all only depend on an instance of PDO.
The nice thing is that ORMs such as Doctrine and Propel, you can get access to the underlying PDO connection object and pass that, thus reusing your existing configuration.
For the longest time we only supported SQLite and MySQL, but I’m now working on adding PostgreSQL support. So I figured, I might as well write down my notes.
But how feasable is it to write SQL that works everywhere?
Well, it turns out that this is actually not super easy. There is such as thing as Standard SQL, but all of these databases have many of their own extensions and deviations.
The most important thing is that this will likely only work well for you if you have a very simple schema and simple queries.
Well, this blog post is not intended as a full guide, I’m just listing the particular things I’ve ran into. If you have your own, you can edit this blog post on github, or leave a comment.
My approach
- I try to keep my queries as simple as possible.
- If I can rewrite a query to work on every database, that query will have the preference.
- I avoid stored procedures, triggers, functions, views. I’m really just dealing with tables and indexes.
- Even if that means that it’s not the most optimal query. So I’m ok with sacrificing some performance, if that means my queries can stay generic, within reason.
- If there’s no possible way to do things in a generic way, I fall back on something like this:
<?php
if ($pdo->getAttribute(PDO::ATTR_DRIVER_NAME) === 'pgsql') {
$query = "...";
} else {
$query = "...';
}
$stmt = $pdo->prepare($query);
?>
DDL
First there is the “Data Definition Language” and “Data Manipulation Language”
the former is used for queries starting with CREATE
, ALTER
, DROP
, etc,
and the latter SELECT
, UPDATE
, DELETE
, INSERT
.
There really is no sane way to generalize your CREATE TABLE
queries, as the
types and syntax are vastly different.
So for those we have a set of .sql files for every server.
Quoting
In MySQL and SQlite you can use either quotes '
or double quotes "
to wrap
a string.
In PostgreSQL, you always have to use single quotes '
.
In MySQL and SQLite you use backticks for identifiers. PostgreSQL uses single quotes. SQlite can also use single quotes here if the result is unambigious, but I would strongly suggest to avoid that.
This means that this MySQL query:
SELECT * FROM `foo` WHERE `a` = "b"
is equivalent to this PostgreSQL query:
SELECT * FROM "foo" WHERE "a" = 'b'
Luckily you can often just write this query, which works for all databases:
SELECT * FROM foo WHERE a = 'b'
But keep in mind that when you create your tables, using double quotes will cause PostgreSQL to retain the upper/lower case characters. If you do not use quotes, it will normalize everything to lower case.
For compatibility I would therefore suggest to make sure that all your table and column names are in lower case.
REPLACE INTO
The REPLACE INTO
is a useful extension that is supported by both SQLite and
MySQL. The syntax is identical INSERT INTO
, except that if it runs into a
key conflict, it will overwrite the existing record instead of inserting a new
one.
So REPLACE INTO
basically either updates or inserts a new record.
This works on both SQLite and MySQL, but not PostgreSQL. Since version 9.5 PostgreSQL gained a new feature that allows you to achieve the same effect.
This statement from MySQL or SQLite:
REPLACE INTO blog (uuid, title) VALUES (:uuid, :title)
then might become something like this in PostgreSQL:
INSERT INTO blog (uuid, title) VALUES (:uuid, :title)
ON CONFLICT (uuid) DO UPDATE SET title = :title
So the major difference here is with PostgreSQL we specifically have to tell
it which key conflict we’re handling (uuid
) and what to do in that case
(UPDATE
).
In addition to REPLACE INTO
, MySQL also has this syntax to do the same thing:
INSERT INTO blog (uuid, title) VALUES (:uuid, :title)
ON DUPLICATE KEY UPDATE title = :title
But as far as I know SQLite does not have a direct equivalent.
BLOB
SQLite and MySQL have a BLOB
type. This type is used for storing data as-is.
Whatever (binary) string you store, you will retrieve again and no conversion
is attempted for different character sets.
PostgreSQL has two types that have a similar purpose: Large Objects and the bytea type.
The best way to describe large objects, is that they are stored ‘separate’ from the table, and instead of inserting the object itself, you store a reference to the object (in the form of an id).
bytea
is more similar to BLOB
, so I opted to use that. But there are some
differences.
First, if you do a select such as this:
<?php
$stmt = $pdo->prepare('SELECT myblob FROM binaries WHERE id = :id');
$stmt->execute(['id' => $id]);
echo $stmt->fetchColumn();
?>
On MySQL and Sqlite this will just work. The myblob
field is represented as
a string.
On PostgreSQL, byta
is represented as a PHP stream. So you might have to
rewrite that last statement as:
<?php
echo stream_get_contents($stmt->fetchColumn());
?>
Or:
<?php
stream_copy_to_stream($stmt->fetchColumn(), STDOUT);
?>
Luckily in sabre/dav we pretty much support streams where we also support strings, so we were already agnositic to this, but some unittests had to be adjusted.
Inserting bytea
is also a bit different. I’m not a fan of of using
PDOStatement::bindValue
and PDOStatement::bindParam
, instead
I prefer to just send all my bound parameters at once using execute
:
<?php
$stmt = $pdo->prepare('INSERT INTO binaries (myblob) (:myblob)');
$stmt->execute([
'myblob' => $blob
]);
?>
While that works for PostgreSQL for some strings, it will throw errors when you give it data that’s invalid in the current character set. It’s also dangerous, as PostgreSQL might try to transcode the data into a different character set.
If you truly need to store binary data (like I do) you must do this:
<?php
$stmt = $pdo->prepare('INSERT INTO binaries (myblob) (:myblob)');
$stmt->bindParam('myblob', $blob, PDO::PARAM_LOB);
$stmt->execute();
?>
Luckily this also just works in SQlite and MySQL.
String concatenation
Standard SQL has a string concatenation operator. It works like this:
SELECT 'foo' || 'bar'
// Output: foobar
This works in PostgreSQL and Sqlite. MySQL has a function for this:
SELECT CONCAT('foo', 'bar')
PostgreSQL also has this function, but SQLite does not. You can enable Standard SQL concatenation in MySQL by enabling it:
SET SESSION sql_mode = 'PIPES_AS_CONCAT'
I’m not sure why this isn’t the default.
Last insert ID
The PDO
object has a lastInsertId()
function. For SQLite and MySQL you can
just call it as such:
<?php
$id = $pdo->lastInsertId();
?>
However, PostgreSQL requires an explicit sequence identifier. By default this
follows the format tablename_idfield_seq
, so we might specifiy as this:
<?php
$id = $pdo->lastInsertId('articles_id_seq');
?>
Luckily the parameter gets ignored by SQLite and MySQL, so we can just specify it all the time.
Type casting
If you have an INT
field (or similar) and you access it in this way:
<?php
$result = $pdo->query('SELECT id FROM articles');
$id = $result->fetchColumn();
?>
With PostgreSQL $id
will actually have the type php type integer
. If you
use MySQL or SQlite, everything gets cast to a php string
, which is
unfortunate.
The sane thing to do is to cast everything to int after the fact, so you can correctly do PHP 7 strict typing with these in the future.
Testing
I unittest my database code. Yep, you read that right! I’m one of those people. It’s been tremendously useful.
Since adding PostgreSQL I was able to come up with a nice structure. Every unittest that does something with PDO now generally looks like this:
<?php
abstract PDOTest extends \PHPUnit_Framework_TestCase {
abstract function getPDO();
/** all the unittests go here **/
}
?>
Then I create one subclass for PostgreSQL, Sqlite and MySQL that each only
implement the getPDO()
function.
This way all my tests are repeated for each driver.
I’ve also rigged up Travis CI to have a MySQL and a PostgreSQL database server running, so everything automatically gets checked every time.
If a developer is testing locally, we detect if a database server is running, and automatically just skip the tests if this was not the case. In most cases this means only the Sqlite tests get hit, which is fine.
Conclusions
- Created a monster.
- PostgreSQL is by far the sanest database, and I would recommend everyone to move from MySQL towards it.
Comments
asoki •
hi Evert, thanks for the interesting article.
just one comment on the lastInsertId. At least on postgresql, lastInsertId will return a wrong id when the insert command creates more than one rows in the table. For example when some after trigger inserts an additional row you will get the id of the "last" inserted row.
you should use INSERT ... RETURNING ... instead.
Evert •
Ah interesting, but I imagine that this is the same on MySQL, no? Regardless, it's a good thing to be aware of =)
asoki •
not tried on mysql, but i assume it should be the same as on pg
Daniël van Eeden •
No RETURNING at all for MySQL. MariaDB supports RETURNING for single table DELETE.
Feature request:
https://jira.mariadb.org/br...
Also note that RETURNING is not in the SQL standard afaik. To write portable SQL you should try to conform to the standard.
Rotimi Ade •
Aura Sql Query ( https://packagist.org/packa... ) is a good package to use for writing portable SQL. I am currently using it in a lightweight orm package (leanorm https://packagist.org/packa... I am developing. Aura Sql Query can be used in conjunction with https://packagist.org/packa... (a package that extends PDO) and https://packagist.org/packa... (a packaged for retrieving schema meta data via PDO).
Aura Sql Query handles some of the issues you listed above such as lastInsertID, applying the right quotes in queries for each DBMS ...
It doesn't handle DDL queries though.
Quabla •
You can set MySQL's sql_mode = 'PIPES_AS_CONCAT,ANSI_QUOTES' to get a more consistent handling for the quotes.
Marco Pivetta •
I wish more people said that #feelingThePain
Lewis Cowles •
To be honest the very effort you speak of is exactly why I started telling people not to try to support multiple data-tier back-ends in an app. It's far easier to use a service with a common protocol if multiple storage back-ends need to be left open as an option. It has the benefit of common syntax regardless of back-end; transitioning the back-end becomes almost trivial; and you can be lazy supporting one-database at a time, not diluting your time, the readability or ease of following code; or that of the team left to support.