Knex (with MySQL) had a very scary SQL injection
Knex recently released a new version this week (2.4.0). Before this version, Knex had a pretty scary SQL injection. Knex currently has 1.3 million weekly downloads and is quite popular.
The security bug is probably one of the worst SQL injections I’ve seen in recent memory, especially considering the scope and popularity.
If you want to get straight to the details:
- Check out the Github issue, which was opened 7 years ago(!)
- An article from Ghostccamm explaining the vulnerability.
- CVE-2016-20018.
My understanding of this bug
If I understand the vulnerability correctly, I feel this can impact a very large number of sites using Knex. Even more so if you use Express.
I’ll try to explain through a simple example. Say, you have MySQL table structured like this:
CREATE TABLE `users` (
`id` int NOT NULL AUTO_INCREMENT,
`name` varchar(100) DEFAULT NULL,
PRIMARY KEY (`id`)
)
And you have a query that does a SELECT
using Knex:
const lookupId = 2;
const result = await knex('users')
.select(['id', 'name'])
.where({
id: lookupId
});
You’d expect the query to end up roughly like this
SELECT `id`, `name` FROM `users` WHERE `id` = 2
The issue is when the user controls the value of lookupId
. If somehow they
can turn this into an object like this:
const lookupId = {
name: 'foo'
}
You might expect an error from Knex, but instead it generates the following query:
SELECT `id`, `name` FROM `users` WHERE `id` = `name` = 'foo'
This query is not invalid. I don’t fully understand fully understand MySQL’s behavior, but it causes the WHERE clause to be ignored and the result is equivalent to:
SELECT `id`, `name` FROM `users`
I think it has something to do with how MySQL casts things. In MySQL this yields true
(or 1
):
SELECT 1 = 'foo' = 'bar'
Query strings and Express
One place where this is especially scary, is if lookupId
was provided by a user,
via a JSON body or query string.
Consider this example from an Express route:
app.get('/', async (req, res) => {
const result = await knex('users')
.select(['id', 'name'])
.where({id: req.query.id});
res.send(result)
})
Here the id
in the WHERE
clause is provided by the URL query string:
If a the server is opened with something like:
http://localhost:3000/?id=2
It will return a single user, but if it was opened using:
http://localhost:3000/?id%5Bname%5D=foo
It will return every user. And this issue is not limited to SELECT
,
it could also trigger in a WHERE
clause in DELETE
.
The reason this works is that by crafting URL query parameters in a special
way, a user can have things in req.query
show up as objects or arrays.
Express calls this the ‘extended’ syntax and turns it on by default. In my opinion this is a bad default because it’s not really what users expect will happen. PHP does this as well, and I believe this may have been where Express (or specifically the qs library) got the syntax from.
This is why I think Express users are expecially likely to be vulnerable.
I think that most developers in professional settings are decent at
validating JSON request bodies with a variety of tools, but I’ve noticed
this is often not the case for query parameters. I believe a big reason for
this is that the ‘extended’ syntax is not well known and ‘surprising’ behavior.
Many people assume these are just strings. This is not intended as a plug,
but this is also why the default behavior in our Curveball framework is
to not support this. In most cases it’s not needed, and if you do want it you can
use qs
and explicitly opt-in.
Other examples
But of course, this issue is not limited to query strings. If you’re not
validating input somewhere and this ends up in a .where()
statement there’s
risk.
A specific example is a situation where part of the contents of your WHERE
clause is unguessable.
For instance, say you had a database table with records that users can only see if they know a secret code, and it’s selected with:
SELECT * FROM coupons WHERE product_id = 5 AND coupon_code = 'BIG_OOF2023'
A user presumably would have to know the coupon_code
to see the coupons,
but if there’s no code to validate coupon_code
is a string, a user would have the
power to change the query to this:
SELECT * FROM coupons WHERE product_id = 5 AND coupon_code = product_id = 'bla'
Which is equivalent to:
SELECT * FROM coupons WHERE product_id = 5
And thus no longer requiring coupon codes to get ALL the discounts.
Lastly, if you can rewrite a query that expects 1 result to return every record in a database is also an easy way to cause a Denial of service attack.
This does demonstrate again how is critical validating is and throw errors whenever you get data you don’t expect. Even if under normal circumstances nothing weird can happen, a library you use might do the wrong thing with unexpected data instead of just rejecting it.
On Knex
It’s quite unfortunate to see that this went unpatched for so long. I’d invite anyone reading this to try to not pile on on the (likely volunteer) authors but think about the larger ecosystem.
This bug was hidden in plain sight. Lots of people must have randomly ran into it and a ticket was opened. Probably the most responsible thing to do would have been what @rgmz did: do their best to contact the authors, failing that contact Github Security Team. After the Github Security Team also weren’t able to connect to the authors, make a CVE which puts this on every everyone’s radar. This ultimately led to a random bystander make their first contribution and submit a fix.
Knex feels high risk now though, and I can’t help wondering what else might be unpatched. I’ve only recently made the jump from just writing my own queries for like 20 years to Knex, but I’m probably reversing that decision for my next project.