When to escape your data
Two examples of escaping data are the following:
- Before you insert a value into a SQL query, using for example mysqli::real_escape_string() or PDO::quote().
- Before you insert data into your output HTML, using htmlspecialchars().
The question I'd like to ask today is: when should you do this? There are two possible moments:
- Right when the data comes in. For SQL this used to be done with 'magic quotes' quite a bit in PHP-land. In general I don't see this happening a lot anymore for SQL. I do however see data encoded using htmlentities/htmlspecialchars before entering the database.
- The other way to go about it is to escape only when you know how you're going to use the data. For example, only call htmlspecialchars() right before you echo your data into your document.
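To make the second option concrete, here is a minimal sketch. The $comment value is hypothetical and stands in for data that came out of the database exactly as the user typed it; the JSON output is just one example of a second context, not something your application necessarily has.

```php
<?php
// Hypothetical raw value, stored exactly as the user entered it.
$comment = 'Tom & Jerry say <b>"hi"</b>';

// HTML context: escape right before the value goes into the document.
$html = '<p>' . htmlspecialchars($comment, ENT_QUOTES, 'UTF-8') . '</p>';

// JSON context: the same raw value needs a completely different encoding.
$json = json_encode(['comment' => $comment]);

echo $html, "\n";
echo $json, "\n";
```

Because the stored value is untouched, each output can apply its own encoding without having to undo anyone else's.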
If you escaped on input and later need the data in a different form, this is no big disaster. A workaround is to call htmlspecialchars_decode() or html_entity_decode() first, and then escape for your desired output. A worse case is filtering. If you have been stripping out all or some HTML tags before saving the data to the database, and later on you decide you want to show some of them anyway, that data is now lost.
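A small sketch of the difference between the two cases, assuming a made-up input string: escaping can be reversed, filtering cannot.

```php
<?php
$original = 'Fish & <em>chips</em>';

// Case 1: the value was escaped before it was stored.
$stored = htmlspecialchars($original, ENT_QUOTES, 'UTF-8');

// Escaping is reversible: decode first, then encode for the new target.
$recovered = htmlspecialchars_decode($stored, ENT_QUOTES);
$json = json_encode($recovered);

// Case 2: the value was filtered before it was stored.
// The <em> tags are gone and cannot be reconstructed from $filtered.
$filtered = strip_tags($original);

echo $recovered, "\n";
echo $filtered, "\n";
```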
So my argument is to store raw data. Only encode once you know where the data is going to be used. If you're worried about the overhead of escaping right before output in an HTML page, cache the output.
Whichever route you go, make sure this is clearly documented. There are two ways this can go wrong:
- Escaping is done on both input and output. Now you see literal &amp;'s in your HTML, or quotes preceded by backslashes (\'hello\').
- Escaping is forgotten at both ends. Now you might be vulnerable to SQL injection attacks, XSS attacks or data corruption.
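The first failure is easy to reproduce. This is only a sketch with a made-up input value; the addslashes() call stands in for whatever slash-adding step (such as the old magic quotes) was applied on the way in.

```php
<?php
$input = "Ben & Jerry's";

// Escaped once on input, before it was stored...
$stored = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// ...and once more on output. The browser will now render the
// entities themselves instead of the characters they stand for.
$output = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');

// Slashes added on input and never removed end up in front of the user.
$slashed = addslashes("'hello'");

echo $output, "\n";
echo $slashed, "\n";
```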
What do you think? I'm especially interested in the other side of the argument.