Solar Framework's master-slave db adapter

SolarPHP's framework now comes with an SQL adapter for MySQL replicated setups, as of the 1.0.0alpha2 release. The adapter automatically picks the master or a slave database depending on the type of query: if it's a SELECT, it goes to a slave; if it's an UPDATE, DELETE, etc., it automatically goes to the master.
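To make the behaviour concrete, here is a minimal Python sketch of that kind of statement-based routing. All names are hypothetical illustrations, not Solar's actual API:

```python
import re

class ReplicatedRouter:
    """Sketch of statement-based routing: plain SELECTs go to a
    slave (round-robin), everything else goes to the master."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = slaves
        self._rr = 0  # round-robin index over the slaves

    def pick(self, sql):
        # Route SELECT statements to a slave; all other statements
        # (INSERT, UPDATE, DELETE, DDL, ...) to the master.
        if re.match(r"\s*SELECT\b", sql, re.IGNORECASE):
            slave = self.slaves[self._rr % len(self.slaves)]
            self._rr += 1
            return slave
        return self.master
```

The appeal is obvious: the application issues queries and never has to think about which server they land on. The rest of the post is about why that transparency backfires.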

Let me just put it out there: I think that's a really bad idea. While this will work in certain situations, it is advertised as: "The Solar_Sql_Adapter_MysqlReplicated adapter handles all the connections and switching-around for you, and you don't need change a single line of application code."

Quite frankly, I started out with a similar approach. It worked fine in our staging environment, which didn't experience a lot of load, but when it went to the live machine things went bad.

Suddenly I realized there were more than a couple of places in the application that had to do a SELECT based on information that had just been INSERTed. After the INSERT was done, the SELECT was performed on a slave, where the data hadn't arrived from the master yet. This caused some pretty freaky errors. Hoping to fix the problem, I decided to always execute every query on the master once a data-altering statement had been done. It turned out that the master got quite a bit more load than the slaves after that change.
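That workaround, pinning a session to the master as soon as it has written anything, can be sketched like this (hypothetical names again). The load problem is visible in the code: once the flag is set, every subsequent read in the session lands on the master:

```python
import re

class StickyRouter:
    """Sketch of the 'pin to master after any write' workaround.
    After a data-altering statement, all further queries in this
    session go to the master, shifting read load onto it."""

    def __init__(self, master, slave):
        self.master = master
        self.slave = slave
        self._dirty = False  # set once any write has happened

    def pick(self, sql):
        if not re.match(r"\s*SELECT\b", sql, re.IGNORECASE):
            self._dirty = True  # a write: pin this session to the master
        return self.master if self._dirty else self.slave
```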

Things got worse when there was a replication lag of, say, a second. Users would modify some data by submitting a form, and when they ended up on the resulting page the data was gone!

Some of the things we looked at were using a whitelist or blacklist of queries that are always OK/not OK to send to a slave, but since that would have meant decoupling a logical block of code, we went for the pragmatic approach: always pick the master or a slave depending on what a specific piece of functionality requires. Yes, it might require a bit more thinking as you're working on your database code, but the behaviour is predictable and reliable.
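The explicit approach can be as simple as a pair of accessors that callers pick from per piece of functionality. A minimal Python sketch, assuming one master and a round-robin slave pool (names are hypothetical):

```python
class ConnectionPool:
    """Sketch of the explicit approach: the caller chooses the
    connection per piece of functionality instead of relying on
    automatic query inspection."""

    def __init__(self, master, slaves):
        self._master = master
        self._slaves = slaves
        self._rr = 0  # round-robin index over the slaves

    def get_master(self):
        return self._master

    def get_slave(self):
        slave = self._slaves[self._rr % len(self._slaves)]
        self._rr += 1
        return slave

# The caller decides per feature, so read-after-write explicitly
# stays on the master:
#   db = pool.get_master()
#   db.execute("INSERT ...")
#   row = db.query("SELECT ...")  # guaranteed to see the INSERT
```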

Comments

  • Lukas

    Well, one way to take out the magic without so much work would be to have a method you call to direct the next queries at the master until another call disables this. For example, if you make it possible to mark a transaction as "FOR UPDATE", then while that transaction is open, all queries should be directed at the master. So probably not without change, but imho this is a quite clear approach that allows for semi-automatic work going on. Then again, making use of transactions too much (especially with MySQL) can also kill your performance. And many applications can do fine without transactions (as long as you are prepared to deal with that one-in-a-billion freak accident).
  • Paul M. Jones

    Hi Evert -- You're correct, there are times when you need to force the use of the master, not a slave. I ran into the INSERT-then-SELECT issues you noted myself. For what it's worth, the MysqlReplicated adapter stays on the master during transactions. When you need to do modify-then-SELECT, doing it in a transaction is probably the way to go with MysqlReplicated. Clearly I should have mentioned this in the blog post, but it is clearly marked in the code itself. I hope that helps alleviate at least some of your concern about the "bad"ness of the idea. :-)
  • Evert

    Lukas, although your point is valid, I'd rather not rewrite MySQL queries and make them, like you said, potentially slower when it's not needed. Paul, I would say that needs a big fat warning :) Being able to manually pick the master is a good thing, but from where I'm standing, I'd rather have people think about it all the time and always pick it manually. For us that's a simple getMaster vs. getSlave; I think being explicit will end up being rewarding for complex code bases :) Evert
  • Paul M. Jones

    Hi Evert -- You said, "I think being explicit will end up being rewarding for complex code bases". I admit this appeals to my bias towards more explicitness and less magic. At the same time, if I can make things work automatically, I like to do so, but that's not always possible. Your desire for a "big fat warning" is well-received. I don't know how I would implement it, but I'll keep it in mind. Certainly a more prominent note in the documentation is warranted. Thanks. :-)
  • Evert

    good news!
  • Clay Loveless

    Evert, this discussion ignores the principle of "eventual consistency" -- which applies to MySQL replication as well as newer systems like Amazon SimpleDB. If you're doing an insert and then a select to show the user their changes, you're doing two things that suggest re-examining the flow: 1. Making two calls: if the insert was successful, why not just show them the values that were inserted, instead of bothering the back-end with another call? 2. Assuming that replication is instant: obviously not a safe assumption in high-traffic systems, or in slower replication systems like Amazon SimpleDB. Rather than complicate your code with conditional reads (some from the master, some from a slave), I suggest you re-examine your whole system for tolerance of inconsistency. For more on this topic, check out Werner Vogels' excellent post: http://www.allthingsdistributed.com/2007/12/eventually_consistent.html
  • Evert Pot

    I very much agree, and that idea applies to most of our production code as well. However, this requires your codebase to be built from the ground up like that, and just flipping the switch is bad advertising.
  • Evert

    And honestly, we do have a number of spots where our code relies on selects right after writes. We're not perfect, and our codebase is a very complex one to manage. So yeah, although the 'eventual consistency' principle is a good design choice, I would still prefer the pragmatic/manual approach, as the likelihood of making mistakes is quite a bit lower. Using hashtables like Amazon's SimpleDB is pretty much a non-option from where I'm standing. I can see that putting those types of constraints on the architecture of a system will enforce linear scalability, but the added resource cost of having that constraint would make it impossible for us to hit our target deadlines ;).
  • William

    Well, I don't really like the idea that "you should use the values you just inserted as the values you're going to display on the screen". If you do this, it means you can say goodbye to stored procedures and views (just to use these two as an example), where there's actually some processing done on the server that alters the data, or where, with a view, some more logic is applied to the selected data. I think being able to select data from the DB right after you inserted it is a must, but manually choosing the database to connect to for each query is problematic too. I created a set of classes on top of Zend Framework for our company; they're a bunch of business objects mostly based on the ActiveRecord pattern (but not always). Each business object doesn't know which specific database it's connecting to: you pass it a DB handle (a DB adapter in this case) and it uses that one when it needs database access. This way, your business object is independent from the actual location of your DB; you might choose one location in one script, and another in another script. Since a business object can use a business object which in turn can use another, it's a bit problematic with this master/slave situation, and I'm still evaluating whether our framework needs to be changed or whether a subclassed DB adapter would be the way to go.
  • Mark Bembnowski

    I have been pondering a reliable solution to this problem, as well... I wonder if there would be a reliable way to store a queue of tables that have been recently modified directly by the user in that session. All write queries would continue to go to the master, and all read queries that touch a table that appears in this queue would also be sent to the master. Obviously, this would require that tables be cleanly parsed from the insert/update/delete statements, and then stored in an array in the session. I would also presume that a timestamp of some sort would be attached to the queue entry that allows entries over a certain timeout to be ignored/removed (a time period after which it should be safe to assume replication has occurred).
  • Mark Bembnowski

    Just another quick note... What my above-mentioned solution won't account for is this situation:
    - Assume that the "timeout" as defined in my previous post is 5 seconds (to allow replication to occur).
    - User #1 updates a record on Table #1 (so any read queries User #1 makes on Table #1 will be sent to the master for 5 seconds).
    - Then User #2 updates that same record on Table #1 at 4 seconds.
    - User #1 refreshes and sees the update from User #2.
    - 5 seconds elapse.
    - User #1 refreshes and sees that User #2's update has seemingly been reverted (because this time User #1's read on Table #1 was pulled from a not-quite-up-to-speed slave)!
    - After replication has finished across all slaves, User #1 refreshes again to find that User #2's update has returned.
    With this potential issue, it seems as though my "session table queue" solution will not work, either. Perhaps MySQL Cluster is the only true option, as it seems that a write is not considered complete until it has been replicated across all servers in the cluster.
  • Evert

    One solution we use is to store the updated values in the cache at the moment we're updating mysql. This won't work for every problem, but it might help. A more advanced solution is to use the memcache UDF for MySQL. It would allow you to create triggers that automatically update the cache.
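The transaction-pinning behaviour discussed in the comments above (stay on the master while a transaction is open, so modify-then-SELECT stays consistent) can be sketched in a few lines of Python. This is a hypothetical illustration, not Solar's actual MysqlReplicated code:

```python
import re

class TxnAwareRouter:
    """Sketch of transaction-aware routing: while a transaction is
    open, every query is pinned to the master; outside transactions,
    SELECTs may go to the slave."""

    def __init__(self, master, slave):
        self.master = master
        self.slave = slave
        self._in_txn = False

    def begin(self):
        self._in_txn = True   # pin all queries to the master

    def commit(self):
        self._in_txn = False  # reads may go back to a slave

    rollback = commit  # rollback ends the pinning as well

    def pick(self, sql):
        if self._in_txn or not re.match(r"\s*SELECT\b", sql, re.IGNORECASE):
            return self.master
        return self.slave
```

Note that this only protects reads issued inside the transaction; a later request from the same user can still hit a lagging slave, which is the scenario Mark describes above.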