January 25, 2009

Apache speed and reverse proxies

In our environment we use Apache everywhere. Its PHP integration has so far proven superior. Now that we're dealing with higher loads, we've hit some limitations.

One of the problems we had is Apache's heaviness. Our apache2 worker processes each eat up around 20 megabytes of memory, so with 3 GB of memory we end up at a MaxClients setting of around 150. Rasmus seems to think that's a pretty high setting, but based on the simple calculation (memory available for Apache / size of an Apache process) it works out for us.
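As a rough sketch of that calculation, assuming the full 3 GB is available to Apache and every process really is about 20 MB (the function name is made up for illustration):

```python
# Rough MaxClients estimate: memory set aside for Apache divided by
# the size of one Apache process, rounded down.
MB_PER_GB = 1024


def max_clients(memory_for_apache_mb: int, process_size_mb: int) -> int:
    """Return how many processes fit in the given memory without swapping."""
    if process_size_mb <= 0:
        raise ValueError("process_size_mb must be positive")
    return memory_for_apache_mb // process_size_mb


# Figures from the post: 3 GB for Apache, about 20 MB per process.
estimate = max_clients(3 * MB_PER_GB, 20)
print(estimate)  # in the neighbourhood of the 150 mentioned above
```

In practice the OS and other services need memory too, so the real budget for Apache is smaller than the machine total.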

Effectively this means we can serve approximately that many parallel requests on this machine. It is therefore in our best interest to get every response out as quickly as possible, increasing the number of requests we can handle per second.
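To make the link between response time and throughput concrete, here is a small sketch. The response times are invented for illustration, not measurements from our servers:

```python
# With a fixed number of parallel slots, the upper bound on throughput is
# slots / average time a request holds a slot.
def max_requests_per_second(slots: int, avg_response_seconds: float) -> float:
    """Upper bound on requests per second for a given number of parallel slots."""
    if avg_response_seconds <= 0:
        raise ValueError("avg_response_seconds must be positive")
    return slots / avg_response_seconds


# Hypothetical response times, using the MaxClients of 150 from above.
for seconds in (0.5, 2.0):
    rate = max_requests_per_second(150, seconds)
    print(f"{seconds}s per request -> at most {rate:.0f} requests/second")
```

Every extra bit of time a request holds on to an Apache process directly lowers that ceiling.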

Going beyond this 150 number could cause Linux to start using swap. This is bad, because it adds latency to the response, which in turn results in connections staying open longer.

Since we're sending everything over the web, there is a baseline latency. Information traveling to the other side of the globe will take at least 67 ms, because we're restricted by the speed of light. This doesn't even take indirect routes or other hardware latency into account. According to Till, this all adds up to the time a single Apache process is tied up before it can work on the next request.
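The 67 ms figure can be checked with a quick calculation. This sketch assumes a one-way trip of half the Earth's equatorial circumference (roughly 40,075 km in total) at the speed of light in a vacuum:

```python
# One-way latency floor for a signal travelling halfway around the globe.
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in a vacuum
EARTH_CIRCUMFERENCE_KM = 40_075    # approximate equatorial circumference


def one_way_latency_ms(distance_km: float) -> float:
    """Minimum one-way travel time in milliseconds over the given distance."""
    return distance_km / SPEED_OF_LIGHT_KM_S * 1000


halfway_km = EARTH_CIRCUMFERENCE_KM / 2
print(round(one_way_latency_ms(halfway_km)))  # 67
```

A full request and response needs at least two such trips, and real cables are slower than a vacuum, so actual numbers are noticeably higher.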

The reverse proxy

There are a couple of web servers which seem to be optimized for serving lots of clients. Lighttpd got a lot of traction earlier, but the project seems to have slowed down a lot, as the much-anticipated 1.5 release has been under development for almost 2 years. nginx seems to have taken its place in terms of disruptiveness. These servers are much more lightweight, and are supposed to be faster at delivering static files.

Much like Till, we've had issues hooking PHP directly into these servers. Till suggests placing nginx in front of Apache (on the same machine) as a reverse proxy. nginx takes care of serving static files and proxies any PHP request to Apache. The concept is that Apache can push out the response as quickly as possible, and while nginx is working on delivering it to the (slow) client, Apache can take on other work.
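A deliberately simplified model of why this helps. It assumes Apache is tied up for the whole transfer when talking to the client directly, and that handing the response to a proxy on the same machine takes no time at all. The numbers are hypothetical:

```python
# Simplified model: how long one Apache process is occupied by a request.
# Assumptions (not measured): without a proxy, Apache stays busy until the
# client has received every byte; with a proxy, the local hand-off is free.
def apache_busy_seconds(generate_s: float, response_bytes: int,
                        client_bytes_per_s: float, behind_proxy: bool) -> float:
    """Seconds an Apache process is tied up for a single request."""
    if behind_proxy:
        return generate_s
    return generate_s + response_bytes / client_bytes_per_s


# Hypothetical request: 50 ms to generate, 100 kB response, 50 kB/s client.
direct = apache_busy_seconds(0.05, 100_000, 50_000, behind_proxy=False)
proxied = apache_busy_seconds(0.05, 100_000, 50_000, behind_proxy=True)
print(f"direct: {direct:.2f}s, behind proxy: {proxied:.2f}s")
```

Real servers overlap generating and sending, and kernel socket buffers absorb part of the response, so treat the gap as an upper bound rather than a prediction.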

The thing that bothers me about this setup is the need for two web server products to achieve a single task. This implies that neither of them is adequate on its own to do the job.

On the other hand, this type of setup is also what a lot of people seem to be doing by placing Squid in front of their webservers, although that tends to happen on separate hardware.

HTTP/1.1 100 Continue

All of a sudden we noticed that a problem we saw earlier with Lighttpd (Bug #1017) was also an issue in nginx (I couldn't find a bug report, or a bug tracker at all). Neither of them seems to support the Expect: 100-continue header. While no browser actually sends this header, we have web services running which are directly accessed by other types of HTTP clients. Losing support for this HTTP functionality would instantly break their applications, which is unacceptable.
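For readers unfamiliar with the mechanism: the client sends only the request headers first, including Expect: 100-continue, and waits for an interim "HTTP/1.1 100 Continue" response before sending the body. A minimal sketch of the client side, with made-up helper names and an example host, and without any actual networking:

```python
# Sketch of the client side of the Expect: 100-continue handshake.
# Helper names and the host are illustrative only; no sockets are opened.
def build_expect_request_head(host: str, path: str, body: bytes) -> bytes:
    """Build the header block that is sent before the request body."""
    lines = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        f"Content-Length: {len(body)}",
        "Expect: 100-continue",
        "",  # blank line terminates the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def may_send_body(status_line: str) -> bool:
    """True if the server's first status line is the interim 100 response."""
    parts = status_line.split(" ", 2)
    return len(parts) >= 2 and parts[1] == "100"


head = build_expect_request_head("api.example.com", "/upload", b"hello")
print(head.decode("ascii"))
print(may_send_body("HTTP/1.1 100 Continue"))
```

A server or proxy that does not understand the header may answer with 417 Expectation Failed or simply never send the interim response, which is where clients relying on it run into trouble.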

So now we're actually looking at Squid for performing that task. Squid is powerful and well tested. We're going to start load testing this reasonably soon, and I have no problem reporting back here if people are interested in numbers. I'm wondering if there are other people who have tried a similar setup, or if there are better ways to approach this problem.

Comments

guest • Jan 25, 2009
nginx supports Expect: continue in 0.7.x branch

also did you try php-fpm ?
daaku • Jan 25, 2009
I started looking for Squid with Apache/PHP in the back info and got here - would be interesting to see what you see performance wise. Is this a highly dynamic site?
patrick • Jan 25, 2009
Did you have a look at Varnish yet (http://varnish.projects.linpro.no/)? It's a state-of-the-art HTTP accelerator and much superior to Squid. Its performance is just outstanding. We just did some tests on an eZ Publish (http://ez.no) installation, and while the basic (not optimized) Apache setup served about 15 requests/second, the Varnish/Apache setup was able to deliver 2500 requests/second (the network card was the bottleneck then). Of course, configuring Varnish can be a bit tricky for more complex applications, but it is definitely worth a look.
till • Jan 25, 2009
Squid is a heavyweight compared to nginx. There's gotta be a reason though why so many people rely on it. ;-) E.g. Wikipedia uses a lot of Squid.

I'm curious...

a) Have you explored php-cgi (with nginx)?
b) Your overall Squid performance.
c) Can you detail when you use HTTP/1.1 100 Continue?
Evert • Jan 25, 2009
Guest,

Good to know they will support it in the future.. Also PHP-FPM looks interesting, but I don't like that there's not a lot of english docs. Not to say english is better or anything, but I can read that :)

Daaku,

Very much so.. We do a lot around social networking, so you can imagine we almost always need to make sure people see (at least their own) updates.

Patrick,

Varnish looks very cool! I love that it supports ESI as well, which could provide an upgrade path when we need to get on CDNs. Thanks very much for the link!

Till,

a) I have not yet, but this is a path I want to give a shot.
b) I definitely will when I have numbers!
c) I personally don't use it, but I noticed some of our clients use an HTTP client which uses this. Although I'm not 100% sure, I believe they build it using .NET.
Guest • Jan 25, 2009
Evert,

nginx 0.7 is stable enough, it is widely used in production already. And as I can see from changelogs, the Expect fix has been backported to 0.6.34.

About php-fpm, well, the russian text mostly describes advantages of fastcgi and additional features; http://php-fpm.anight.org/docs.html and the sample config are enough to start. And you are always welcome to ask questions in the highload-php-en google group.
Evert • Jan 25, 2009
Guest,

PHP-FPM definitely sparks my interest.

One question.. Did you guys try or plan on getting the patches back into the main PHP source?

Evert
Guest • Jan 25, 2009
Evert,

Anight, the developer of php-fpm, says that he does not want it to be merged until all the planned features are implemented and tested, so he intentionally releases it under an incompatible license (GPL) until he decides it's ready.
Tim Lieberman • Jan 25, 2009
Are you running php inside apache via mod_php? If so, you might be able to do a lot better by moving to FastCGI, according to Anthony Ferrara:

(http://www.joomlaperformance.com/articles/webcasts/why_mod_php_is_bad_for_performance_52_58.html)
Evert • Jan 25, 2009
Interesting, Tim, I'll definitely check out that presentation. We do indeed use mod_php, because my experiences with PHP through FastCGI have been less than smooth.

Guest,

That's actually a bit scary! Maybe I'm completely missing the point, but I think there's a much much higher chance of acceptance to the PHP source if this stuff was done using many smaller fixes/patches vs. one big patch!
Stuart Herbert • Jan 26, 2009
Assuming you're running Linux on the web servers, are you using the default Linux TCP stack parameters, or have you tuned them to perform better under heavy load?
Sebs • Jan 26, 2009
We use exactly that combination of tools for a rack of 15 webservers serving millions of users and it works quite well.
Both Apache and Squid might not fulfill someone's drive for new technology. But they are heavily documented and rock solid.

It's worth all the time you invest in it. You can run a Squid proxy on localhost, which means one Squid per webserver. No extra hardware needed, and surely worth the money.
The beauty of it shines through when you tail Squid's access logs and see which requests do not even require the indian. ;)
Padraic Brady • Jan 28, 2009
"The concept is that Apache can push out the response as quickly as possible, and while Nginx is working on delivering it to the (slow) client Apache can take on other work. The thing that bothers me with this setup, is that the need for 2 webserver products to achieve a single task. This implies that neither of them is adequate on it's own to do the job."

I think the main logic really is that nginx is just faster, especially in a situation where applications are capable of caching themselves. At that point Squid becomes just another user of RAM that adds less of a benefit than using it in front of a non-caching application. And there is, without doubt, a scaling element. Squid is fantastic when scaling, but nginx can easily beat it at a smaller scale.

It's a contest over resources, not functionality. Apache + Squid are heavyweights, Nginx + Application Caching are lightweights. Depending on your resources the winning strategy can go either way or simply tie.

Someday though, someone will actually benchmark the two! At the moment it's largely a matter of opinion as to which works best in any one scenario.

I definitely need to try out php-fpm!!! :)