Are you hitting MaxClients without even knowing it?

Unless you run a busy website, Apache’s MaxClients setting probably isn’t something you think about very often. If you’ve never looked, open your Apache 2.2 config and you’ll find a block like this (see footnote 1):


    StartServers              5
    MinSpareServers          10
    MaxSpareServers          30
    MaxClients              150
    MaxRequestsPerChild   10000

This setting, along with ServerLimit, controls the number of simultaneous connections that Apache can handle. Above this limit, connections are queued until slots become free. Apache will tell you about this with a message in its error log that looks like this:

[Sun Dec 21 01:35:59 2014] [error] server reached MaxClients setting, consider raising the MaxClients setting

Like many a naïve sysadmin, I had assumed that serious problems usually look serious in log files: if one of your disks starts to fail, you’ll be reading those kernel messages for the rest of the day. The problem I faced was that Nagios occasionally thought the website was down, but there didn’t seem to be any reason for it, and it would generally recover spontaneously (although sometimes this took a while). I did find that message in the error log, but it had been logged hours before the problem occurred, so I assumed it was just a weird transient caused by the server restarting (which it does around that time of day).

Our diligent on-call developer included in his report that netstat reported an unusual number of sockets in the TIME_WAIT state. This indicates a large number of TCP connections that closed in the recent past. On a hunch I went to look at the source code that generates that log message, and found this:

static int reported = 0;

if (!reported) {
    ap_log_error(APLOG_MARK, APLOG_ERR, 0, ap_server_conf,
                "server reached MaxClients setting, consider"
                " raising the MaxClients setting");
    reported = 1;
}

So that message is generated only once per server lifetime, however many times the limit is hit. Who knows how many times I’d been hitting that limit! Kibana gave me a clue: the message had appeared in the log on several days in the last month. I increased the limits today (thankfully the server has plenty of resources available) and hopefully that’s the end of that chapter.
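
To see what that static flag means in practice, here is a minimal, self-contained sketch of the same pattern. It is not Apache’s code: `report_limit_reached` and `simulate_hits` are hypothetical names invented for this illustration, and the "log" is just whatever `FILE *` you pass in.

```c
#include <stdio.h>

/* Hypothetical stand-in for the check in the excerpt above.
 * Returns 1 if a message was written on this call, 0 if it was suppressed. */
int report_limit_reached(FILE *log)
{
    static int reported = 0;    /* keeps its value between calls */

    if (reported) {
        return 0;               /* already logged once: stay silent */
    }
    fprintf(log, "server reached MaxClients setting, consider"
                 " raising the MaxClients setting\n");
    reported = 1;
    return 1;
}

/* Pretend the limit is hit `hits` times; return how many log lines resulted. */
int simulate_hits(FILE *log, int hits)
{
    int written = 0;

    for (int i = 0; i < hits; i++) {
        written += report_limit_reached(log);
    }
    return written;
}
```

Calling `simulate_hits(stderr, 1000)` writes the message once and returns 1. Any later call in the same process returns 0, which is why a single log line can hide any number of hits until the next restart.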

Lessons learned

  • Read logs! I’m so embarrassed by this as I’m always harping on about the importance of reading logs.
  • Alert on logged errors. Clearly this is something I should be doing, but (again, embarrassingly) I’m not. The infrastructure is there—I should know, I put it there!—but it’s an all-too-gaping chink in my armour. Oh well, it’ll be my first job for 2015!
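
As a starting point for alerting on logged errors, the sketch below flags error-log lines at level error or worse. It assumes the `[timestamp] [level] message` layout of the log line quoted earlier. `is_alertworthy` and `count_alertworthy` are hypothetical helpers, not part of Apache or Nagios.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if a log line of the assumed form "[timestamp] [level] message"
 * is at level error, crit, alert or emerg; otherwise return 0. */
int is_alertworthy(const char *line)
{
    static const char *const levels[] = {
        "[error]", "[crit]", "[alert]", "[emerg]"
    };
    const char *level = strstr(line, "] [");    /* end of the timestamp field */

    if (line[0] != '[' || level == NULL) {
        return 0;                               /* not in the assumed layout */
    }
    level += 2;                                 /* now points at "[level]" */
    for (size_t i = 0; i < sizeof levels / sizeof levels[0]; i++) {
        if (strncmp(level, levels[i], strlen(levels[i])) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Count alert-worthy lines on a stream. Lines longer than the buffer are
 * read in pieces, which is acceptable for a sketch but not for production. */
int count_alertworthy(FILE *log)
{
    char line[4096];
    int count = 0;

    while (fgets(line, sizeof line, log) != NULL) {
        count += is_alertworthy(line);
    }
    return count;
}
```

A check script could run `count_alertworthy` over the lines added since its last run and raise an alert whenever the count is non-zero. Because the MaxClients message appears only once per server lifetime, even a count of one deserves attention.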

It would have been nice if the Apache documentation mentioned this about the error message, but that’s no excuse: the evidence was there, but I just wasn’t looking for it.

1 The name of the block depends on which MPM you’re using, but the MaxClients setting (and the code excerpt) is the same for all MPMs.


1 Comment

  1. Wow, I never knew that the error message would be logged only once per server lifetime. And I was wondering why this message is logged only once even if I let the server run for a long time but if I do restart the Apache instance, the error pops up after couple of hours. Thank you for digging into this.
