Yesterday, one of our production sites began to crash at random intervals. We managed to narrow the issue down to one specific user who was logging in around the time of each crash and clicking on a number of (again random) pages.

Post-mortem debugging using crash dumps and WinDbg showed the last exceptions on the stack to be random as well, and pretty minor.

The only thing they had in common was that they were unhandled, and so ended up in the Application_Error method of the Web project’s HttpApplication-derived class.

So what happened?

In the end it boils down to a feature in Internet Information Services called “Rapid-Fail Protection”. If enabled (the default), the application pool stops and serves 503 Service Unavailable responses once it sees X worker process failures, such as crashes caused by unhandled exceptions, within Y minutes. Both values are configurable; the defaults are 5 failures within 5 minutes.

Of course, the best fix is to catch exceptions properly. However, if you ever have a case of application pools stopping under mysterious circumstances, check whether you have Rapid-Fail Protection turned on.
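As a pointer for where to look: the setting lives on the application pool itself, in the `<failure>` element of IIS's applicationHost.config (it is also exposed in the pool's Advanced Settings in IIS Manager). The fragment below is only a sketch. The pool name `MyAppPool` is a placeholder, and the values shown are the defaults as far as I know them, not the configuration of the site described above.

```xml
<!-- %windir%\System32\inetsrv\config\applicationHost.config -->
<system.applicationHost>
  <applicationPools>
    <!-- "MyAppPool" is a hypothetical pool name -->
    <add name="MyAppPool">
      <!-- rapidFailProtection:           feature on/off -->
      <!-- rapidFailProtectionInterval:   the "Y minutes" window (hh:mm:ss) -->
      <!-- rapidFailProtectionMaxCrashes: the "X" failures allowed in that window -->
      <failure rapidFailProtection="true"
               rapidFailProtectionInterval="00:05:00"
               rapidFailProtectionMaxCrashes="5" />
    </add>
  </applicationPools>
</system.applicationHost>
```

If the `<failure>` element is absent for a pool, the values are inherited from `<applicationPoolDefaults>` or the schema defaults, so a missing entry does not mean the feature is off.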

Ruurd Keizer

11 July 2013