Re: Website troubles - Mailing list pgsql-general
From | Justin Clift |
---|---|
Subject | Re: Website troubles |
Date | |
Msg-id | 3E38ADBF.6070003@postgresql.org |
In response to | Re: Website troubles ("Marc G. Fournier" <scrappy@hub.org>) |
List | pgsql-general |
Marc G. Fournier wrote:
> On Wed, 29 Jan 2003, Robert Treat wrote:
>
>> Well, maybe it does, but when an important news story drives new eyeballs to your website, you need something better than a bouncing $hit happens logo if you want to make a positive impression. All Greg wants to know is what caused the problem and what steps are being taken to make sure it doesn't happen again. That's hardly unreasonable.
>
> The problem is/was persistent database connections ... the problem, IMHO, is that there is no way of 'timing out' idle connections, so any load on the web site that creates a whack of persistent connections, and then they all go idle, then if another hit on a different database goes through, it gets starved for connections ...
>
> I've started to disable PHPs default of allowing persistent connections, which seems to have help'd ...

It seems appropriate to point out a couple of things about now.

The extra hits from /. only doubled the traffic for a while (easily handled), and the main traffic that hit the site was hitting the front portal pages... static data - no PHP nor database connections involved per connection. The front portal pages are static .html pages that are generated hourly from a few dynamic templates.

The reason for the original error messages showing up is that all of the PHP connections (non persistent at the time) to the backend database were already used, and the main portal page couldn't create a new database connection to one of the databases to properly generate the pages. Thus, it had a case of the sads and spat out errors that were in turn frozen into the newly generated static pages (oh dear).
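One way to keep a failed regeneration from being frozen into a static page is to publish the new page only when rendering succeeds, and otherwise leave the previous copy in place. The following is a minimal sketch under stated assumptions, not a description of how the site's generator actually works: it is written in Python purely for illustration (the site itself used PHP), and `publish_static_page` and the `render` callable are hypothetical names.

```python
import os
import tempfile


def publish_static_page(render, target_path):
    """Render a page and replace target_path only if rendering succeeds.

    `render` is a hypothetical callable that returns the page HTML as a
    string and raises an exception when the page cannot be built (for
    example, when no database connection is available).
    Returns True if the page was replaced, False if the old page was kept.
    """
    try:
        html = render()
    except Exception as exc:
        # Keep serving the previous good copy rather than writing the
        # error message into the static page.
        print(f"regeneration failed, keeping old page: {exc}")
        return False

    # Write to a temporary file in the same directory, then rename it
    # over the target so readers never see a half-written page.
    directory = os.path.dirname(os.path.abspath(target_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(html)
        # mkstemp creates the file readable by its owner only; widen the
        # permissions so a web server running as another user can read it.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, target_path)
    except BaseException:
        # Clean up the temporary file if writing or renaming failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return True
```

An hourly job could call `publish_static_page` once per portal page. If the database is out of connections, the stale but valid page stays up until the next successful run.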
Once we'd realised (thanks to the people that emailed us about this), we changed some things so the errors weren't frozen into the static pages any more and fired off an email to the database admin guys so they could bump up the max_connections parameter or restart Apache so that the persistent connections would all be re-established properly.

Here's where the human failure problem kicked in: the majority of the database admin guys had driven about 6 hours to get to the Open Source Weekend expo in Canada where PostgreSQL was being presented, and the guy left behind to cover emergency issues was sick. Not "sick and didn't come in to work" mind you, but "sick and the medication the doctor gave him knocked him out cold mid-keystroke for about 18 hours". Not just your average case of the flu.

:(

We probably need to think of some way to automatically fail gracefully if the same kind of thing happens again in the future, as it's not a load bearing problem, just a configuration + human combination. But... that doesn't mean it's impossible to happen again.

Regards and best wishes,

Justin Clift

--
"My grandfather once told me that there are two kinds of people: those who work and those who take the credit. He told me to try to be in the first group; there was less competition there."
- Indira Gandhi