Thread: Website troubles
Was it ever resolved exactly what happened to the website last weekend? Was there a reason it went for such a long time without being fixed? Is there a phone number or something one can use in case the webmaster(s) are not monitoring the lists? I know more or less *what* happened (PHP connections filled up due to using persistent connections) but not *why*. Also, would it not be a good idea to have the main page (index.html) be static only, to prevent things like this happening again? New events and news items could regenerate a static page as needed, perhaps on an hourly cron schedule. I see no reason for every access to the main page to query a news/events database for a dynamic page. IMO, this is yet another argument to "open-source" the website code: make it available by CVS, solicit feedback on a mailing list, and allow people to submit patches. -- Greg Sabino Mullane greg@turnstep.com PGP Key: 0x14964AC8 200301290915
On Wed, 2003-01-29 at 09:19, Greg Sabino Mullane wrote: > Was it ever resolved exactly what happened to the website last > weekend? Read a little more news my friend! The whole internet was paralysed by a worm trying to bring down all the SQL servers of the earth. Sh-t happens... Tony Grant -- www.tgds.net Library management software toolkit, redhat linux on Sony Vaio C1XD, Dreamweaver MX with Tomcat and PostgreSQL
On Wed, 2003-01-29 at 11:16, Rogier van Eeten wrote: > > > Was it ever resolved exactly what happened to the website last > > > weekend? > > > > Read a little more news my friend! > > > > The whole internet was paralysed by a worm trying to bring down all the > > SQL servers of the earth. > > Uhm... wasn't that a mssql-worm. And the patch was out for about half a > year. So any administrator with a broken mssql wasn't quite good in his > job. And I sincerely hope that the postgresql mailinglist wasn't running > from a machine with mssql... You are right, it is an MS-SQL thing, but the packets flooding the internet are just that: packets. Even my other half asked me why it was taking her so long to connect to the mail server! Check the traffic reports for the weekend. theregister.co.uk was timing out on me for most of the weekend, and they run Debian... Cheers Tony -- www.tgds.net Library management software toolkit, redhat linux on Sony Vaio C1XD, Dreamweaver MX with Tomcat and PostgreSQL
>> Was it ever resolved exactly what happened to the website last >> weekend? > > Read a little more news my friend! > > The whole internet was paralysed by a worm trying to bring down > all the SQL servers of the earth. I was well aware of the news, and saying "the whole internet" was paralyzed is a bit dramatic. I was interested in learning how a worm that propagates on a port used by a Microsoft database managed to affect the website so that no free PHP connections were available. I am also interested in why it took so long to resolve, and I would like to explore the idea of using dynamic pages only when absolutely necessary. > Sh-t happens... Thanks for the insight. -- Greg Sabino Mullane greg@turnstep.com PGP Key: 0x14964AC8 200301291407
On Wed, 2003-01-29 at 14:11, greg@turnstep.com wrote: > >> Was it ever resolved exactly what happened to the website last > >> weekend? > > > > Read a little more news my friend! > > > > The whole internet was paralysed by a worm trying to bring down > > all the SQL servers of the earth. > > I was well aware of the news, and saying "the whole internet" was paralyzed > is bit dramatic. I was interested in learning how a worm that propagates > on a port used by a Microsoft database managed to affect the website > so that no free php connections were available. > Yes, that's absolute nonsense. A far more likely explanation is that on Friday afternoon Slashdot posted an article about the .org cutover, which started generating more traffic as people wanted to find out more about PostgreSQL. If you read through the Slashdot replies, you'll note someone posting early Saturday morning that he was getting the PHP errors. Go Occam. > I am also interested in why it took so long for it to be resolved, and I > would like to explore the idea of using dynamic pages only when absolutely > necessary. > I think these are all valid questions that I hope to see addressed. > > Sh-t happens... > > Thanks for the insight. > Well, maybe it does, but when an important news story drives new eyeballs to your website, you need something better than a bouncing $hit happens logo if you want to make a positive impression. All Greg wants to know is what caused the problem and what steps are being taken to make sure it doesn't happen again. That's hardly unreasonable. Robert Treat
On Wed, 29 Jan 2003, Robert Treat wrote: > Well, maybe it does, but when an important news story drives new > eyeballs to your website, you need something better than a bouncing $hit > happens logo if you want to make a positive impression. All Greg wants > to know is what caused the problem and what steps are being taken to > make sure it doesn't happen again. That's hardly unreasonable. The problem is/was persistent database connections ... the problem, IMHO, is that there is no way of 'timing out' idle connections. So if load on the web site creates a whack of persistent connections that then all go idle, and another hit on a different database comes through, that hit gets starved for connections ... I've started to disable PHP's default of allowing persistent connections, which seems to have helped ...
On Wed, 2003-01-29 at 21:52, Marc G. Fournier wrote: > The problem is/was persistent database connections ... the problem, IMHO, > is that there is no way of 'timing out' idle connections, so any load on > the web site that creates a whack of persistent connections, and then they > all go idle, then if another hit on a different database goes through, it > gets starved for connections ... Couldn't that easily be handled by the client interface (PHP, in this case) that provides support for persistent connections? (Assuming that you're suggesting that we add support for timing out sessions to the backend -- if you're not, my apologies.) Cheers, Neil -- Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC
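Neil's suggestion, letting the client interface expire idle persistent connections, can be sketched roughly as follows. This is a Python illustration, not PHP's own mechanism: `IdleTimeoutPool`, its methods and the `connect` factory are hypothetical, and the clock is injectable only so the behaviour can be tested. Idle connections are kept per database, mirroring the thread's point that an idle connection to one database does not help a hit on another.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


class IdleTimeoutPool:
    """Hypothetical client-side pool that closes connections left idle too long.

    `connect(db)` stands in for a real driver call and must return an object
    with a close() method. Idle connections are kept per database, because an
    idle connection to one database cannot serve a request for another.
    """

    def __init__(self, connect: Callable[[str], Any], idle_timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._connect = connect
        self._idle_timeout = idle_timeout
        self._clock = clock
        # database name -> [(connection, time it became idle), ...]
        self._idle: Dict[str, List[Tuple[Any, float]]] = {}

    def reap(self) -> int:
        """Close every connection idle for at least idle_timeout; return the count."""
        now = self._clock()
        closed = 0
        for db, entries in list(self._idle.items()):
            keep = []
            for conn, since in entries:
                if now - since >= self._idle_timeout:
                    conn.close()
                    closed += 1
                else:
                    keep.append((conn, since))
            self._idle[db] = keep
        return closed

    def acquire(self, db: str) -> Any:
        """Reuse an idle connection to `db` if there is one, else open a new one."""
        self.reap()
        entries = self._idle.get(db)
        if entries:
            conn, _ = entries.pop()
            return conn
        return self._connect(db)

    def release(self, db: str, conn: Any) -> None:
        """Hand a connection back and record when it went idle."""
        self._idle.setdefault(db, []).append((conn, self._clock()))

    def idle_count(self) -> int:
        return sum(len(entries) for entries in self._idle.values())
```

Reaping on every `acquire()` keeps the sketch short; a real pool would more likely reap from a timer, and sharing it between threads would need a lock.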
Marc G. Fournier wrote: > On Wed, 29 Jan 2003, Robert Treat wrote: > >>Well, maybe it does, but when an important news story drives new >>eyeballs to your website, you need something better than a bouncing $hit >>happens logo if you want to make a positive impression. All Greg wants >>to know is what caused the problem and what steps are being taken to >>make sure it doesn't happen again. That's hardly unreasonable. > > > The problem is/was persistent database connections ... the problem, IMHO, > is that there is no way of 'timing out' idle connections, so any load on > the web site that creates a whack of persistent connections, and then they > all go idle, then if another hit on a different database goes through, it > gets starved for connections ... > > I've started to disable PHPs default of allowing persistent connections, > which seems to have help'd ... It seems appropriate to point out a couple of things about now. The extra hits from /. only doubled the traffic for a while (easily handled), and most of that traffic was hitting the front portal pages... static data, with no PHP or database connections involved per request. The front portal pages are static .html pages that are generated hourly from a few dynamic templates. The reason the original error messages showed up is that all of the PHP connections (non-persistent at the time) to the backend database were already in use, and the main portal page couldn't create a new connection to one of the databases to properly generate the pages. Thus, it had a case of the sads and spat out errors that were in turn frozen into the newly generated static pages (oh dear).
Once we'd realised (thanks to the people who emailed us about this), we changed things so the errors weren't frozen into the static pages any more, and fired off an email to the database admin guys so they could bump up the max_connections parameter or restart Apache so that the persistent connections would all be re-established properly. Here's where the human failure problem kicked in: the majority of the database admin guys had driven about 6 hours to get to the Open Source Weekend expo in Canada where PostgreSQL was being presented, and the guy left behind to cover emergency issues was sick. Not "sick and didn't come in to work" mind you, but "sick and the medication the doctor gave him knocked him out cold mid-keystroke for about 18 hours". Not just your average case of the flu. :( We probably need to think of some way to fail gracefully and automatically if the same kind of thing happens again, as it's not a load-handling problem, just a combination of configuration and human factors. But... that doesn't mean it can't happen again. Regards and best wishes, Justin Clift -- "My grandfather once told me that there are two kinds of people: those who work and those who take the credit. He told me to try to be in the first group; there was less competition there." - Indira Gandhi
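A minimal sketch of the "don't freeze errors into the static pages" fix, which also fits Greg's hourly-cron idea: rebuild the page, and only if rendering succeeds swap the new file in atomically; otherwise keep serving the last good copy. It assumes Python; `regenerate_static_page` and the `render` callable are hypothetical stand-ins for whatever actually builds the portal pages.

```python
import os
import tempfile
from typing import Callable


def regenerate_static_page(path: str, render: Callable[[], str]) -> bool:
    """Rebuild a static page, replacing it only if rendering succeeds.

    `render` is a hypothetical callable that queries the news/events
    databases and returns finished HTML, raising on any error.
    Returns True if the page was replaced, False if the old copy was kept.
    """
    try:
        html = render()
    except Exception:
        # Keep serving the last good page rather than freezing an error into it.
        return False

    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then rename it over the
    # old page, so the web server never sees a half-written file.
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(html)
        os.chmod(tmp, 0o644)  # mkstemp creates the file 0600; the web server must read it
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
    return True
```

Run from cron (for example an hourly `0 * * * *` entry), a failed database connection then leaves the previous page in place instead of publishing an error message.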
At 10:52 PM 1/29/03 -0400, Marc G. Fournier wrote: >The problem is/was persistent database connections ... the problem, IMHO, >is that there is no way of 'timing out' idle connections, so any load on >the web site that creates a whack of persistent connections, and then they >all go idle, then if another hit on a different database goes through, it >gets starved for connections ... Why does it get starved for connections if there are idle ones? Why can't the idle ones connect to a different DB? Also, since the pages showed most of the usual info along with the error messages, I'd assume that being unable to connect to those databases isn't such a serious problem. In that case the webapp shouldn't display such ugliness to the user; it should just show as much of the usual info as possible and send the errors out of band (to the system logs or similar). In some cases one could make sure the dynamic content webserver's max connection setting is lower than PostgreSQL's max backends. The static content webserver(s) could have a much higher max connection setting. Regards, Link.
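Lincoln's suggestion, showing whatever content is available and sending errors out of band, might look like this Python sketch. `render_portal` and the per-section builder callables are hypothetical; the only real API used is the standard `logging` module, whose `Logger.exception` records the message together with the current traceback.

```python
import logging
from typing import Callable, Dict

log = logging.getLogger("portal")


def render_portal(sections: Dict[str, Callable[[], str]]) -> str:
    """Render each portal section independently.

    `sections` maps a section name to a hypothetical callable that builds that
    section's HTML (possibly by querying its own database). A failing section
    is left out of the page and reported to the log instead of to the visitor.
    """
    parts = []
    for name, build in sections.items():
        try:
            parts.append(build())
        except Exception:
            # Out-of-band reporting: the traceback goes to the log, not the page.
            log.exception("portal section %r failed; omitting it", name)
    return "\n".join(parts)
```

A section whose database is out of connections simply disappears from the page, while the rest of the portal still renders.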
On Wed, 29 Jan 2003, Marc G. Fournier wrote: > On Wed, 29 Jan 2003, Robert Treat wrote: > > > Well, maybe it does, but when an important news story drives new > > eyeballs to your website, you need something better than a bouncing $hit > > happens logo if you want to make a positive impression. All Greg wants > > to know is what caused the problem and what steps are being taken to > > make sure it doesn't happen again. That's hardly unreasonable. > > The problem is/was persistent database connections ... the problem, IMHO, > is that there is no way of 'timing out' idle connections, so any load on > the web site that creates a whack of persistent connections, and then they > all go idle, then if another hit on a different database goes through, it > gets starved for connections ... > > I've started to disable PHPs default of allowing persistent connections, > which seems to have help'd ... I've posted on this once or twice before. Basically, whatever Apache's max children is set to, PostgreSQL needs to be set for a higher number of connections. Since Apache's default is much higher than PostgreSQL's, it's a problem waiting to happen. If you drop the max Apache children to, say, 64 and crank the max connections on PostgreSQL to 128 or so, it'll work fine.
On Thu, 30 Jan 2003, Lincoln Yeoh wrote: > At 10:52 PM 1/29/03 -0400, Marc G. Fournier wrote: > > >The problem is/was persistent database connections ... the problem, IMHO, > >is that there is no way of 'timing out' idle connections, so any load on > >the web site that creates a whack of persistent connections, and then they > >all go idle, then if another hit on a different database goes through, it > >gets starved for connections ... > > Why does it get starved for connections if there are idle ones? Why can't > the idle ones connect to a different DB? It happens because PHP runs as a module under Apache, and each persistent connection is associated with an Apache child / PHP pair. To prevent this problem, the sum of the maximum Apache children across all web servers hitting a given database HAS to be lower than PostgreSQL's max connections setting, or you will eventually, under load and at the worst possible time, experience connection starvation and have dead pages loading. It's an easy configuration change to make, but apparently it wasn't made on the postgresql.org boxen until now.
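Scott's rule can be written as a quick sanity check. This Python sketch is illustrative rather than anything from the thread: `connections_are_safe` is a hypothetical helper, `reserved` is an assumed allowance for non-web clients, and `dbs_per_child` reflects that a PHP persistent connection is reused only for the same connection string, so a child that talks to two databases can hold two connections (which is also why an idle connection can't simply serve a different DB).

```python
from typing import Iterable


def connections_are_safe(max_children_per_server: Iterable[int],
                         max_connections: int,
                         dbs_per_child: int = 1,
                         reserved: int = 0) -> bool:
    """Return True if worst-case persistent connections fit under max_connections.

    Each Apache child can hold one persistent connection per distinct
    connection string (approximated here as one per database it talks to),
    so the worst case is total children * dbs_per_child. `reserved` is a
    hypothetical allowance for non-web clients such as psql sessions.
    """
    worst_case = sum(max_children_per_server) * dbs_per_child
    # Scott's rule: strictly lower than the server's limit, never equal.
    return worst_case < max_connections - reserved
```

With Scott's figures, `connections_are_safe([64], 128)` holds, while two such web servers against the same 128-connection database (`[64, 64]`) would not, and neither would one server whose children each talk to two databases.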
On Wed, Jan 29, 2003 at 03:48:18PM -0500, Tony Grant wrote: > On Wed, 2003-01-29 at 09:19, Greg Sabino Mullane wrote: > > > Was it ever resolved exactly what happened to the website last > > weekend? > > Read a little more news my friend! > > The whole internet was paralysed by a worm trying to bring down all the > SQL servers of the earth. Uhm... wasn't that a mssql-worm. And the patch was out for about half a year. So any administrator with a broken mssql wasn't quite good in his job. And I sincerely hope that the postgresql mailinglist wasn't running from a machine with mssql... Rogier
Rogier van Eeten wrote: <snip> > Uhm... wasn't that a mssql-worm. And the patch was out for about half a > year. So any administrator with a broken mssql wasn't quite good in his > job. And I sincerely hope that the postgresql mailinglist wasn't running > from a machine with mssql... Hi Rogier, On the subject of that worm, apparently it didn't affect just MS SQL Server, but also most (all?) products containing MSDE (Microsoft SQL Server Desktop Engine). Analysis: SQL slammer http://www.robertgraham.com/journal/030126-sqlslammer.html "Most victims were infected through MSDE 2000, a lightweight version of SQL Server installed as part of many applications from Microsoft (e.g. Viseo) as well as 3rd parties. You might have MSDE on your desktop right now." "The problem had little to do with normal SQL Server 2000 installations." That includes: Microsoft Visio, Veritas Backup Exec 9.0, McAfee Antivirus (Ha!). For a "News Story" type of thing: Worm may not hit Microsoft alone http://www.msnbc.com/news/866469.asp?cp1=1 Hope that helps. :-) Regards and best wishes, Justin Clift > Rogier -- "My grandfather once told me that there are two kinds of people: those who work and those who take the credit. He told me to try to be in the first group; there was less competition there." - Indira Gandhi