Re: Strange deadlock error last night - Mailing list pgsql-admin

From Scott Whitney
Subject Re: Strange deadlock error last night
Date
Msg-id 20090113223407.BDF247E46BA@mail.int.journyx.com
In response to Re: Strange deadlock error last night  ("Scott Marlowe" <scott.marlowe@gmail.com>)
Responses Re: Strange deadlock error last night  ("Scott Marlowe" <scott.marlowe@gmail.com>)
List pgsql-admin
Thanks for all the information, guys. I think Tom was right. Our application
was doing a couple of full vacs at the same time. It's weird that we didn't
run into this in the past.

You're all absolutely right about the upgrading, but in our environment,
it's not 2-3 minutes. It's 2-3 weeks. I've got to fully vet the app on the
platform internally with full test plans, etc, even for the most minor
upgrades; corp policy.

Right now, my effort is in going to the latest stable branch. Moving
forward, I will use these notes to get the company to revisit the minor
upgrade policy, though.

After all, when I _do_ get hit by one of those bugs, I _will_ be asked why
we weren't upgraded. *sigh*



-----Original Message-----
From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
Sent: Tuesday, January 13, 2009 4:16 PM
To: Scott Whitney
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Strange deadlock error last night

On Tue, Jan 13, 2009 at 10:37 AM, Scott Whitney <swhitney@journyx.com>
wrote:

> It ended up locking up about 250 customer databases until I restarted the
> postmaster. This is version 8.1.4. Upgrading right now (even to a minor rev)
> is not really an option. This box has been up and running for 306 days. This
> postgres level has been installed for..err...well...at least Aug 9, 2006,
> based on some dates in the directories.

You need to ask yourself how much downtime you can afford.  The 2 or 3
minutes every few months to go from 8.1.x to 8.1.x+1, or the half a
day of downtime when some horrendous bug takes down the whole site
because you didn't update it.  Seriously, that unfrozen template0 bug
that Alvaro mentioned is one of those kinds of bugs.

Nothing like your db going down in the middle of the day with an error
message saying that it's shutting down to prevent txid-wraparound-induced
data loss, so please run vacuum on all your databases in single-user mode.

If you can't set aside a minute or two at 0200 hrs, then don't be
surprised when you get one of those failures.

