Strange deadlock error last night - Mailing list pgsql-admin

From Scott Whitney
Subject Strange deadlock error last night
Date
Msg-id 20090113173555.80E667E4FD8@mail.int.journyx.com
Whole thread Raw
Responses Re: Strange deadlock error last night  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Strange deadlock error last night  ("Scott Marlowe" <scott.marlowe@gmail.com>)
List pgsql-admin
I'm about to begin researching this, but I thought I'd throw this out there
and see if there are any quick responses.

Last night, I got this:

Jan 13 03:31:28 db01 postgres[23537]: [140-2] DETAIL:  Process 23537 waits
for AccessShareLock on relation 1260 of database 0; blocked by process
22228.
Jan 13 03:31:28 db01 postgres[23537]: [140-3]   Process 22228 waits for
AccessShareLock on relation 1262 of database 0; blocked by process 6816.
Jan 13 03:31:28 db01 postgres[23537]: [140-4]   Process 6816 waits for
AccessShareLock on relation 1260 of database 0; blocked by process 14624.
Jan 13 03:31:28 db01 postgres[23537]: [140-5]   Process 14624 waits for
AccessExclusiveLock on relation 1260 of database 0; blocked by process
23537.
Jan 13 03:31:28 db01 postgres[14624]: [243-1] ERROR:  deadlock detected
Jan 13 03:31:28 db01 postgres[14624]: [243-2] DETAIL:  Process 14624 waits
for AccessExclusiveLock on relation 1260 of database 0; blocked by process
22228.
Jan 13 03:31:28 db01 postgres[14624]: [243-3]   Process 22228 waits for
AccessShareLock on relation 1262 of database 0; blocked by process 6816.
Jan 13 03:31:28 db01 postgres[14624]: [243-4]   Process 6816 waits for
AccessShareLock on relation 1260 of database 0; blocked by process 14624.
Jan 13 03:31:28 db01 postgres[6816]: [54-1] ERROR:  deadlock detected
Jan 13 03:31:29 db01 postgres[6816]: [54-2] DETAIL:  Process 6816 waits for
AccessShareLock on relation 1260 of database 0; blocked by process 22228.
Jan 13 03:31:29 db01 postgres[6816]: [54-3]     Process 22228 waits for
AccessShareLock on relation 1262 of database 0; blocked by process 6816.

Here's what the procs were doing at the time, according to the logs:

** Jan 12 03:32:08 db01 postgres[22228]: [279-1] WARNING:  skipping
"pg_database" --- only table or database owner can vacuum it
** Jan 12 03:33:42 db01 postgres[14624]: [242-1] WARNING:  skipping
"pg_database" --- only table or database owner can vacuum it
Jan 13 03:30:36 db01 postgres[23537]: [139-1] WARNING:  skipping "pg_authid"
--- only table or database owner can vacuum it
Jan 13 03:31:25 db01 postgres[6816]: [53-1] WARNING:  skipping
"pg_auth_members" --- only table or database owner can vacuum it

** Note that procs 14624 and 22228 do not appear on the same day at the same
time. I have no record of any pg logging of those processes other than what
you see above. That, unfortunately, doesn't necessarily tell me anything.

It ended up locking up about 250 customer databases until I restarted the
postmaster. This is version 8.1.4. Upgrading right now (even to a minor rev)
is not really an option. This box has been up and running for 306 days. This
postgres level has been installed for..err...well...at least Aug 9, 2006,
based on some dates in the directories.

I can't determine what "database 0" is. It's not a datid. I'm assuming it's
a relative number based on what was going on at the time. We're running
auto-vacuum, as we have been for years. A full vacuum happens every Saturday
night.

Any clever ideas? Requests for more information?

Thanks in advance!

Scott Whitney
Journyx, Inc.


pgsql-admin by date:

Previous
From: "Scott Marlowe"
Date:
Subject: Re: PGOPTIONS and default_tablespace
Next
From: Tom Lane
Date:
Subject: Re: Strange deadlock error last night