Recovery failed on a backup with " lock AccessShareLock on object 16477/244169/0 is already held" - Mailing list pgsql-bugs

From John Smith
Subject Recovery failed on a backup with " lock AccessShareLock on object 16477/244169/0 is already held"
Msg-id b88f0d670806301040w6f7e8e61x9e8e61f7540b7480@mail.gmail.com
Responses Re: Recovery failed on a backup with " lock AccessShareLock on object 16477/244169/0 is already held"
List pgsql-bugs
Hi,

I hit an issue running PG 8.2.3 with continuous archiving: I was unable
to recover from the backup.  I was wondering whether this may be related
to bug #3245.

These are the steps that led up to the problem:

1.  PREPARE TRANSACTION.
2.  A base backup of the database was taken to a warm standby system.
3.  COMMIT PREPARED.  The COMMIT PREPARED never finished because it hit a PANIC:

2008-06-17 23:53:53.206 Local time zone must be set--see zic manual page PANIC:  failed to re-find shared lock object
2008-06-17 23:53:53.207 Local time zone must be set--see zic manual page STATEMENT:  commit prepared '148969' ;


Based on the description of bug #3245, I believe this panic is probably
that bug: http://archives.postgresql.org/pgsql-bugs/2007-04/msg00075.php

At this point I attempted a recovery using the continuous archive
backup on the warm standby system.  Instead of recovering correctly,
it failed with this FATAL error saying an AccessShareLock was already
held:

2008-06-18 00:05:34.045 Local time zone must be set--see zic manual page LOG:  database system was interrupted at 2008-06-17 23:53:16 Local time zone must be set--see zic manual page
2008-06-18 00:05:34.077 Local time zone must be set--see zic manual page LOG:  checkpoint record is at 70/E600DC18
2008-06-18 00:05:34.077 Local time zone must be set--see zic manual page LOG:  redo record is at 70/E600DC18; undo record is at 0/0; shutdown FALSE
2008-06-18 00:05:34.077 Local time zone must be set--see zic manual page LOG:  next transaction ID: 0/1099178; next OID: 413234
2008-06-18 00:05:34.077 Local time zone must be set--see zic manual page LOG:  next MultiXactId: 1; next MultiXactOffset: 0
2008-06-18 00:05:34.077 Local time zone must be set--see zic manual page LOG:  database system was not properly shut down; automatic recovery in progress
2008-06-18 00:05:34.105 Local time zone must be set--see zic manual page LOG:  redo starts at 70/E600DC68
2008-06-18 00:05:34.106 Local time zone must be set--see zic manual page LOG:  could not open file "pg_xlog/0000000100000070000000E7" (log file 112, segment 231): No such file or directory
2008-06-18 00:05:34.106 Local time zone must be set--see zic manual page LOG:  redo done at 70/E600DC68
2008-06-18 00:05:34.293 Local time zone must be set--see zic manual page LOG:  recovering prepared transaction 1099169
2008-06-18 00:05:34.293 Local time zone must be set--see zic manual page LOG:  recovering prepared transaction 1099156
2008-06-18 00:05:34.293 Local time zone must be set--see zic manual page LOG:  recovering prepared transaction 1099157
2008-06-18 00:05:34.293 Local time zone must be set--see zic manual page LOG:  recovering prepared transaction 1099161
2008-06-18 00:05:34.293 Local time zone must be set--see zic manual page LOG:  recovering prepared transaction 1099164
2008-06-18 00:05:34.293 Local time zone must be set--see zic manual page LOG:  recovering prepared transaction 1099162
2008-06-18 00:05:34.293 Local time zone must be set--see zic manual page LOG:  recovering prepared transaction 1099166
2008-06-18 00:05:34.294 Local time zone must be set--see zic manual page LOG:  recovering prepared transaction 1099131
2008-06-18 00:05:34.298 Local time zone must be set--see zic manual page FATAL:  lock AccessShareLock on object 16477/244169/0 is already held
2008-06-18 00:05:34.299 Local time zone must be set--see zic manual page LOG:  startup process (PID 17377) exited with exit code 1
2008-06-18 00:05:34.299 Local time zone must be set--see zic manual page LOG:  aborting startup due to startup process failure
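To pull the key facts out of a log like this, a small script helps.  The following is a minimal sketch, not a tool from the PostgreSQL distribution: the names summarize_recovery_log and relation_lookup_sql are hypothetical.  It collapses mail-client line wraps, lists the prepared transaction IDs being recovered, and extracts the lock tag from the FATAL line.  The catalog queries it builds rest on an assumption: that this is a relation-level lock, whose tag holds (database OID, relation OID, 0).  The message itself does not print the lock type.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class RecoverySummary:
    prepared_xids: list[int]                     # from "recovering prepared transaction N"
    lock_mode: Optional[str]                     # e.g. "AccessShareLock", or None
    lock_fields: Optional[Tuple[int, int, int]]  # the three numbers after "on object"


def summarize_recovery_log(text: str) -> RecoverySummary:
    """Hypothetical helper: extract recovered prepared XIDs and the FATAL lock tag."""
    # Collapse every run of whitespace (including mail-client line wraps)
    # into one space so a pattern can match an entry split across lines.
    flat = re.sub(r"\s+", " ", text)
    xids = [int(x) for x in re.findall(r"recovering prepared transaction (\d+)", flat)]
    m = re.search(r"lock (\w+) on object (\d+)/(\d+)/(\d+) is already held", flat)
    if m is None:
        return RecoverySummary(xids, None, None)
    fields = (int(m.group(2)), int(m.group(3)), int(m.group(4)))
    return RecoverySummary(xids, m.group(1), fields)


def relation_lookup_sql(fields: Tuple[int, int, int]) -> Tuple[str, str]:
    """Hypothetical helper: catalog queries that would name the locked object.

    ASSUMPTION: this is a relation-level lock, whose tag holds
    (database OID, relation OID, 0).  The second query must be run while
    connected to the database returned by the first.
    """
    db_oid, rel_oid, _ = fields
    return (
        f"SELECT datname FROM pg_database WHERE oid = {db_oid};",
        f"SELECT relname FROM pg_class WHERE oid = {rel_oid};",
    )


# Excerpt of the log above, with a line wrap left in place on purpose.
SAMPLE = """\
page LOG:  recovering prepared transaction 1099169
page LOG:  recovering prepared transaction 1099131
page FATAL:  lock AccessShareLock on object 16477/244169/0 is already
held
"""

if __name__ == "__main__":
    summary = summarize_recovery_log(SAMPLE)
    print(summary.prepared_xids)  # [1099169, 1099131]
    print(summary.lock_mode, summary.lock_fields)
    if summary.lock_fields is not None:
        for query in relation_lookup_sql(summary.lock_fields):
            print(query)
```

Run on the full log above, this would report the eight prepared transactions being recovered and the tag 16477/244169/0.  Matching that relation against what each prepared transaction touched may show which ones overlap.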


Is this FATAL error during recovery a different bug, or is it just a
direct result of bug #3245?

Unfortunately, I have no way to reproduce this problem deterministically,
but I have seen it three times so far.

Thanks,

John
