Thread: Recovery failed on a backup with "lock AccessShareLock on object 16477/244169/0 is already held"

Hi,

I hit an issue running PG 8.2.3 with the continuous archiving feature:
I was unable to recover from the backup.  I am wondering whether this
is related to bug #3245.

These are the steps that led up to the problem:

1.  PREPARE TRANSACTION.
2.  A base backup of the database was taken to a warm standby system.
3.  COMMIT PREPARED.  The COMMIT PREPARED never finished because it hit a PANIC:

2008-06-17 23:53:53.206 Local time zone must be set--see zic manual page PANIC:  failed to re-find shared lock object
2008-06-17 23:53:53.207 Local time zone must be set--see zic manual page STATEMENT:  commit prepared '148969' ;


I believe this panic is probably bug #3245 based on the description of
that bug - http://archives.postgresql.org/pgsql-bugs/2007-04/msg00075.php

At this point I attempted a recovery using the continuous archive
backup on the warm standby system.  Instead of recovering correctly,
it encountered this FATAL error saying that an AccessShareLock was
already held.

2008-06-18 00:05:34.045 Local time zone must be set--see zic manual page LOG:  database system was interrupted at 2008-06-17 23:53:16 Local time zone must be set--see zic manual page
2008-06-18 00:05:34.077 Local time zone must be set--see zic manual page LOG:  checkpoint record is at 70/E600DC18
2008-06-18 00:05:34.077 Local time zone must be set--see zic manual page LOG:  redo record is at 70/E600DC18; undo record is at 0/0; shutdown FALSE
2008-06-18 00:05:34.077 Local time zone must be set--see zic manual page LOG:  next transaction ID: 0/1099178; next OID: 413234
2008-06-18 00:05:34.077 Local time zone must be set--see zic manual page LOG:  next MultiXactId: 1; next MultiXactOffset: 0
2008-06-18 00:05:34.077 Local time zone must be set--see zic manual page LOG:  database system was not properly shut down; automatic recovery in progress
2008-06-18 00:05:34.105 Local time zone must be set--see zic manual page LOG:  redo starts at 70/E600DC68
2008-06-18 00:05:34.106 Local time zone must be set--see zic manual page LOG:  could not open file "pg_xlog/0000000100000070000000E7" (log file 112, segment 231): No such file or directory
2008-06-18 00:05:34.106 Local time zone must be set--see zic manual page LOG:  redo done at 70/E600DC68
2008-06-18 00:05:34.293 Local time zone must be set--see zic manual page LOG:  recovering prepared transaction 1099169
2008-06-18 00:05:34.293 Local time zone must be set--see zic manual page LOG:  recovering prepared transaction 1099156
2008-06-18 00:05:34.293 Local time zone must be set--see zic manual page LOG:  recovering prepared transaction 1099157
2008-06-18 00:05:34.293 Local time zone must be set--see zic manual page LOG:  recovering prepared transaction 1099161
2008-06-18 00:05:34.293 Local time zone must be set--see zic manual page LOG:  recovering prepared transaction 1099164
2008-06-18 00:05:34.293 Local time zone must be set--see zic manual page LOG:  recovering prepared transaction 1099162
2008-06-18 00:05:34.293 Local time zone must be set--see zic manual page LOG:  recovering prepared transaction 1099166
2008-06-18 00:05:34.294 Local time zone must be set--see zic manual page LOG:  recovering prepared transaction 1099131
2008-06-18 00:05:34.298 Local time zone must be set--see zic manual page FATAL:  lock AccessShareLock on object 16477/244169/0 is already held
2008-06-18 00:05:34.299 Local time zone must be set--see zic manual page LOG:  startup process (PID 17377) exited with exit code 1
2008-06-18 00:05:34.299 Local time zone must be set--see zic manual page LOG:  aborting startup due to startup process failure
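
The WAL lines in this log can be cross-checked by hand. Below is a minimal Python sketch, assuming the 8.2-era pg_xlog naming (8 hex digits each for timeline, log file and segment) and the default 16 MB segment size; the function and type names are hypothetical. It decodes the file name from the "could not open file" line and maps the redo location 70/E600DC18 to the segment it lives in.

```python
from typing import NamedTuple, Tuple

# Assumption: the default 16 MB WAL segment size of this era.
XLOG_SEG_SIZE = 16 * 1024 * 1024


class WalSegment(NamedTuple):
    timeline: int
    log: int  # the "log file" number printed in the message
    seg: int  # the "segment" number printed in the message


def parse_wal_filename(name: str) -> WalSegment:
    """Split a 24-hex-digit pg_xlog file name into timeline, log file and segment."""
    if len(name) != 24:
        raise ValueError(f"expected 24 hex digits, got {name!r}")
    return WalSegment(int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16))


def segment_for_location(location: str) -> Tuple[int, int]:
    """Map a WAL location written as 'log/offset' (hex) to its (log, seg) pair."""
    log_hex, offset_hex = location.split("/")
    return int(log_hex, 16), int(offset_hex, 16) // XLOG_SEG_SIZE
```

Under these assumptions, 0000000100000070000000E7 decodes to timeline 1, log file 112, segment 231, matching the numbers in the message, while 70/E600DC18 falls in segment 230. The missing file is therefore just the segment after the one replay started in, which fits the "redo done" line that follows it.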


Is this FATAL error seen on recovery a different bug or is it just a
direct result of bug #3245?

Unfortunately, I do not have a way to reproduce this problem
deterministically, but I have seen it three times so far.
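
To make the FATAL message easier to act on, here is a minimal Python sketch that extracts the lock mode and the three object fields from it. It assumes, without confirmation from this thread, that the lock is a relation lock, whose tag stores the database OID first and the relation OID second, with 0 in the third field; the function and type names are hypothetical. The pattern tolerates line breaks inside the message, such as those a mail client introduces.

```python
import re
from typing import NamedTuple, Optional

# Matches e.g. "lock AccessShareLock on object 16477/244169/0 is already held",
# allowing any whitespace (including newlines) between the words.
_ALREADY_HELD = re.compile(
    r"lock\s+(?P<mode>\w+)\s+on\s+object\s+"
    r"(?P<f1>\d+)/(?P<f2>\d+)/(?P<f3>\d+)\s+is\s+already\s+held"
)


class LockObject(NamedTuple):
    mode: str
    db_oid: int   # assumption: first tag field is the database OID
    rel_oid: int  # assumption: second tag field is the relation OID
    extra: int    # third tag field; 0 for a plain relation lock


def parse_already_held(text: str) -> Optional[LockObject]:
    """Return the lock described by an 'is already held' error, or None."""
    m = _ALREADY_HELD.search(text)
    if m is None:
        return None
    return LockObject(m["mode"], int(m["f1"]), int(m["f2"]), int(m["f3"]))
```

If the relation-lock assumption holds, running

    SELECT datname FROM pg_database WHERE oid = 16477;

on the primary, and then

    SELECT relname FROM pg_class WHERE oid = 244169;

inside that database, should name the relation involved.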

thanks,

John
"John Smith" <sodgodofall@gmail.com> writes:
> 2008-06-17 23:53:53.206 Local time zone must be set--see zic manual page PANIC:  failed to re-find shared lock object
> 2008-06-17 23:53:53.207 Local time zone must be set--see zic manual page STATEMENT:  commit prepared '148969' ;

> I believe this panic is probably bug #3245 based on the description of
> that bug - http://archives.postgresql.org/pgsql-bugs/2007-04/msg00075.php

Yeah, looks like it to me too.

> At this point I attempted a recovery using the continuous archive
> backup on the warm standby system.  Instead of recovering correctly,
> it encountered this FATAL error saying that an AccessShareLock was
> already held.
> 2008-06-18 00:05:34.298 Local time zone must be set--see zic manual page FATAL:  lock AccessShareLock on object 16477/244169/0 is already held
> 2008-06-18 00:05:34.299 Local time zone must be set--see zic manual page LOG:  startup process (PID 17377) exited with exit code 1
> 2008-06-18 00:05:34.299 Local time zone must be set--see zic manual page LOG:  aborting startup due to startup process failure

> Is this FATAL error seen on recovery a different bug or is it just a
> direct result of bug #3245?

It probably is the same bug.  The underlying cause of that bug is
explained here:
http://archives.postgresql.org/pgsql-bugs/2007-04/msg00129.php
I think what you are seeing is just a variant case caused by the same
lock being written out to the twophase file twice.  In any case there's
probably little point in digging further until you've updated to a
version with that fix --- if you still see the problem afterward,
we can look closer.
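
The failure mode described above can be illustrated with a toy model. This is a hypothetical Python sketch, not PostgreSQL's lock manager: each prepared transaction's two-phase state file is modeled as a list of (xid, lock mode, object tag) records, and replaying them at startup fails only when the same transaction records the same lock on the same object twice. The transaction IDs in the usage below are taken from the log purely for illustration; nothing here shows which transaction actually had the duplicate.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set, Tuple

# Hypothetical toy model of startup replaying lock records from prepared
# transactions' two-phase state files. It only mimics the one check that fails.

LockTag = Tuple[int, int, int]  # (field1, field2, field3) as printed in the error


class LockAlreadyHeld(Exception):
    """Raised when a prepared transaction re-acquires a lock it already holds."""


@dataclass
class ToyLockTable:
    # Entries are (xid, lock_mode, tag): each prepared transaction holds its
    # own locks, so two transactions sharing an AccessShareLock is fine.
    held: Set[Tuple[int, str, LockTag]] = field(default_factory=set)

    def recover_lock(self, xid: int, mode: str, tag: LockTag) -> None:
        key = (xid, mode, tag)
        if key in self.held:
            # Same transaction, same mode, same object: a duplicated record.
            raise LockAlreadyHeld(
                f"lock {mode} on object {tag[0]}/{tag[1]}/{tag[2]} is already held"
            )
        self.held.add(key)


def recover_prepared(records: Iterable[Tuple[int, str, LockTag]]) -> ToyLockTable:
    """Replay every lock record in order, as startup would for prepared xacts."""
    table = ToyLockTable()
    for xid, mode, tag in records:
        table.recover_lock(xid, mode, tag)
    return table
```

For example, replaying [(1099131, "AccessShareLock", (16477, 244169, 0))] twice in one list raises LockAlreadyHeld with the same text as the FATAL line, whereas the same lock recorded once each for 1099131 and 1099166 replays without error.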

BTW, what's with the bizarre "Local time zone must be set--see zic
manual" where the timezone should be?  Are you intentionally selecting
the "Factory" zone?

            regards, tom lane
Thanks for the quick reply Tom.  I'll be updating my PG version to one
with a fix for bug #3245 so hopefully we won't see this anymore.

> BTW, what's with the bizarre "Local time zone must be set--see zic
> manual" where the timezone should be?  Are you intentionally selecting
> the "Factory" zone?

I don't think I've put the correct timezone file in /etc/localtime, so
it is using some default file from the Gentoo install.

John

On Mon, Jun 30, 2008 at 12:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> [...]
"John Smith" <sodgodofall@gmail.com> writes:
>> BTW, what's with the bizarre "Local time zone must be set--see zic
>> manual" where the timezone should be?  Are you intentionally selecting
>> the "Factory" zone?

> I don't think I've put the correct timezone file in /etc/localtime, so
> it is using some default file from the Gentoo install.

Ah, yes, I was able to duplicate that behavior by overwriting
/etc/localtime with /usr/share/zoneinfo/Factory.  I guess the Gentoo
folks failed in their intention to annoy you enough to make you set
the zone correctly ;-)
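
Tom's reproduction suggests a quick way to tell whether a machine is in the same state. The following is a minimal Python sketch (the function name is hypothetical) that reports whether /etc/localtime has exactly the same contents as the Factory zone file. It assumes the usual Linux paths mentioned above, and that /etc/localtime is either a copy of, or a symlink to, a zoneinfo file; symlinks are followed when the files are compared.

```python
import filecmp
import os


def is_factory_zone(localtime: str = "/etc/localtime",
                    factory: str = "/usr/share/zoneinfo/Factory") -> bool:
    """Return True if `localtime` has the same contents as the Factory zone file."""
    if not (os.path.exists(localtime) and os.path.exists(factory)):
        return False
    # shallow=False forces a byte-by-byte comparison instead of trusting
    # matching os.stat() signatures.
    return filecmp.cmp(localtime, factory, shallow=False)
```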

            regards, tom lane