Thread: Recovery failed on a backup with " lock AccessShareLock on object 16477/244169/0 is already held"
Recovery failed on a backup with " lock AccessShareLock on object 16477/244169/0 is already held"
From
"John Smith"
Date:
Hi,

I hit an issue running PG 8.2.3 with the continuous archiving feature where I was unable to recover from the backup. I was wondering if this may be related to bug #3245?

These are the steps that occurred before I saw this problem:

1. Prepare transaction.
2. A base backup of the database was taken to a warm standby system.
3. Commit prepared.

The commit prepared never finished as it hit a PANIC:

2008-06-17 23:53:53.206 Local time zone must be set--see zic manual page PANIC: failed to re-find shared lock object
2008-06-17 23:53:53.207 Local time zone must be set--see zic manual page STATEMENT: commit prepared '148969' ;

I believe this panic is probably bug #3245 based on the description of that bug - http://archives.postgresql.org/pgsql-bugs/2007-04/msg00075.php

At this point I attempted to do a recovery using the continuous archive backup on the warm standby system. Instead of recovering correctly it encountered this FATAL error where an AccessShareLock was already held.

2008-06-18 00:05:34.045 Local time zone must be set--see zic manual page LOG: database system was interrupted at 2008-06-17 23:53:16 Local time zone must be set--see zic manual page
2008-06-18 00:05:34.077 Local time zone must be set--see zic manual page LOG: checkpoint record is at 70/E600DC18
2008-06-18 00:05:34.077 Local time zone must be set--see zic manual page LOG: redo record is at 70/E600DC18; undo record is at 0/0; shutdown FALSE
2008-06-18 00:05:34.077 Local time zone must be set--see zic manual page LOG: next transaction ID: 0/1099178; next OID: 413234
2008-06-18 00:05:34.077 Local time zone must be set--see zic manual page LOG: next MultiXactId: 1; next MultiXactOffset: 0
2008-06-18 00:05:34.077 Local time zone must be set--see zic manual page LOG: database system was not properly shut down; automatic recovery in progress
2008-06-18 00:05:34.105 Local time zone must be set--see zic manual page LOG: redo starts at 70/E600DC68
2008-06-18 00:05:34.106 Local time zone must be set--see zic manual page LOG: could not open file "pg_xlog/0000000100000070000000E7" (log file 112, segment 231): No such file or directory
2008-06-18 00:05:34.106 Local time zone must be set--see zic manual page LOG: redo done at 70/E600DC68
2008-06-18 00:05:34.293 Local time zone must be set--see zic manual page LOG: recovering prepared transaction 1099169
2008-06-18 00:05:34.293 Local time zone must be set--see zic manual page LOG: recovering prepared transaction 1099156
2008-06-18 00:05:34.293 Local time zone must be set--see zic manual page LOG: recovering prepared transaction 1099157
2008-06-18 00:05:34.293 Local time zone must be set--see zic manual page LOG: recovering prepared transaction 1099161
2008-06-18 00:05:34.293 Local time zone must be set--see zic manual page LOG: recovering prepared transaction 1099164
2008-06-18 00:05:34.293 Local time zone must be set--see zic manual page LOG: recovering prepared transaction 1099162
2008-06-18 00:05:34.293 Local time zone must be set--see zic manual page LOG: recovering prepared transaction 1099166
2008-06-18 00:05:34.294 Local time zone must be set--see zic manual page LOG: recovering prepared transaction 1099131
2008-06-18 00:05:34.298 Local time zone must be set--see zic manual page FATAL: lock AccessShareLock on object 16477/244169/0 is already held
2008-06-18 00:05:34.299 Local time zone must be set--see zic manual page LOG: startup process (PID 17377) exited with exit code 1
2008-06-18 00:05:34.299 Local time zone must be set--see zic manual page LOG: aborting startup due to startup process failure

Is this FATAL error seen on recovery a different bug or is it just a direct result of bug #3245? Unfortunately I do not have a way to deterministically reproduce this problem but I have seen it 3 times so far.

thanks,
John
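For readers following the recovery log, the WAL locations and the pg_xlog file name it mentions are related by simple arithmetic. The sketch below is only an illustration: it assumes the default 16 MB segment size and the file naming used by servers of that era (timeline, log file id, and segment number, each written as 8 hex digits). The function name `wal_segment_name` is made up for this example.

```python
# Sketch: map a WAL location such as "70/E600DC68" to the name of the
# pg_xlog segment file that contains it.
# Assumptions: 16 MB segments (the default) and the older naming scheme of
# timeline + log file id + segment number, 8 hex digits each.

XLOG_SEG_SIZE = 16 * 1024 * 1024  # assumed default segment size in bytes


def wal_segment_name(location: str, timeline: int = 1) -> str:
    """Return the segment file name that holds the given WAL location."""
    log_hex, offset_hex = location.split("/")
    log_id = int(log_hex, 16)                        # "70" -> 112
    segment = int(offset_hex, 16) // XLOG_SEG_SIZE   # "E600DC68" -> 0xE6
    return f"{timeline:08X}{log_id:08X}{segment:08X}"


if __name__ == "__main__":
    # The redo location reported in the log above.
    print(wal_segment_name("70/E600DC68"))
```

Under those assumptions the redo location 70/E600DC68 falls in segment 0000000100000070000000E6, and the file the log reports as missing, 0000000100000070000000E7 (log file 112 = 0x70, segment 231 = 0xE7), is the next segment after it.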
Re: Recovery failed on a backup with " lock AccessShareLock on object 16477/244169/0 is already held"
From
Tom Lane
Date:
"John Smith" <sodgodofall@gmail.com> writes: > 2008-06-17 23:53:53.206 Local time zone must be set--see zic manual > page PANIC: failed to re-find shared lock object > 2008-06-17 23:53:53.207 Local time zone must be set--see zic manual > page STATEMENT: commit prepared '148969' ; > I believe this panic is probably bug #3245 based on the description of > that bug - http://archives.postgresql.org/pgsql-bugs/2007-04/msg00075.php Yeah, looks like it to me too. > At this point I attempted to do a recovery using the continuous > archive backup on the warm standby system. Instead of recovering > correctly it encountered this FATAL error where a AccessSharedLock was > already held. > 2008-06-18 00:05:34.298 Local time zone must be set--see zic manual > page FATAL: lock AccessShareLock on object 16477/244169/0 is already > held > 2008-06-18 00:05:34.299 Local time zone must be set--see zic manual > page LOG: startup process (PID 17377) exited with exit code 1 > 2008-06-18 00:05:34.299 Local time zone must be set--see zic manual > page LOG: aborting startup due to startup process failure > Is this FATAL error seen on recovery a different bug or is it just a > direct result of bug #3245? It probably is the same bug. The underlying cause of that bug is explained here: http://archives.postgresql.org/pgsql-bugs/2007-04/msg00129.php I think what you are seeing is just a variant case caused by the same lock being written out to the twophase file twice. In any case there's probably little point in digging further until you've updated to a version with that fix --- if you still see the problem afterward, we can look closer. BTW, what's with the bizarre "Local time zone must be set--see zic manual" where the timezone should be? Are you intentionally selecting the "Factory" zone? regards, tom lane
Re: Recovery failed on a backup with " lock AccessShareLock on object 16477/244169/0 is already held"
From
"John Smith"
Date:
Thanks for the quick reply Tom. I'll be updating my PG version to one with a fix for bug #3245 so hopefully we won't see this anymore.

> BTW, what's with the bizarre "Local time zone must be set--see zic
> manual" where the timezone should be? Are you intentionally selecting
> the "Factory" zone?

I don't think I've put the correct timezone file in /etc/localtime so it is using some default file from the Gentoo install.

John

On Mon, Jun 30, 2008 at 12:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "John Smith" <sodgodofall@gmail.com> writes:
>> 2008-06-17 23:53:53.206 Local time zone must be set--see zic manual
>> page PANIC: failed to re-find shared lock object
>> 2008-06-17 23:53:53.207 Local time zone must be set--see zic manual
>> page STATEMENT: commit prepared '148969' ;
>
>> I believe this panic is probably bug #3245 based on the description of
>> that bug - http://archives.postgresql.org/pgsql-bugs/2007-04/msg00075.php
>
> Yeah, looks like it to me too.
>
>> At this point I attempted to do a recovery using the continuous
>> archive backup on the warm standby system. Instead of recovering
>> correctly it encountered this FATAL error where an AccessShareLock was
>> already held.
>> 2008-06-18 00:05:34.298 Local time zone must be set--see zic manual
>> page FATAL: lock AccessShareLock on object 16477/244169/0 is already
>> held
>> 2008-06-18 00:05:34.299 Local time zone must be set--see zic manual
>> page LOG: startup process (PID 17377) exited with exit code 1
>> 2008-06-18 00:05:34.299 Local time zone must be set--see zic manual
>> page LOG: aborting startup due to startup process failure
>
>> Is this FATAL error seen on recovery a different bug or is it just a
>> direct result of bug #3245?
>
> It probably is the same bug. The underlying cause of that bug is
> explained here:
> http://archives.postgresql.org/pgsql-bugs/2007-04/msg00129.php
> I think what you are seeing is just a variant case caused by the same
> lock being written out to the twophase file twice. In any case there's
> probably little point in digging further until you've updated to a
> version with that fix --- if you still see the problem afterward,
> we can look closer.
>
> BTW, what's with the bizarre "Local time zone must be set--see zic
> manual" where the timezone should be? Are you intentionally selecting
> the "Factory" zone?
>
> regards, tom lane
>
Re: Recovery failed on a backup with " lock AccessShareLock on object 16477/244169/0 is already held"
From
Tom Lane
Date:
"John Smith" <sodgodofall@gmail.com> writes: >> BTW, what's with the bizarre "Local time zone must be set--see zic >> manual" where the timezone should be? Are you intentionally selecting >> the "Factory" zone? > I don't think I've put the correct timezone file in /etc/localtime so > it is using some default file from the Gentoo install. Ah, yes, I was able to duplicate that behavior by overwriting /etc/localtime with /usr/share/zoneinfo/Factory. I guess the Gentoo folks failed in their intention to annoy you enough to make you set the zone correctly ;-) regards, tom lane