Thread: Problem with PITR Past Particular WAL File
Greetings:
I am running into a problem during a failover recovery of a particular 8.2.4 database running on a SunOS 5.10 box. For full disclosure, I am also using the pg_standby utility from the 8.3 contrib modules to handle replay of the logs on the standby server.
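(For context: a pg_standby-driven warm standby of this vintage is configured through a recovery.conf on the standby roughly like the following. The archive directory and trigger-file path below are placeholders, not the actual values on this system.)

restore_command = 'pg_standby -t /tmp/pgsql.trigger /archive/wal %f %p'
# pg_standby keeps waiting for and restoring WAL segments from /archive/wal;
# creating /tmp/pgsql.trigger tells it to stop, which lets the standby finish
# recovery and come online.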
What I am finding is that if I only allow it to replay up through a particular WAL file (specifically, 00000001000000180000008A), I can trigger the system to exit recovery mode and it successfully comes online shortly afterward, as expected. I also tried stopping it at each of a few earlier WAL files and saw the same results. Relevant log lines are as follows:
Oct 23 22:40:44 db01b postgres[15894]: [ID 748848 local0.info] [5699-1] LOG: restored log file "000000010000001800000088" from archive
Oct 23 22:40:44 db01b postgres[15894]: [ID 748848 local0.info] [5700-1] LOG: restored log file "000000010000001800000089" from archive
Oct 23 22:40:44 db01b postgres[15894]: [ID 748848 local0.info] [5701-1] LOG: restored log file "00000001000000180000008A" from archive
Oct 23 22:45:50 db01b postgres[15894]: [ID 748848 local0.info] [5702-1] LOG: could not open file "pg_xlog/00000001000000180000008B" (log file 24, segment 139): No such file or directory
Oct 23 22:45:50 db01b postgres[15894]: [ID 748848 local0.info] [5703-1] LOG: redo done at 18/8A0C3BC8
Oct 23 22:45:50 db01b postgres[15894]: [ID 748848 local0.info] [5704-1] LOG: restored log file "00000001000000180000008A" from archive
Oct 23 22:45:50 db01b postgres[15894]: [ID 748848 local0.info] [5705-1] LOG: archive recovery complete
Oct 23 22:46:16 db01b postgres[15894]: [ID 748848 local0.info] [5706-1] LOG: database system is ready
Now, if I include one more WAL file in the recovery, the additional WAL file appears to be restored successfully, but when I trigger the system to come out of recovery mode it fails to fully come online and shuts down a few minutes later. I also tried stopping it after each of a few additional WAL files and saw the same results. Relevant log lines are as follows:
Oct 23 22:20:04 db01b postgres[92]: [ID 748848 local0.info] [5699-1] LOG: restored log file "000000010000001800000088" from archive
Oct 23 22:20:04 db01b postgres[92]: [ID 748848 local0.info] [5700-1] LOG: restored log file "000000010000001800000089" from archive
Oct 23 22:20:04 db01b postgres[92]: [ID 748848 local0.info] [5701-1] LOG: restored log file "00000001000000180000008A" from archive
Oct 23 22:20:04 db01b postgres[92]: [ID 748848 local0.info] [5702-1] LOG: restored log file "00000001000000180000008B" from archive
Oct 23 22:22:29 db01b postgres[92]: [ID 748848 local0.info] [5703-1] LOG: could not open file "pg_xlog/00000001000000180000008C" (log file 24, segment 140): No such file or directory
Oct 23 22:22:29 db01b postgres[92]: [ID 748848 local0.info] [5704-1] LOG: redo done at 18/8B2174D0
Oct 23 22:22:29 db01b postgres[92]: [ID 748848 local0.info] [5705-1] LOG: restored log file "00000001000000180000008B" from archive
Oct 23 22:22:29 db01b postgres[92]: [ID 748848 local0.info] [5706-1] LOG: archive recovery complete
Oct 23 22:27:06 db01b postgres[91]: [ID 748848 local0.info] [1-1] LOG: startup process (PID 92) was terminated by signal 11
Oct 23 22:27:06 db01b postgres[91]: [ID 748848 local0.info] [2-1] LOG: aborting startup due to startup process failure
I checked the original server logs around the times that these WAL files were originally archived, but could find no problems being reported. Note that for the sake of absolute consistency, all of my tests were done against a pristine restored base backup.
If any of this doesn't make sense, please let me know and I will do my best to explain further. I have been banging my head against this for many hours, so it is entirely possible that I am a bit incoherent at this point.
Any suggestions? Thanks.
Cheers,
-craig
---
Craig A. McElroy
Contegix
Beyond Managed Hosting(r) for Your Enterprise
Craig McElroy <craig.mcelroy@contegix.com> writes:
> Now, if I include one more WAL file in the recovery, ...

>> Oct 23 22:22:29 db01b postgres[92]: [ID 748848 local0.info] [5706-1] LOG: archive recovery complete
>> Oct 23 22:27:06 db01b postgres[91]: [ID 748848 local0.info] [1-1] LOG: startup process (PID 92) was terminated by signal 11

Can you get a stack trace from the core dump reported here? It looks like a bug in post-replay cleanup, but there's not enough detail to say more.

			regards, tom lane
Tom:

Certainly, how can that be obtained?

Cheers,
-craig
---
Craig McElroy
Contegix
Beyond Managed Hosting(r) for Your Enterprise

On Oct 24, 2007, at 7:12 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Can you get a stack trace from the core dump reported here?
> It looks like a bug in post-replay cleanup, but there's not
> enough detail to say more.
Craig McElroy <craig.mcelroy@contegix.com> writes:
>> Can you get a stack trace from the core dump reported here?

> Certainly, how can that be obtained?

$ gdb /path/to/postgres-executable /path/to/core-file
gdb> bt
gdb> quit

If you don't find a corefile in $PGDATA (or wherever your system puts core files) then you probably need to restart the postmaster with "ulimit -c unlimited" to allow producing a core.

If the "bt" output is just numbers and no symbols then it won't be of any use; in that case you'll need to find or build non-stripped Postgres executables.

			regards, tom lane
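(Putting those steps together with the paths already mentioned in this thread, the sequence looks roughly like the following; the core-file location is a guess, since where cores land is platform-dependent.)

$ ulimit -c unlimited                        # in the shell that will start the postmaster
$ pg_ctl -D /pgdata01/data start             # rerun the recovery test from this shell
  ... wait for the signal 11 ...
$ gdb /usr/local/pgsql/bin/postgres /pgdata01/data/core
gdb> bt
gdb> quit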
Tom:
Thanks for your help on this. The trace details are as follows:
Core was generated by `/usr/local/pgsql/bin/postgres -D /pgdata01/data'.
Program terminated with signal 11, Segmentation fault.
#0 0x080b8ee0 in entrySplitPage ()
#1 0x080baccf in ginInsertValue ()
#2 0x080b81b7 in gin_xlog_cleanup ()
#3 0x080af4ce in StartupXLOG ()
#4 0x080c04ca in BootstrapMain ()
#5 0x08186b2f in StartChildProcess ()
#6 0x081889eb in PostmasterMain ()
#7 0x0814ee9e in main ()
Cheers,
-craig
---
Craig A. McElroy
Contegix
Beyond Managed Hosting(r) for Your Enterprise
Craig McElroy wrote:
> Tom:
> Thanks for your help on this. The trace details are as follows:
>
> Core was generated by `/usr/local/pgsql/bin/postgres -D /pgdata01/data'.
> Program terminated with signal 11, Segmentation fault.
>
> #0 0x080b8ee0 in entrySplitPage ()
> #1 0x080baccf in ginInsertValue ()
> #2 0x080b81b7 in gin_xlog_cleanup ()

So it looks like a bug in the gin xlog code, which makes sense because it's new.

Can you please execute the command "bt full" in that gdb session and send the output?

--
Alvaro Herrera                    http://www.advogato.org/person/alvherre
"Reality is made up of many dreams, all of them different,
but in a certain way alike..."  (Me, speaking of erotic dreams)
Alvaro:
I just tried, but I only get "No symbol table info available." after each line of the trace.
Cheers,
-craig
---
Craig A. McElroy
Contegix
Beyond Managed Hosting(r) for Your Enterprise
Craig McElroy wrote:
> Alvaro:
> I just tried, but I only get "No symbol table info available." after each
> line of the trace.

Well, we have two possibilities:

1. you send us a complete, reproducible test case, or
2. you recompile with debugging enabled (configure --enable-debug) to get a more useful backtrace.

--
Alvaro Herrera                         http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
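(For the second option, a rebuild with debug symbols looks roughly like this; the source directory and install prefix are examples rather than known values from this system.)

$ cd /path/to/postgresql-8.2.4
$ ./configure --prefix=/usr/local/pgsql --enable-debug
$ gmake && gmake install             # GNU make; plain "make" on most non-Solaris systems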
Craig McElroy <craig.mcelroy@contegix.com> writes:
> Core was generated by `/usr/local/pgsql/bin/postgres -D /pgdata01/data'.
> Program terminated with signal 11, Segmentation fault.
> #0 0x080b8ee0 in entrySplitPage ()
> #1 0x080baccf in ginInsertValue ()
> #2 0x080b81b7 in gin_xlog_cleanup ()
> #3 0x080af4ce in StartupXLOG ()
> #4 0x080c04ca in BootstrapMain ()
> #5 0x08186b2f in StartChildProcess ()
> #6 0x081889eb in PostmasterMain ()
> #7 0x0814ee9e in main ()

Hm, I wonder if this is explained by a bug already fixed in 8.2.5:

2007-06-04 11:59  teodor

	* src/backend/access/gin/: gindatapage.c, ginentrypage.c, ginget.c,
	ginvacuum.c, ginxlog.c (REL8_2_STABLE): Fix bundle bugs of GIN:

	- Fix possible deadlock between UPDATE and VACUUM queries. Bug never
	  was observed in 8.2, but it still exist there. HEAD is more
	  sensitive to bug after recent "ring" of buffer improvements.
	- Fix WAL creation: if parent page is stored as is after split then
	  incomplete split isn't removed during replay. This happens rather
	  rare, only on large tables with a lot of updates/inserts.
	- Fix WAL replay: there was wrong test of XLR_BKP_BLOCK_* for left
	  page after deletion of page. That causes wrong rightlink field: it
	  pointed to deleted page.
	- add checking of match of clearing incomplete split
	- cleanup incomplete split list after proceeding

	All of this chages doesn't change on-disk storage, so backpatch...
	But second point may be an issue for replaying logs from previous
	version.

Teodor, can you comment on whether this stack trace looks like it could be related to that fix?

Craig, can you retry your test scenario on 8.2.5?

			regards, tom lane
>> #0 0x080b8ee0 in entrySplitPage ()
>> #1 0x080baccf in ginInsertValue ()

Corrupted stack? gin_xlog_cleanup() doesn't call ginInsertValue() directly.

>> #2 0x080b81b7 in gin_xlog_cleanup ()
>> #3 0x080af4ce in StartupXLOG ()
>> #4 0x080c04ca in BootstrapMain ()
>> #5 0x08186b2f in StartChildProcess ()
>> #6 0x081889eb in PostmasterMain ()
>> #7 0x0814ee9e in main ()

> Teodor, can you comment on whether this stack trace looks like it could
> be related to that fix?

No, I don't think so. Those changes were about deadlock prevention, recognizing splits, and fixes to page deletion in a posting tree. entrySplitPage() is a split of a page of indexed values.

Hmm. I think I found the reason. The real backtrace should look like:

gin_xlog_cleanup()
  ginContinueSplit()
    prepareEntryScan(,,,NULL)   // makes btree.ginstate = NULL
    ginInsertValue()
      entrySplitPage()          // tries to access btree->ginstate->tupdesc
                                // at line 497

That piece of code:

	value = index_getattr(leftrightmost, FirstOffsetNumber, btree->ginstate->tupdesc, &isnull);
	btree->entry = GinFormTuple(btree->ginstate, value, NULL, 0);
	ItemPointerSet(&(btree->entry)->t_tid, BufferGetBlockNumber(lbuf), InvalidOffsetNumber);

It just makes a new tuple with the same value and a different ItemPointer. A bit later I'll make a patch that does not call index_getattr() and GinFormTuple() -- now I know how to make a test suite.

--
Teodor Sigaev                      E-mail: teodor@sigaev.ru
                                      WWW: http://www.sigaev.ru/
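(To make the failure mode concrete, here is a minimal, self-contained C sketch of the pattern described above: a cleanup path builds its btree state with a NULL ginstate-style pointer, and the split routine then dereferences it. The types and names are illustrative stand-ins, not the actual PostgreSQL GIN code.)

#include <stdio.h>

typedef struct TupleDescData { int natts; } TupleDescData;
typedef struct GinStateData  { TupleDescData *tupdesc; } GinStateData;
typedef struct BtreeData     { GinStateData *ginstate; } BtreeData;

/* stand-in for prepareEntryScan(,,,NULL): the xlog-cleanup path passes no ginstate */
static void
prepare_scan(BtreeData *btree, GinStateData *ginstate)
{
	btree->ginstate = ginstate;	/* NULL during WAL-replay cleanup */
}

/* stand-in for entrySplitPage(): rebuilds the boundary tuple via ginstate */
static void
split_page(BtreeData *btree)
{
	/* dereferences a NULL pointer -- the signal 11 seen in the backtrace */
	printf("splitting with %d attributes\n", btree->ginstate->tupdesc->natts);
}

int
main(void)
{
	BtreeData	btree;

	prepare_scan(&btree, NULL);	/* as in gin_xlog_cleanup() -> ginContinueSplit() */
	split_page(&btree);		/* crashes here */
	return 0;
}

(The patch Teodor posts below sidesteps this by copying the existing rightmost tuple directly instead of re-forming it through ginstate.)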
Teodor:
Thanks for the followup. We will be upgrading this system to 8.2.5 in the next day or so and will test on it anyway, just to be sure this isn't related. Let me know if there is anything else we can provide to help track down the bug. We will be keeping this base backup and the WAL files around so that we can use them for testing.
Cheers,
-craig
---
Craig A. McElroy
Contegix
Beyond Managed Hosting(r) for Your Enterprise
Craig McElroy wrote:
> Teodor:
> Thanks for the followup. We will be upgrading this system to 8.2.5 in
> the next day or so and will test on it anyway, just to be sure this
> isn't related. Let me know if there is anything else we can provide to
> help track down the bug. We will be keeping this base backup and the
> WAL files around so that we can use them for testing.

If I'm right, then 8.2.5 will not help you. The attached patch should fix your problem, but I suspect there is another bug as well: a test developed specifically to reproduce your problem uncovers a second issue. The diagnostic is:

FATAL:  failed to add item to index page in 1663/10819/16460
CONTEXT:  xlog redo Insert item, node: 1663/10819/16460 blkno: 514 offset: 78 nitem: 1 isdata: F isleaf F isdelete F updateBlkno:1551

Now I'm working on fixing it.

--
Teodor Sigaev                      E-mail: teodor@sigaev.ru
                                      WWW: http://www.sigaev.ru/

*** ./src/backend/access/gin.orig/ginentrypage.c	Thu Oct 25 22:11:56 2007
--- ./src/backend/access/gin/ginentrypage.c	Thu Oct 25 22:43:17 2007
***************
*** 404,409 ****
--- 404,434 ----
  }
  
  /*
+  * Returns new tuple with copied value from source tuple.
+  * New tuple will not store posting list
+  */
+ static IndexTuple
+ copyIndexTuple(IndexTuple itup, Page page)
+ {
+ 	IndexTuple nitup;
+ 
+ 	if (GinPageIsLeaf(page) && !GinIsPostingTree(itup))
+ 	{
+ 		nitup = (IndexTuple) palloc(MAXALIGN(GinGetOrigSizePosting(itup)));
+ 		memcpy(nitup, itup, GinGetOrigSizePosting(itup));
+ 		nitup->t_info &= ~INDEX_SIZE_MASK;
+ 		nitup->t_info |= GinGetOrigSizePosting(itup);
+ 	}
+ 	else
+ 	{
+ 		nitup = (IndexTuple) palloc(MAXALIGN(IndexTupleSize(itup)));
+ 		memcpy(nitup, itup, IndexTupleSize(itup));
+ 	}
+ 
+ 	return nitup;
+ }
+ 
+ /*
   * Place tuple and split page, original buffer(lbuf) leaves untouched,
   * returns shadow page of lbuf filled new data.
   * Tuples are distributed between pages by equal size on its, not
***************
*** 424,431 ****
  	IndexTuple	itup,
  				leftrightmost = NULL;
  	static ginxlogSplit data;
- 	Datum		value;
- 	bool		isnull;
  	Page		page;
  	Page		lpage = GinPageGetCopyPage(BufferGetPage(lbuf));
  	Page		rpage = BufferGetPage(rbuf);
--- 449,454 ----
***************
*** 494,502 ****
  		ptr += MAXALIGN(IndexTupleSize(itup));
  	}
  
! 	value = index_getattr(leftrightmost, FirstOffsetNumber, btree->ginstate->tupdesc, &isnull);
! 	btree->entry = GinFormTuple(btree->ginstate, value, NULL, 0);
  	ItemPointerSet(&(btree->entry)->t_tid, BufferGetBlockNumber(lbuf), InvalidOffsetNumber);
  	btree->rightblkno = BufferGetBlockNumber(rbuf);
  
  	data.node = btree->index->rd_node;
--- 517,525 ----
  		ptr += MAXALIGN(IndexTupleSize(itup));
  	}
  
! 	btree->entry = copyIndexTuple(leftrightmost, lpage);
  	ItemPointerSet(&(btree->entry)->t_tid, BufferGetBlockNumber(lbuf), InvalidOffsetNumber);
+ 
  	btree->rightblkno = BufferGetBlockNumber(rbuf);
  
  	data.node = btree->index->rd_node;
***************
*** 533,552 ****
  	Page		page = BufferGetPage(buf);
  
  	itup = getRightMostTuple(page);
  
! 	if (GinPageIsLeaf(page) && !GinIsPostingTree(itup))
! 	{
! 		nitup = (IndexTuple) palloc(MAXALIGN(GinGetOrigSizePosting(itup)));
! 		memcpy(nitup, itup, GinGetOrigSizePosting(itup));
! 		nitup->t_info &= ~INDEX_SIZE_MASK;
! 		nitup->t_info |= GinGetOrigSizePosting(itup);
! 	}
! 	else
! 	{
! 		nitup = (IndexTuple) palloc(MAXALIGN(IndexTupleSize(itup)));
! 		memcpy(nitup, itup, IndexTupleSize(itup));
! 	}
! 
  	ItemPointerSet(&nitup->t_tid, BufferGetBlockNumber(buf), InvalidOffsetNumber);
  	return nitup;
  }
--- 556,564 ----
  	Page		page = BufferGetPage(buf);
  
  	itup = getRightMostTuple(page);
  
! 	nitup = copyIndexTuple(itup, page);
  	ItemPointerSet(&nitup->t_tid, BufferGetBlockNumber(buf), InvalidOffsetNumber);
+ 
  	return nitup;
  }
Thank you a lot for your report. Bug fixed and fixes are committed in CVS.

> Thanks for the followup. We will be upgrading this system to 8.2.5 in
> the next day or so and will test on it anyway, just to be sure this
> isn't related. Let me know if there is anything else we can provide to
> help track down the bug. We will be keeping this base backup and the
> WAL files around so that we can use them for testing.

--
Teodor Sigaev                      E-mail: teodor@sigaev.ru
                                      WWW: http://www.sigaev.ru/
Teodor:

That's great news. Thanks for the followup.

Cheers,
-craig
---
Craig A. McElroy
Contegix
Beyond Managed Hosting(r) for Your Enterprise

On Oct 29, 2007, at 2:29 PM, Teodor Sigaev wrote:
> Thank you a lot for your report. Bug fixed and fixes are committed
> in CVS.