Thread: Problem with PITR Past Particular WAL File
Greetings:
I am running into a problem during a failover recovery of a particular 8.2.4 database running on a SunOS 5.10 box. For full disclosure, I am also using the pg_standby utility from the 8.3 contrib modules to handle replay of the logs on the standby server.
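(For context: a pg_standby-driven warm standby of this vintage is configured through a recovery.conf on the standby roughly like the following. The archive directory and trigger-file path below are placeholders, not the actual values on this system.)

restore_command = 'pg_standby -t /tmp/pgsql.trigger /archive/wal %f %p'
# pg_standby keeps waiting for and restoring WAL segments from /archive/wal;
# creating /tmp/pgsql.trigger tells it to stop, which lets the standby finish
# recovery and come online.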
What I am finding is that if I only allow it to replay up through a particular WAL file (specifically, 00000001000000180000008A), I can trigger the system to exit recovery mode and it successfully comes online shortly afterward, as expected. I also tried stopping it at each of a few earlier WAL files and saw the same results. Relevant log lines are as follows:
Oct 23 22:40:44 db01b postgres[15894]: [ID 748848 local0.info] [5699-1] LOG: restored log file "000000010000001800000088" from archive
Oct 23 22:40:44 db01b postgres[15894]: [ID 748848 local0.info] [5700-1] LOG: restored log file "000000010000001800000089" from archive
Oct 23 22:40:44 db01b postgres[15894]: [ID 748848 local0.info] [5701-1] LOG: restored log file "00000001000000180000008A" from archive
Oct 23 22:45:50 db01b postgres[15894]: [ID 748848 local0.info] [5702-1] LOG: could not open file "pg_xlog/00000001000000180000008B" (log file 24, segment 139): No such file or directory
Oct 23 22:45:50 db01b postgres[15894]: [ID 748848 local0.info] [5703-1] LOG: redo done at 18/8A0C3BC8
Oct 23 22:45:50 db01b postgres[15894]: [ID 748848 local0.info] [5704-1] LOG: restored log file "00000001000000180000008A" from archive
Oct 23 22:45:50 db01b postgres[15894]: [ID 748848 local0.info] [5705-1] LOG: archive recovery complete
Oct 23 22:46:16 db01b postgres[15894]: [ID 748848 local0.info] [5706-1] LOG: database system is ready
Now, if I include one more WAL file in the recovery, the additional WAL file appears to be restored successfully, but when I trigger the system to come out of recovery mode it fails to fully come online and shuts down a few minutes later. I also tried stopping it after each of a few additional WAL files and saw the same results. Relevant log lines are as follows:
Oct 23 22:20:04 db01b postgres[92]: [ID 748848 local0.info] [5699-1] LOG: restored log file "000000010000001800000088" from archive
Oct 23 22:20:04 db01b postgres[92]: [ID 748848 local0.info] [5700-1] LOG: restored log file "000000010000001800000089" from archive
Oct 23 22:20:04 db01b postgres[92]: [ID 748848 local0.info] [5701-1] LOG: restored log file "00000001000000180000008A" from archive
Oct 23 22:20:04 db01b postgres[92]: [ID 748848 local0.info] [5702-1] LOG: restored log file "00000001000000180000008B" from archive
Oct 23 22:22:29 db01b postgres[92]: [ID 748848 local0.info] [5703-1] LOG: could not open file "pg_xlog/00000001000000180000008C" (log file 24, segment 140): No such file or directory
Oct 23 22:22:29 db01b postgres[92]: [ID 748848 local0.info] [5704-1] LOG: redo done at 18/8B2174D0
Oct 23 22:22:29 db01b postgres[92]: [ID 748848 local0.info] [5705-1] LOG: restored log file "00000001000000180000008B" from archive
Oct 23 22:22:29 db01b postgres[92]: [ID 748848 local0.info] [5706-1] LOG: archive recovery complete
Oct 23 22:27:06 db01b postgres[91]: [ID 748848 local0.info] [1-1] LOG: startup process (PID 92) was terminated by signal 11
Oct 23 22:27:06 db01b postgres[91]: [ID 748848 local0.info] [2-1] LOG: aborting startup due to startup process failure
I checked the original server logs around the times that these WAL files were originally archived, but could find no problems being reported. Note that for the sake of absolute consistency, all of my tests were done against a pristine restored base backup.
If any of this doesn't make sense, please let me know and I will do my best to explain further. I have been banging my head against this for many hours, so it is entirely possible that I am a bit incoherent at this point.
Any suggestions? Thanks.
Cheers,
-craig
---
Craig A. McElroy
Contegix
Beyond Managed Hosting(r) for Your Enterprise
Craig McElroy <craig.mcelroy@contegix.com> writes:
> Now, if I include one more WAL file in the recovery, ...

>> Oct 23 22:22:29 db01b postgres[92]: [ID 748848 local0.info] [5706-1] LOG: archive recovery complete
>> Oct 23 22:27:06 db01b postgres[91]: [ID 748848 local0.info] [1-1] LOG: startup process (PID 92) was terminated by signal 11

Can you get a stack trace from the core dump reported here? It looks like a bug in post-replay cleanup, but there's not enough detail to say more.

			regards, tom lane
Tom:

Certainly, how can that be obtained?

Cheers,
-craig
---
Craig McElroy
Contegix
Beyond Managed Hosting(r) for Your Enterprise

On Oct 24, 2007, at 7:12 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Can you get a stack trace from the core dump reported here?
> It looks like a bug in post-replay cleanup, but there's not
> enough detail to say more.
Craig McElroy <craig.mcelroy@contegix.com> writes:
>> Can you get a stack trace from the core dump reported here?

> Certainly, how can that be obtained?

$ gdb /path/to/postgres-executable /path/to/core-file
gdb> bt
gdb> quit

If you don't find a corefile in $PGDATA (or wherever your system puts core files) then you probably need to restart the postmaster with "ulimit -c unlimited" to allow producing a core.

If the "bt" output is just numbers and no symbols then it won't be of any use; in that case you'll need to find or build non-stripped Postgres executables.

			regards, tom lane
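(Putting those steps together with the paths already mentioned in this thread, the sequence looks roughly like the following; the core-file location is a guess, since where cores land is platform-dependent.)

$ ulimit -c unlimited                        # in the shell that will start the postmaster
$ pg_ctl -D /pgdata01/data start             # rerun the recovery test from this shell
  ... wait for the signal 11 ...
$ gdb /usr/local/pgsql/bin/postgres /pgdata01/data/core
gdb> bt
gdb> quit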
Tom:
Thanks for your help on this. The trace details are as follows:
Core was generated by `/usr/local/pgsql/bin/postgres -D /pgdata01/data'.
Program terminated with signal 11, Segmentation fault.
#0 0x080b8ee0 in entrySplitPage ()
#1 0x080baccf in ginInsertValue ()
#2 0x080b81b7 in gin_xlog_cleanup ()
#3 0x080af4ce in StartupXLOG ()
#4 0x080c04ca in BootstrapMain ()
#5 0x08186b2f in StartChildProcess ()
#6 0x081889eb in PostmasterMain ()
#7 0x0814ee9e in main ()
Cheers,
-craig
---
Craig A. McElroy
Contegix
Beyond Managed Hosting(r) for Your Enterprise
Craig McElroy wrote:
> Tom:
> Thanks for your help on this. The trace details are as follows:
>
> Core was generated by `/usr/local/pgsql/bin/postgres -D /pgdata01/data'.
> Program terminated with signal 11, Segmentation fault.
>
> #0 0x080b8ee0 in entrySplitPage ()
> #1 0x080baccf in ginInsertValue ()
> #2 0x080b81b7 in gin_xlog_cleanup ()

So it looks like a bug in the gin xlog code, which makes sense because it's new.

Can you please execute the command "bt full" in that gdb session and send the output?

--
Alvaro Herrera                    http://www.advogato.org/person/alvherre
"Reality is made up of many dreams, all of them different,
but in a certain way alike..."  (Me, speaking of erotic dreams)
Alvaro:
I just tried, but I only get "No symbol table info available." after each line of the trace.
Cheers,
-craig
---
Craig A. McElroy
Contegix
Beyond Managed Hosting(r) for Your Enterprise
Craig McElroy wrote:
> Alvaro:
> I just tried, but I only get "No symbol table info available." after each
> line of the trace.

Well, we have two possibilities:

1. you send us a complete, reproducible test case, or
2. you recompile with debugging enabled (configure --enable-debug) to get a more useful backtrace.

--
Alvaro Herrera                         http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
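(For the second option, a rebuild with debug symbols looks roughly like this; the source directory and install prefix are examples rather than known values from this system.)

$ cd /path/to/postgresql-8.2.4
$ ./configure --prefix=/usr/local/pgsql --enable-debug
$ gmake && gmake install             # GNU make; plain "make" on most non-Solaris systems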
Craig McElroy <craig.mcelroy@contegix.com> writes:
> Core was generated by `/usr/local/pgsql/bin/postgres -D /pgdata01/data'.
> Program terminated with signal 11, Segmentation fault.
> #0 0x080b8ee0 in entrySplitPage ()
> #1 0x080baccf in ginInsertValue ()
> #2 0x080b81b7 in gin_xlog_cleanup ()
> #3 0x080af4ce in StartupXLOG ()
> #4 0x080c04ca in BootstrapMain ()
> #5 0x08186b2f in StartChildProcess ()
> #6 0x081889eb in PostmasterMain ()
> #7 0x0814ee9e in main ()

Hm, I wonder if this is explained by a bug already fixed in 8.2.5:

2007-06-04 11:59  teodor

	* src/backend/access/gin/: gindatapage.c, ginentrypage.c, ginget.c,
	ginvacuum.c, ginxlog.c (REL8_2_STABLE): Fix bundle bugs of GIN:

	- Fix possible deadlock between UPDATE and VACUUM queries. Bug never
	  was observed in 8.2, but it still exist there. HEAD is more
	  sensitive to bug after recent "ring" of buffer improvements.
	- Fix WAL creation: if parent page is stored as is after split then
	  incomplete split isn't removed during replay. This happens rather
	  rare, only on large tables with a lot of updates/inserts.
	- Fix WAL replay: there was wrong test of XLR_BKP_BLOCK_* for left
	  page after deletion of page. That causes wrong rightlink field: it
	  pointed to deleted page.
	- add checking of match of clearing incomplete split
	- cleanup incomplete split list after proceeding

	All of this chages doesn't change on-disk storage, so backpatch...
	But second point may be an issue for replaying logs from previous
	version.

Teodor, can you comment on whether this stack trace looks like it could be related to that fix?

Craig, can you retry your test scenario on 8.2.5?

			regards, tom lane
>> #0 0x080b8ee0 in entrySplitPage ()
>> #1 0x080baccf in ginInsertValue ()

Corrupted stack? gin_xlog_cleanup() doesn't call ginInsertValue() directly.

>> #2 0x080b81b7 in gin_xlog_cleanup ()
>> #3 0x080af4ce in StartupXLOG ()
>> #4 0x080c04ca in BootstrapMain ()
>> #5 0x08186b2f in StartChildProcess ()
>> #6 0x081889eb in PostmasterMain ()
>> #7 0x0814ee9e in main ()

> Teodor, can you comment on whether this stack trace looks like it could
> be related to that fix?

No, I don't think so. Those changes were about deadlock prevention, recognizing splits, and fixes to page deletion in a posting tree. entrySplitPage() is a split of a page of indexed values.

Hmm. I think I found the reason. The real backtrace should look like:

gin_xlog_cleanup()
  ginContinueSplit()
    prepareEntryScan(,,,NULL)   // makes btree.ginstate = NULL
    ginInsertValue()
      entrySplitPage()          // tries to access btree->ginstate->tupdesc
                                // at line 497

That piece of code:

	value = index_getattr(leftrightmost, FirstOffsetNumber, btree->ginstate->tupdesc, &isnull);
	btree->entry = GinFormTuple(btree->ginstate, value, NULL, 0);
	ItemPointerSet(&(btree->entry)->t_tid, BufferGetBlockNumber(lbuf), InvalidOffsetNumber);

It just makes a new tuple with the same value and a different ItemPointer. A bit later I'll make a patch that does not call index_getattr() and GinFormTuple() -- now I know how to make a test suite.

--
Teodor Sigaev                      E-mail: teodor@sigaev.ru
                                      WWW: http://www.sigaev.ru/
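(To make the failure mode concrete, here is a minimal, self-contained C sketch of the pattern described above: a cleanup path builds its btree state with a NULL ginstate-style pointer, and the split routine then dereferences it. The types and names are illustrative stand-ins, not the actual PostgreSQL GIN code.)

#include <stdio.h>

typedef struct TupleDescData { int natts; } TupleDescData;
typedef struct GinStateData  { TupleDescData *tupdesc; } GinStateData;
typedef struct BtreeData     { GinStateData *ginstate; } BtreeData;

/* stand-in for prepareEntryScan(,,,NULL): the xlog-cleanup path passes no ginstate */
static void
prepare_scan(BtreeData *btree, GinStateData *ginstate)
{
	btree->ginstate = ginstate;	/* NULL during WAL-replay cleanup */
}

/* stand-in for entrySplitPage(): rebuilds the boundary tuple via ginstate */
static void
split_page(BtreeData *btree)
{
	/* dereferences a NULL pointer -- the signal 11 seen in the backtrace */
	printf("splitting with %d attributes\n", btree->ginstate->tupdesc->natts);
}

int
main(void)
{
	BtreeData	btree;

	prepare_scan(&btree, NULL);	/* as in gin_xlog_cleanup() -> ginContinueSplit() */
	split_page(&btree);		/* crashes here */
	return 0;
}

(The patch Teodor posts below sidesteps this by copying the existing rightmost tuple directly instead of re-forming it through ginstate.)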
Teodor:
Thanks for the followup. We will be upgrading this system to 8.2.5 in the next day or so and will test on it anyway, just to be sure this isn't related. Let me know if there is anything else we can provide to help track down the bug. We will be keeping this base backup and the WAL files around so that we can use them for testing.
Cheers,
-craig
---
Craig A. McElroy
Contegix
Beyond Managed Hosting(r) for Your Enterprise
Craig McElroy wrote:
> Teodor:
> Thanks for the followup. We will be upgrading this system to 8.2.5 in
> the next day or so and will test on it anyway, just to be sure this
> isn't related. Let me know if there is anything else we can provide to
> help track down the bug. We will be keeping this base backup and the
> WAL files around so that we can use them for testing.

If I'm right, then 8.2.5 will not help you. The attached patch should fix your problem, but I suspect there is another bug as well: a test developed specifically to reproduce your problem uncovers a second issue. The diagnostic is:

FATAL:  failed to add item to index page in 1663/10819/16460
CONTEXT:  xlog redo Insert item, node: 1663/10819/16460 blkno: 514 offset: 78 nitem: 1 isdata: F isleaf F isdelete F updateBlkno:1551

Now I'm working on fixing it.

--
Teodor Sigaev                      E-mail: teodor@sigaev.ru
                                      WWW: http://www.sigaev.ru/

*** ./src/backend/access/gin.orig/ginentrypage.c	Thu Oct 25 22:11:56 2007
--- ./src/backend/access/gin/ginentrypage.c	Thu Oct 25 22:43:17 2007
***************
*** 404,409 ****
--- 404,434 ----
  }
  
  /*
+  * Returns new tuple with copied value from source tuple.
+  * New tuple will not store posting list
+  */
+ static IndexTuple
+ copyIndexTuple(IndexTuple itup, Page page)
+ {
+ 	IndexTuple nitup;
+ 
+ 	if (GinPageIsLeaf(page) && !GinIsPostingTree(itup))
+ 	{
+ 		nitup = (IndexTuple) palloc(MAXALIGN(GinGetOrigSizePosting(itup)));
+ 		memcpy(nitup, itup, GinGetOrigSizePosting(itup));
+ 		nitup->t_info &= ~INDEX_SIZE_MASK;
+ 		nitup->t_info |= GinGetOrigSizePosting(itup);
+ 	}
+ 	else
+ 	{
+ 		nitup = (IndexTuple) palloc(MAXALIGN(IndexTupleSize(itup)));
+ 		memcpy(nitup, itup, IndexTupleSize(itup));
+ 	}
+ 
+ 	return nitup;
+ }
+ 
+ /*
   * Place tuple and split page, original buffer(lbuf) leaves untouched,
   * returns shadow page of lbuf filled new data.
   * Tuples are distributed between pages by equal size on its, not
***************
*** 424,431 ****
  	IndexTuple	itup,
  				leftrightmost = NULL;
  	static ginxlogSplit data;
- 	Datum		value;
- 	bool		isnull;
  	Page		page;
  	Page		lpage = GinPageGetCopyPage(BufferGetPage(lbuf));
  	Page		rpage = BufferGetPage(rbuf);
--- 449,454 ----
***************
*** 494,502 ****
  		ptr += MAXALIGN(IndexTupleSize(itup));
  	}
  
! 	value = index_getattr(leftrightmost, FirstOffsetNumber, btree->ginstate->tupdesc, &isnull);
! 	btree->entry = GinFormTuple(btree->ginstate, value, NULL, 0);
  	ItemPointerSet(&(btree->entry)->t_tid, BufferGetBlockNumber(lbuf), InvalidOffsetNumber);
  	btree->rightblkno = BufferGetBlockNumber(rbuf);
  
  	data.node = btree->index->rd_node;
--- 517,525 ----
  		ptr += MAXALIGN(IndexTupleSize(itup));
  	}
  
! 	btree->entry = copyIndexTuple(leftrightmost, lpage);
  	ItemPointerSet(&(btree->entry)->t_tid, BufferGetBlockNumber(lbuf), InvalidOffsetNumber);
+ 
  	btree->rightblkno = BufferGetBlockNumber(rbuf);
  
  	data.node = btree->index->rd_node;
***************
*** 533,552 ****
  	Page		page = BufferGetPage(buf);
  
  	itup = getRightMostTuple(page);
  
! 	if (GinPageIsLeaf(page) && !GinIsPostingTree(itup))
! 	{
! 		nitup = (IndexTuple) palloc(MAXALIGN(GinGetOrigSizePosting(itup)));
! 		memcpy(nitup, itup, GinGetOrigSizePosting(itup));
! 		nitup->t_info &= ~INDEX_SIZE_MASK;
! 		nitup->t_info |= GinGetOrigSizePosting(itup);
! 	}
! 	else
! 	{
! 		nitup = (IndexTuple) palloc(MAXALIGN(IndexTupleSize(itup)));
! 		memcpy(nitup, itup, IndexTupleSize(itup));
! 	}
! 
  	ItemPointerSet(&nitup->t_tid, BufferGetBlockNumber(buf), InvalidOffsetNumber);
  	return nitup;
  }
--- 556,564 ----
  	Page		page = BufferGetPage(buf);
  
  	itup = getRightMostTuple(page);
  
! 	nitup = copyIndexTuple(itup, page);
  	ItemPointerSet(&nitup->t_tid, BufferGetBlockNumber(buf), InvalidOffsetNumber);
+ 
  	return nitup;
  }
Thank you a lot for your report. Bug fixed and fixes are committed in CVS.

> Thanks for the followup. We will be upgrading this system to 8.2.5 in
> the next day or so and will test on it anyway, just to be sure this
> isn't related. Let me know if there is anything else we can provide to
> help track down the bug. We will be keeping this base backup and the
> WAL files around so that we can use them for testing.

--
Teodor Sigaev                      E-mail: teodor@sigaev.ru
                                      WWW: http://www.sigaev.ru/
Teodor:

That's great news. Thanks for the followup.

Cheers,
-craig
---
Craig A. McElroy
Contegix
Beyond Managed Hosting(r) for Your Enterprise

On Oct 29, 2007, at 2:29 PM, Teodor Sigaev wrote:
> Thank you a lot for your report. Bug fixed and fixes are committed
> in CVS.