Thread: GIN needs tonic

GIN needs tonic

From
Simon Riggs
Date:
In recovery of GIN operations using CVS HEAD I see consistently

TRAP: FailedAssertion("!(((bool) ((spcNode) != ((Oid) 0))))", File:
"tablespace.c", Line: 116)

Looking at code, "new list page" WAL record is a GIN record type and at
line 115 in gin/ginfast.c I see that the value of the node is unset.
That means XLOG_GIN_INSERT_LISTPAGE always has specNode == 0 and so
triggers the assertion failure.

--
 Simon Riggs           www.2ndQuadrant.com

Re: GIN needs tonic

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> In recovery of GIN operations using CVS HEAD I see consistently
>
> TRAP: FailedAssertion("!(((bool) ((spcNode) != ((Oid) 0))))", File:
> "tablespace.c", Line: 116)
>
> Looking at code, "new list page" WAL record is a GIN record type and at
> line 115 in gin/ginfast.c I see that the value of the node is unset.
> That means XLOG_GIN_INSERT_LISTPAGE always has specNode == 0 and so
> triggers the assertion failure.

Yeah, so it seems. The fix should be as simple as:

--- a/src/backend/access/gin/ginfast.c
+++ b/src/backend/access/gin/ginfast.c
@@ -121,6 +121,7 @@ writeListPage(Relation index, Buffer buffer,
         rdata[1].len = size;
         rdata[1].next = NULL;

+        data.node = index->rd_node;
         data.blkno = BufferGetBlockNumber(buffer);
         data.rightlink = rightlink;
         data.ntuples = ntuples;


This means that the WAL replay of that record type has never been tested
correctly :-(. Looking closer at writeListPage(), why does it always
include 'workspace' in the WAL record, even if a full-page-image is
taken? It's not used for anything the the redo function. That's
harmless, but bloats the WAL record unnecessary. In fact it might be
better to never do full-page writes for that record type, since it
completely overwrites the page anyway.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: GIN needs tonic

From
Simon Riggs
Date:
On Tue, 2009-09-15 at 09:41 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > In recovery of GIN operations using CVS HEAD I see consistently
> >
> > TRAP: FailedAssertion("!(((bool) ((spcNode) != ((Oid) 0))))", File:
> > "tablespace.c", Line: 116)
> >
> > Looking at code, "new list page" WAL record is a GIN record type and at
> > line 115 in gin/ginfast.c I see that the value of the node is unset.
> > That means XLOG_GIN_INSERT_LISTPAGE always has specNode == 0 and so
> > triggers the assertion failure.
>
> Yeah, so it seems. The fix should be as simple as:

I can't speak for anything more than that it stops it from failing with
an assertion. That is very good since it allows me to bypass it and
continue with my own testing.

> This means that the WAL replay of that record type has never been tested
> correctly :-(.

This must have been added after mid-Feb this year. I notice there are a
few places where functionality is tested against temp tables, which may
mask other non-recoverable issues in this and other rmgrs. We should
make it standard practice to include only non-temp tables to cover
functionality other than specific temp table commands.

--
 Simon Riggs           www.2ndQuadrant.com

Re: GIN needs tonic

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> This means that the WAL replay of that record type has never been tested
> correctly :-(. Looking closer at writeListPage(), why does it always
> include 'workspace' in the WAL record, even if a full-page-image is
> taken? It's not used for anything the the redo function. That's
> harmless, but bloats the WAL record unnecessary. In fact it might be
> better to never do full-page writes for that record type, since it
> completely overwrites the page anyway.

Actually, the entire thing is misdesigned from the get-go.  AFAICS
it shouldn't even have its own WAL record type --- it should be using
log_newpage().

            regards, tom lane

Re: GIN needs tonic

From
Heikki Linnakangas
Date:
Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> This means that the WAL replay of that record type has never been tested
>> correctly :-(. Looking closer at writeListPage(), why does it always
>> include 'workspace' in the WAL record, even if a full-page-image is
>> taken? It's not used for anything the the redo function. That's
>> harmless, but bloats the WAL record unnecessary. In fact it might be
>> better to never do full-page writes for that record type, since it
>> completely overwrites the page anyway.
>
> Actually, the entire thing is misdesigned from the get-go.  AFAICS
> it shouldn't even have its own WAL record type --- it should be using
> log_newpage().

Yeah, that would be even simpler. The WAL records it currenctly writes
are more compact, but then again it probably makes no difference in
practice.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: GIN needs tonic

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Tom Lane wrote:
>> Actually, the entire thing is misdesigned from the get-go.  AFAICS
>> it shouldn't even have its own WAL record type --- it should be using
>> log_newpage().

> Yeah, that would be even simpler. The WAL records it currenctly writes
> are more compact, but then again it probably makes no difference in
> practice.

I'm working on this now.  It looks to me like there are a number of bugs
in this code, notably I think that the accounting for free space on
GinList pages is failing to allow for alignment effects.

One difficulty in using log_newpage() directly is that it presently
assumes that the page it's passed has been built in private memory,
not in a shared buffer.  I'm not sure if it's worth restructuring the
GIN code to do it that way.

            regards, tom lane

Re: GIN needs tonic

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> On Tue, 2009-09-15 at 09:41 +0300, Heikki Linnakangas wrote:
>> This means that the WAL replay of that record type has never been tested
>> correctly :-(.

> This must have been added after mid-Feb this year. I notice there are a
> few places where functionality is tested against temp tables, which may
> mask other non-recoverable issues in this and other rmgrs. We should
> make it standard practice to include only non-temp tables to cover
> functionality other than specific temp table commands.

I've pointed out before that the regression tests are not particularly
meant to provide an exhaustive test of WAL recovery.  In this particular
case, so far as I can tell the bug is only observable with
full_page_writes turned off --- otherwise XLogInsert will invariably
decide to log the full page, because it's going to see a zeroed-out
LSN in the passed-in buffer.  So the odds are good that regression
testing wouldn't have caught it anyway.

I'm in favor of trying to produce a separate set of tests that cover
WAL recovery behavior; but imposing arbitrary restrictions on the
regular regression tests is not the path to get there.

            regards, tom lane

Re: GIN needs tonic

From
Simon Riggs
Date:
On Tue, 2009-09-15 at 14:31 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > On Tue, 2009-09-15 at 09:41 +0300, Heikki Linnakangas wrote:
> >> This means that the WAL replay of that record type has never been tested
> >> correctly :-(.
>
> > This must have been added after mid-Feb this year. I notice there are a
> > few places where functionality is tested against temp tables, which may
> > mask other non-recoverable issues in this and other rmgrs. We should
> > make it standard practice to include only non-temp tables to cover
> > functionality other than specific temp table commands.
>
> I've pointed out before that the regression tests are not particularly
> meant to provide an exhaustive test of WAL recovery.  In this particular
> case, so far as I can tell the bug is only observable with
> full_page_writes turned off --- otherwise XLogInsert will invariably
> decide to log the full page, because it's going to see a zeroed-out
> LSN in the passed-in buffer.

Yes, I was testing with full_page_writes = off at that point.

--
 Simon Riggs           www.2ndQuadrant.com

Re: GIN needs tonic

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> On Tue, 2009-09-15 at 14:31 -0400, Tom Lane wrote:
>> I've pointed out before that the regression tests are not particularly
>> meant to provide an exhaustive test of WAL recovery.  In this particular
>> case, so far as I can tell the bug is only observable with
>> full_page_writes turned off --- otherwise XLogInsert will invariably
>> decide to log the full page, because it's going to see a zeroed-out
>> LSN in the passed-in buffer.

> Yes, I was testing with full_page_writes = off at that point.

In further testing, I've found that the CVS HEAD regression tests
actually will expose the error if you run them with full_page_writes off
and then force a WAL replay.  (8.4 would not have, though, because of the
DROP TABLESPACE at the end of the run.)  In any case, I stand by the
opinion that the standard regression tests aren't really meant to
exercise WAL recovery.

BTW, there's more than one bug here :-(.  Heikki found one, but the
code is also attaching the buffer indicator to the wrong rdata entry
--- the record header, not the workspace, is what gets suppressed
if the full page is logged.

            regards, tom lane

Re: GIN needs tonic

From
Simon Riggs
Date:
On Tue, 2009-09-15 at 15:34 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > On Tue, 2009-09-15 at 14:31 -0400, Tom Lane wrote:
> >> I've pointed out before that the regression tests are not particularly
> >> meant to provide an exhaustive test of WAL recovery.  In this particular
> >> case, so far as I can tell the bug is only observable with
> >> full_page_writes turned off --- otherwise XLogInsert will invariably
> >> decide to log the full page, because it's going to see a zeroed-out
> >> LSN in the passed-in buffer.
>
> > Yes, I was testing with full_page_writes = off at that point.
>
> In further testing, I've found that the CVS HEAD regression tests
> actually will expose the error if you run them with full_page_writes off
> and then force a WAL replay.  (8.4 would not have, though, because of the
> DROP TABLESPACE at the end of the run.)  In any case, I stand by the
> opinion that the standard regression tests aren't really meant to
> exercise WAL recovery.

Yes, that's exactly how I located the first bug. Sorry if that wasn't
clear.

> BTW, there's more than one bug here :-(.  Heikki found one, but the
> code is also attaching the buffer indicator to the wrong rdata entry
> --- the record header, not the workspace, is what gets suppressed
> if the full page is logged.

Well, whether they were meant to they do and to a much greater extent
than any other body of tests that I'm aware of. Even if we had another
test suite I would still expect recovery to handle a run on the primary
of the full regression tests without problem. I look for multiple ways
of testing and that is just one.

I guess if you or another committer spends some time writing a test
framework that is useful and that you can trust, I'm sure many people
will add to it. That'll be true for any of the major/complex areas not
covered by public test suites: concurrency, recovery and the optimizer.

--
 Simon Riggs           www.2ndQuadrant.com

Re: GIN needs tonic

From
Heikki Linnakangas
Date:
Tom Lane wrote:
> BTW, there's more than one bug here :-(.  Heikki found one, but the
> code is also attaching the buffer indicator to the wrong rdata entry
> --- the record header, not the workspace, is what gets suppressed
> if the full page is logged.

I saw that, but I figured it should be attaching both rdata entries to
the buffer, so that both are suppressed if a full page image is logged.
I guess the header can be useful for debugging purposes, even though the
redo function doesn't use it for anything.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: GIN needs tonic

From
Simon Riggs
Date:
On Tue, 2009-09-15 at 12:09 -0400, Tom Lane wrote:
> I'm working on this now.

Thanks to you and Heikki for fixing this while I worked around it.

--
 Simon Riggs           www.2ndQuadrant.com

Re: GIN needs tonic

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Tom Lane wrote:
>> BTW, there's more than one bug here :-(.  Heikki found one, but the
>> code is also attaching the buffer indicator to the wrong rdata entry
>> --- the record header, not the workspace, is what gets suppressed
>> if the full page is logged.

> I saw that, but I figured it should be attaching both rdata entries to
> the buffer, so that both are suppressed if a full page image is logged.
> I guess the header can be useful for debugging purposes, even though the
> redo function doesn't use it for anything.

Yeah, for instance xlog logging (gin_desc) expects the header to be there.

As you mentioned, another tack we could take is to not connect the
buffer to the xlog record at all, and just reinitialize the page from
the xlog record contents.  I left it as was, but this might be a
reasonable alternative to using HEAP_NEWPAGE.

            regards, tom lane

Re: GIN needs tonic

From
Robert Haas
Date:
On Tue, Sep 15, 2009 at 4:41 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> I guess if you or another committer spends some time writing a test
> framework that is useful and that you can trust, I'm sure many people
> will add to it. That'll be true for any of the major/complex areas not
> covered by public test suites: concurrency, recovery and the optimizer.

+1!

...Robert