Thread: GIN needs tonic
In recovery of GIN operations using CVS HEAD I see consistently TRAP: FailedAssertion("!(((bool) ((spcNode) != ((Oid) 0))))", File: "tablespace.c", Line: 116) Looking at code, "new list page" WAL record is a GIN record type and at line 115 in gin/ginfast.c I see that the value of the node is unset. That means XLOG_GIN_INSERT_LISTPAGE always has specNode == 0 and so triggers the assertion failure. -- Simon Riggs www.2ndQuadrant.com
Simon Riggs wrote: > In recovery of GIN operations using CVS HEAD I see consistently > > TRAP: FailedAssertion("!(((bool) ((spcNode) != ((Oid) 0))))", File: > "tablespace.c", Line: 116) > > Looking at code, "new list page" WAL record is a GIN record type and at > line 115 in gin/ginfast.c I see that the value of the node is unset. > That means XLOG_GIN_INSERT_LISTPAGE always has specNode == 0 and so > triggers the assertion failure. Yeah, so it seems. The fix should be as simple as: --- a/src/backend/access/gin/ginfast.c +++ b/src/backend/access/gin/ginfast.c @@ -121,6 +121,7 @@ writeListPage(Relation index, Buffer buffer, rdata[1].len = size; rdata[1].next = NULL; + data.node = index->rd_node; data.blkno = BufferGetBlockNumber(buffer); data.rightlink = rightlink; data.ntuples = ntuples; This means that the WAL replay of that record type has never been tested correctly :-(. Looking closer at writeListPage(), why does it always include 'workspace' in the WAL record, even if a full-page-image is taken? It's not used for anything the the redo function. That's harmless, but bloats the WAL record unnecessary. In fact it might be better to never do full-page writes for that record type, since it completely overwrites the page anyway. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Tue, 2009-09-15 at 09:41 +0300, Heikki Linnakangas wrote: > Simon Riggs wrote: > > In recovery of GIN operations using CVS HEAD I see consistently > > > > TRAP: FailedAssertion("!(((bool) ((spcNode) != ((Oid) 0))))", File: > > "tablespace.c", Line: 116) > > > > Looking at code, "new list page" WAL record is a GIN record type and at > > line 115 in gin/ginfast.c I see that the value of the node is unset. > > That means XLOG_GIN_INSERT_LISTPAGE always has specNode == 0 and so > > triggers the assertion failure. > > Yeah, so it seems. The fix should be as simple as: I can't speak for anything more than that it stops it from failing with an assertion. That is very good since it allows me to bypass it and continue with my own testing. > This means that the WAL replay of that record type has never been tested > correctly :-(. This must have been added after mid-Feb this year. I notice there are a few places where functionality is tested against temp tables, which may mask other non-recoverable issues in this and other rmgrs. We should make it standard practice to include only non-temp tables to cover functionality other than specific temp table commands. -- Simon Riggs www.2ndQuadrant.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > This means that the WAL replay of that record type has never been tested > correctly :-(. Looking closer at writeListPage(), why does it always > include 'workspace' in the WAL record, even if a full-page-image is > taken? It's not used for anything the the redo function. That's > harmless, but bloats the WAL record unnecessary. In fact it might be > better to never do full-page writes for that record type, since it > completely overwrites the page anyway. Actually, the entire thing is misdesigned from the get-go. AFAICS it shouldn't even have its own WAL record type --- it should be using log_newpage(). regards, tom lane
Tom Lane wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: >> This means that the WAL replay of that record type has never been tested >> correctly :-(. Looking closer at writeListPage(), why does it always >> include 'workspace' in the WAL record, even if a full-page-image is >> taken? It's not used for anything the the redo function. That's >> harmless, but bloats the WAL record unnecessary. In fact it might be >> better to never do full-page writes for that record type, since it >> completely overwrites the page anyway. > > Actually, the entire thing is misdesigned from the get-go. AFAICS > it shouldn't even have its own WAL record type --- it should be using > log_newpage(). Yeah, that would be even simpler. The WAL records it currenctly writes are more compact, but then again it probably makes no difference in practice. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > Tom Lane wrote: >> Actually, the entire thing is misdesigned from the get-go. AFAICS >> it shouldn't even have its own WAL record type --- it should be using >> log_newpage(). > Yeah, that would be even simpler. The WAL records it currenctly writes > are more compact, but then again it probably makes no difference in > practice. I'm working on this now. It looks to me like there are a number of bugs in this code, notably I think that the accounting for free space on GinList pages is failing to allow for alignment effects. One difficulty in using log_newpage() directly is that it presently assumes that the page it's passed has been built in private memory, not in a shared buffer. I'm not sure if it's worth restructuring the GIN code to do it that way. regards, tom lane
Simon Riggs <simon@2ndQuadrant.com> writes: > On Tue, 2009-09-15 at 09:41 +0300, Heikki Linnakangas wrote: >> This means that the WAL replay of that record type has never been tested >> correctly :-(. > This must have been added after mid-Feb this year. I notice there are a > few places where functionality is tested against temp tables, which may > mask other non-recoverable issues in this and other rmgrs. We should > make it standard practice to include only non-temp tables to cover > functionality other than specific temp table commands. I've pointed out before that the regression tests are not particularly meant to provide an exhaustive test of WAL recovery. In this particular case, so far as I can tell the bug is only observable with full_page_writes turned off --- otherwise XLogInsert will invariably decide to log the full page, because it's going to see a zeroed-out LSN in the passed-in buffer. So the odds are good that regression testing wouldn't have caught it anyway. I'm in favor of trying to produce a separate set of tests that cover WAL recovery behavior; but imposing arbitrary restrictions on the regular regression tests is not the path to get there. regards, tom lane
On Tue, 2009-09-15 at 14:31 -0400, Tom Lane wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: > > On Tue, 2009-09-15 at 09:41 +0300, Heikki Linnakangas wrote: > >> This means that the WAL replay of that record type has never been tested > >> correctly :-(. > > > This must have been added after mid-Feb this year. I notice there are a > > few places where functionality is tested against temp tables, which may > > mask other non-recoverable issues in this and other rmgrs. We should > > make it standard practice to include only non-temp tables to cover > > functionality other than specific temp table commands. > > I've pointed out before that the regression tests are not particularly > meant to provide an exhaustive test of WAL recovery. In this particular > case, so far as I can tell the bug is only observable with > full_page_writes turned off --- otherwise XLogInsert will invariably > decide to log the full page, because it's going to see a zeroed-out > LSN in the passed-in buffer. Yes, I was testing with full_page_writes = off at that point. -- Simon Riggs www.2ndQuadrant.com
Simon Riggs <simon@2ndQuadrant.com> writes: > On Tue, 2009-09-15 at 14:31 -0400, Tom Lane wrote: >> I've pointed out before that the regression tests are not particularly >> meant to provide an exhaustive test of WAL recovery. In this particular >> case, so far as I can tell the bug is only observable with >> full_page_writes turned off --- otherwise XLogInsert will invariably >> decide to log the full page, because it's going to see a zeroed-out >> LSN in the passed-in buffer. > Yes, I was testing with full_page_writes = off at that point. In further testing, I've found that the CVS HEAD regression tests actually will expose the error if you run them with full_page_writes off and then force a WAL replay. (8.4 would not have, though, because of the DROP TABLESPACE at the end of the run.) In any case, I stand by the opinion that the standard regression tests aren't really meant to exercise WAL recovery. BTW, there's more than one bug here :-(. Heikki found one, but the code is also attaching the buffer indicator to the wrong rdata entry --- the record header, not the workspace, is what gets suppressed if the full page is logged. regards, tom lane
On Tue, 2009-09-15 at 15:34 -0400, Tom Lane wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: > > On Tue, 2009-09-15 at 14:31 -0400, Tom Lane wrote: > >> I've pointed out before that the regression tests are not particularly > >> meant to provide an exhaustive test of WAL recovery. In this particular > >> case, so far as I can tell the bug is only observable with > >> full_page_writes turned off --- otherwise XLogInsert will invariably > >> decide to log the full page, because it's going to see a zeroed-out > >> LSN in the passed-in buffer. > > > Yes, I was testing with full_page_writes = off at that point. > > In further testing, I've found that the CVS HEAD regression tests > actually will expose the error if you run them with full_page_writes off > and then force a WAL replay. (8.4 would not have, though, because of the > DROP TABLESPACE at the end of the run.) In any case, I stand by the > opinion that the standard regression tests aren't really meant to > exercise WAL recovery. Yes, that's exactly how I located the first bug. Sorry if that wasn't clear. > BTW, there's more than one bug here :-(. Heikki found one, but the > code is also attaching the buffer indicator to the wrong rdata entry > --- the record header, not the workspace, is what gets suppressed > if the full page is logged. Well, whether they were meant to they do and to a much greater extent than any other body of tests that I'm aware of. Even if we had another test suite I would still expect recovery to handle a run on the primary of the full regression tests without problem. I look for multiple ways of testing and that is just one. I guess if you or another committer spends some time writing a test framework that is useful and that you can trust, I'm sure many people will add to it. That'll be true for any of the major/complex areas not covered by public test suites: concurrency, recovery and the optimizer. -- Simon Riggs www.2ndQuadrant.com
Tom Lane wrote: > BTW, there's more than one bug here :-(. Heikki found one, but the > code is also attaching the buffer indicator to the wrong rdata entry > --- the record header, not the workspace, is what gets suppressed > if the full page is logged. I saw that, but I figured it should be attaching both rdata entries to the buffer, so that both are suppressed if a full page image is logged. I guess the header can be useful for debugging purposes, even though the redo function doesn't use it for anything. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Tue, 2009-09-15 at 12:09 -0400, Tom Lane wrote: > I'm working on this now. Thanks to you and Heikki for fixing this while I worked around it. -- Simon Riggs www.2ndQuadrant.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > Tom Lane wrote: >> BTW, there's more than one bug here :-(. Heikki found one, but the >> code is also attaching the buffer indicator to the wrong rdata entry >> --- the record header, not the workspace, is what gets suppressed >> if the full page is logged. > I saw that, but I figured it should be attaching both rdata entries to > the buffer, so that both are suppressed if a full page image is logged. > I guess the header can be useful for debugging purposes, even though the > redo function doesn't use it for anything. Yeah, for instance xlog logging (gin_desc) expects the header to be there. As you mentioned, another tack we could take is to not connect the buffer to the xlog record at all, and just reinitialize the page from the xlog record contents. I left it as was, but this might be a reasonable alternative to using HEAP_NEWPAGE. regards, tom lane
On Tue, Sep 15, 2009 at 4:41 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > I guess if you or another committer spends some time writing a test > framework that is useful and that you can trust, I'm sure many people > will add to it. That'll be true for any of the major/complex areas not > covered by public test suites: concurrency, recovery and the optimizer. +1! ...Robert