Thread: 9.4 checksum errors in recovery with gin index

9.4 checksum errors in recovery with gin index

From
Jeff Janes
Date:


When recovering from a crash (with injection of a partial page write at time of crash) against 7c7b1f4ae5ea3b1b113682d4d I get a checksum verification failure. 

16396 is a gin index.

If I have it ignore checksum failures, there is no apparent misbehavior.  I'm trying to bisect it, but it could take a while and I thought someone might have some theories based on the log:

29075  2014-05-06 23:29:51.411 PDT:LOG:  00000: database system was not properly shut down; automatic recovery in progress
29075  2014-05-06 23:29:51.411 PDT:LOCATION:  StartupXLOG, xlog.c:6361
29075  2014-05-06 23:29:51.412 PDT:LOG:  00000: redo starts at 11/323FE1C0
29075  2014-05-06 23:29:51.412 PDT:LOCATION:  StartupXLOG, xlog.c:6600
29075  2014-05-06 23:29:51.471 PDT:WARNING:  01000: page verification failed, calculated checksum 35967 but expected 7881
29075  2014-05-06 23:29:51.471 PDT:CONTEXT:  xlog redo Delete list pages (16), node: 1663/16384/16396 blkno: 0
29075  2014-05-06 23:29:51.471 PDT:LOCATION:  PageIsVerified, bufpage.c:145
29075  2014-05-06 23:29:51.471 PDT:FATAL:  XX001: invalid page in block 28486 of relation base/16384/16396
29075  2014-05-06 23:29:51.471 PDT:CONTEXT:  xlog redo Delete list pages (16), node: 1663/16384/16396 blkno: 0
29075  2014-05-06 23:29:51.471 PDT:LOCATION:  ReadBuffer_common, bufmgr.c:483
27799  2014-05-06 23:29:51.473 PDT:LOG:  00000: startup process (PID 29075) exited with exit code 1
27799  2014-05-06 23:29:51.473 PDT:LOCATION:  LogChildExit, postmaster.c:3281
27799  2014-05-06 23:29:51.473 PDT:LOG:  00000: aborting startup due to startup process failure

Cheers,

Jeff

Re: 9.4 checksum errors in recovery with gin index

From
Andres Freund
Date:
Hi,

On 2014-05-07 00:35:35 -0700, Jeff Janes wrote:
> When recovering from a crash (with injection of a partial page write at
> time of crash) against 7c7b1f4ae5ea3b1b113682d4d I get a checksum
> verification failure.
> 
> 16396 is a gin index.

Over which type? What was the load? make check?

> If I have it ignore checksum failures, there is no apparent misbehavior.
>  I'm trying to bisect it, but it could take a while and I thought someone
> might have some theories based on the log:

If you have the WAL a pg_xlogdump grepping for everything referring to
that block would be helpful.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: 9.4 checksum errors in recovery with gin index

From
Jeff Janes
Date:
On Wed, May 7, 2014 at 12:48 AM, Andres Freund <andres@2ndquadrant.com> wrote:
Hi,

On 2014-05-07 00:35:35 -0700, Jeff Janes wrote:
> When recovering from a crash (with injection of a partial page write at
> time of crash) against 7c7b1f4ae5ea3b1b113682d4d I get a checksum
> verification failure.
>
> 16396 is a gin index.

Over which type? What was the load? make check?

A gin index on text[].  

The load is a variation of the crash recovery tester I've been using the last few years, this time adapted to use a gin index in a rather unnatural way.  I just increment a counter on a random row repeatedly via a unique key, but for this purpose that unique key is stuffed into text[] along with a bunch of cruft.  The cruft is text representations of negative integers, the actual key is text representation of nonnegative integers.

The test harness (patch to induce crashes, and two driving programs) and a preserved data directory are here:


(role jjanes, database jjanes)

As far as I can tell, this problem goes back to the beginning of page checksums.



> If I have it ignore checksum failures, there is no apparent misbehavior.
>  I'm trying to bisect it, but it could take a while and I thought someone
> might have some theories based on the log:

If you have the WAL a pg_xlogdump grepping for everything referring to
that block would be helpful.

The only record which mentions block 28486 by name is this one:

rmgr: Gin         len (rec/tot):   1576/  1608, tx:   77882205, lsn: 11/30F4C2C0, prev 11/30F4C290, bkp: 0000, desc: Insert new list page, node: 1663/16384/16396 blkno: 28486

However, I think that that record precedes the recovery start point.

Cheers,

Jeff

Re: 9.4 checksum errors in recovery with gin index

From
Andres Freund
Date:
Hi,

On 2014-05-07 10:21:26 -0700, Jeff Janes wrote:
> On Wed, May 7, 2014 at 12:48 AM, Andres Freund <andres@2ndquadrant.com>wrote:
> 
> > Hi,
> >
> > On 2014-05-07 00:35:35 -0700, Jeff Janes wrote:
> > > When recovering from a crash (with injection of a partial page write at
> > > time of crash) against 7c7b1f4ae5ea3b1b113682d4d I get a checksum
> > > verification failure.
> > >
> > > 16396 is a gin index.
> >
> > Over which type? What was the load? make check?
> >
> 
> A gin index on text[].
> 
> The load is a variation of the crash recovery tester I've been using the
> last few years, this time adapted to use a gin index in a rather unnatural
> way.  I just increment a counter on a random row repeatedly via a unique
> key, but for this purpose that unique key is stuffed into text[] along with
> a bunch of cruft.  The cruft is text representations of negative integers,
> the actual key is text representation of nonnegative integers.
> 
> The test harness (patch to induce crashes, and two driving programs) and a
> preserved data directory are here:
> 
> https://drive.google.com/folderview?id=0Bzqrh1SO9FcESDZVeFk5djJaeHM&usp=sharing
> 
> (role jjanes, database jjanes)
> 
> As far as I can tell, this problem goes back to the beginning of page
> checksums.

Interesting.

> > > If I have it ignore checksum failures, there is no apparent misbehavior.
> > >  I'm trying to bisect it, but it could take a while and I thought someone
> > > might have some theories based on the log:
> >
> > If you have the WAL a pg_xlogdump grepping for everything referring to
> > that block would be helpful.
> >
> 
> The only record which mentions block 28486 by name is this one:

Hm, try running it with -b specified.

> rmgr: Gin         len (rec/tot):   1576/  1608, tx:   77882205, lsn:
> 11/30F4C2C0, prev 11/30F4C290, bkp: 0000, desc: Insert new list page, node:
> 1663/16384/16396 blkno: 28486
> 
> However, I think that that record precedes the recovery start point.

If that's the case it seems likely that a PageSetLSN() or PageSetDirty()
are missing somewhere...

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: 9.4 checksum errors in recovery with gin index

From
Jeff Janes
Date:
On Wed, May 7, 2014 at 10:34 AM, Andres Freund <andres@2ndquadrant.com> wrote:
Hi,

On 2014-05-07 10:21:26 -0700, Jeff Janes wrote:
> On Wed, May 7, 2014 at 12:48 AM, Andres Freund <andres@2ndquadrant.com>wrote:
> > If you have the WAL a pg_xlogdump grepping for everything referring to
> > that block would be helpful.
> >
>
> The only record which mentions block 28486 by name is this one:

Hm, try running it with -b specified.

Still nothing more.
 

> rmgr: Gin         len (rec/tot):   1576/  1608, tx:   77882205, lsn:
> 11/30F4C2C0, prev 11/30F4C290, bkp: 0000, desc: Insert new list page, node:
> 1663/16384/16396 blkno: 28486
>
> However, I think that that record precedes the recovery start point.

If that's the case it seems likely that a PageSetLSN() or PageSetDirty()
are missing somewhere...

Wouldn't that result in corrupted data after crash recovery when checksum failures are ignored?  I haven't seen any of that.

Cheers,

Jeff
 

Re: 9.4 checksum errors in recovery with gin index

From
Heikki Linnakangas
Date:
On 05/07/2014 10:35 AM, Jeff Janes wrote:
> When recovering from a crash (with injection of a partial page write at
> time of crash) against 7c7b1f4ae5ea3b1b113682d4d I get a checksum
> verification failure.
>
> 16396 is a gin index.
>
> If I have it ignore checksum failures, there is no apparent misbehavior.
>   I'm trying to bisect it, but it could take a while and I thought someone
> might have some theories based on the log:
>
> 29075  2014-05-06 23:29:51.411 PDT:LOG:  00000: database system was not
> properly shut down; automatic recovery in progress
> 29075  2014-05-06 23:29:51.411 PDT:LOCATION:  StartupXLOG, xlog.c:6361
> 29075  2014-05-06 23:29:51.412 PDT:LOG:  00000: redo starts at 11/323FE1C0
> 29075  2014-05-06 23:29:51.412 PDT:LOCATION:  StartupXLOG, xlog.c:6600
> 29075  2014-05-06 23:29:51.471 PDT:WARNING:  01000: page verification
> failed, calculated checksum 35967 but expected 7881
> 29075  2014-05-06 23:29:51.471 PDT:CONTEXT:  xlog redo Delete list pages

A-ha. The WAL record of list page deletion doesn't create a full-page 
images of the deleted pages. That's pretty sensible, as a deleted page 
doesn't contain any data that needs to be retained. However, if a 
full-page image is not taken, then the page should be completely 
recreated from scratch at replay. It's not doing that.

Thanks for the testing! I'll have a stab at fixing that tomorrow. 
Basically, ginRedoDeleteListPages() needs to re-initialize the deleted 
pages.

- Heikki



Re: 9.4 checksum errors in recovery with gin index

From
Jeff Janes
Date:
On Wed, May 7, 2014 at 1:40 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
On 05/07/2014 10:35 AM, Jeff Janes wrote:
When recovering from a crash (with injection of a partial page write at
time of crash) against 7c7b1f4ae5ea3b1b113682d4d I get a checksum
verification failure.

16396 is a gin index.

If I have it ignore checksum failures, there is no apparent misbehavior.
  I'm trying to bisect it, but it could take a while and I thought someone
might have some theories based on the log:

29075  2014-05-06 23:29:51.411 PDT:LOG:  00000: database system was not
properly shut down; automatic recovery in progress
29075  2014-05-06 23:29:51.411 PDT:LOCATION:  StartupXLOG, xlog.c:6361
29075  2014-05-06 23:29:51.412 PDT:LOG:  00000: redo starts at 11/323FE1C0
29075  2014-05-06 23:29:51.412 PDT:LOCATION:  StartupXLOG, xlog.c:6600
29075  2014-05-06 23:29:51.471 PDT:WARNING:  01000: page verification
failed, calculated checksum 35967 but expected 7881
29075  2014-05-06 23:29:51.471 PDT:CONTEXT:  xlog redo Delete list pages

A-ha. The WAL record of list page deletion doesn't create a full-page images of the deleted pages. That's pretty sensible, as a deleted page doesn't contain any data that needs to be retained. However, if a full-page image is not taken, then the page should be completely recreated from scratch at replay. It's not doing that.

Thanks for the testing! I'll have a stab at fixing that tomorrow. Basically, ginRedoDeleteListPages() needs to re-initialize the deleted pages.


It looks like it is solved now.

Thanks,

Jeff