Thread: 9.4 checksum errors in recovery with gin index
When recovering from a crash (with injection of a partial page write at time of crash) against 7c7b1f4ae5ea3b1b113682d4d I get a checksum verification failure.
16396 is a gin index.
If I have it ignore checksum failures, there is no apparent misbehavior. I'm trying to bisect it, but it could take a while and I thought someone might have some theories based on the log:
29075 2014-05-06 23:29:51.411 PDT:LOG: 00000: database system was not properly shut down; automatic recovery in progress
29075 2014-05-06 23:29:51.411 PDT:LOCATION: StartupXLOG, xlog.c:6361
29075 2014-05-06 23:29:51.412 PDT:LOG: 00000: redo starts at 11/323FE1C0
29075 2014-05-06 23:29:51.412 PDT:LOCATION: StartupXLOG, xlog.c:6600
29075 2014-05-06 23:29:51.471 PDT:WARNING: 01000: page verification failed, calculated checksum 35967 but expected 7881
29075 2014-05-06 23:29:51.471 PDT:CONTEXT: xlog redo Delete list pages (16), node: 1663/16384/16396 blkno: 0
29075 2014-05-06 23:29:51.471 PDT:LOCATION: PageIsVerified, bufpage.c:145
29075 2014-05-06 23:29:51.471 PDT:FATAL: XX001: invalid page in block 28486 of relation base/16384/16396
29075 2014-05-06 23:29:51.471 PDT:CONTEXT: xlog redo Delete list pages (16), node: 1663/16384/16396 blkno: 0
29075 2014-05-06 23:29:51.471 PDT:LOCATION: ReadBuffer_common, bufmgr.c:483
27799 2014-05-06 23:29:51.473 PDT:LOG: 00000: startup process (PID 29075) exited with exit code 1
27799 2014-05-06 23:29:51.473 PDT:LOCATION: LogChildExit, postmaster.c:3281
27799 2014-05-06 23:29:51.473 PDT:LOG: 00000: aborting startup due to startup process failure
Cheers,
Jeff
Hi, On 2014-05-07 00:35:35 -0700, Jeff Janes wrote: > When recovering from a crash (with injection of a partial page write at > time of crash) against 7c7b1f4ae5ea3b1b113682d4d I get a checksum > verification failure. > > 16396 is a gin index. Over which type? What was the load? make check? > If I have it ignore checksum failures, there is no apparent misbehavior. > I'm trying to bisect it, but it could take a while and I thought someone > might have some theories based on the log: If you have the WAL a pg_xlogdump grepping for everything referring to that block would be helpful. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Wed, May 7, 2014 at 12:48 AM, Andres Freund <andres@2ndquadrant.com> wrote:
Hi,Over which type? What was the load? make check?
On 2014-05-07 00:35:35 -0700, Jeff Janes wrote:
> When recovering from a crash (with injection of a partial page write at
> time of crash) against 7c7b1f4ae5ea3b1b113682d4d I get a checksum
> verification failure.
>
> 16396 is a gin index.
A gin index on text[].
The load is a variation of the crash recovery tester I've been using the last few years, this time adapted to use a gin index in a rather unnatural way. I just increment a counter on a random row repeatedly via a unique key, but for this purpose that unique key is stuffed into text[] along with a bunch of cruft. The cruft is text representations of negative integers, the actual key is text representation of nonnegative integers.
The test harness (patch to induce crashes, and two driving programs) and a preserved data directory are here:
(role jjanes, database jjanes)
As far as I can tell, this problem goes back to the beginning of page checksums.
If you have the WAL a pg_xlogdump grepping for everything referring to
> If I have it ignore checksum failures, there is no apparent misbehavior.
> I'm trying to bisect it, but it could take a while and I thought someone
> might have some theories based on the log:
that block would be helpful.
The only record which mentions block 28486 by name is this one:
rmgr: Gin len (rec/tot): 1576/ 1608, tx: 77882205, lsn: 11/30F4C2C0, prev 11/30F4C290, bkp: 0000, desc: Insert new list page, node: 1663/16384/16396 blkno: 28486
However, I think that that record precedes the recovery start point.
Cheers,
Jeff
Hi, On 2014-05-07 10:21:26 -0700, Jeff Janes wrote: > On Wed, May 7, 2014 at 12:48 AM, Andres Freund <andres@2ndquadrant.com>wrote: > > > Hi, > > > > On 2014-05-07 00:35:35 -0700, Jeff Janes wrote: > > > When recovering from a crash (with injection of a partial page write at > > > time of crash) against 7c7b1f4ae5ea3b1b113682d4d I get a checksum > > > verification failure. > > > > > > 16396 is a gin index. > > > > Over which type? What was the load? make check? > > > > A gin index on text[]. > > The load is a variation of the crash recovery tester I've been using the > last few years, this time adapted to use a gin index in a rather unnatural > way. I just increment a counter on a random row repeatedly via a unique > key, but for this purpose that unique key is stuffed into text[] along with > a bunch of cruft. The cruft is text representations of negative integers, > the actual key is text representation of nonnegative integers. > > The test harness (patch to induce crashes, and two driving programs) and a > preserved data directory are here: > > https://drive.google.com/folderview?id=0Bzqrh1SO9FcESDZVeFk5djJaeHM&usp=sharing > > (role jjanes, database jjanes) > > As far as I can tell, this problem goes back to the beginning of page > checksums. Interesting. > > > If I have it ignore checksum failures, there is no apparent misbehavior. > > > I'm trying to bisect it, but it could take a while and I thought someone > > > might have some theories based on the log: > > > > If you have the WAL a pg_xlogdump grepping for everything referring to > > that block would be helpful. > > > > The only record which mentions block 28486 by name is this one: Hm, try running it with -b specified. > rmgr: Gin len (rec/tot): 1576/ 1608, tx: 77882205, lsn: > 11/30F4C2C0, prev 11/30F4C290, bkp: 0000, desc: Insert new list page, node: > 1663/16384/16396 blkno: 28486 > > However, I think that that record precedes the recovery start point. If that's the case it seems likely that a PageSetLSN() or PageSetDirty() are missing somewhere... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Wed, May 7, 2014 at 10:34 AM, Andres Freund <andres@2ndquadrant.com> wrote:
Hi,
On 2014-05-07 10:21:26 -0700, Jeff Janes wrote:
> On Wed, May 7, 2014 at 12:48 AM, Andres Freund <andres@2ndquadrant.com>wrote:> > If you have the WAL a pg_xlogdump grepping for everything referring toHm, try running it with -b specified.
> > that block would be helpful.
> >
>
> The only record which mentions block 28486 by name is this one:
Still nothing more.
If that's the case it seems likely that a PageSetLSN() or PageSetDirty()
> rmgr: Gin len (rec/tot): 1576/ 1608, tx: 77882205, lsn:
> 11/30F4C2C0, prev 11/30F4C290, bkp: 0000, desc: Insert new list page, node:
> 1663/16384/16396 blkno: 28486
>
> However, I think that that record precedes the recovery start point.
are missing somewhere...
Wouldn't that result in corrupted data after crash recovery when checksum failures are ignored? I haven't seen any of that.
Cheers,
Jeff
On 05/07/2014 10:35 AM, Jeff Janes wrote: > When recovering from a crash (with injection of a partial page write at > time of crash) against 7c7b1f4ae5ea3b1b113682d4d I get a checksum > verification failure. > > 16396 is a gin index. > > If I have it ignore checksum failures, there is no apparent misbehavior. > I'm trying to bisect it, but it could take a while and I thought someone > might have some theories based on the log: > > 29075 2014-05-06 23:29:51.411 PDT:LOG: 00000: database system was not > properly shut down; automatic recovery in progress > 29075 2014-05-06 23:29:51.411 PDT:LOCATION: StartupXLOG, xlog.c:6361 > 29075 2014-05-06 23:29:51.412 PDT:LOG: 00000: redo starts at 11/323FE1C0 > 29075 2014-05-06 23:29:51.412 PDT:LOCATION: StartupXLOG, xlog.c:6600 > 29075 2014-05-06 23:29:51.471 PDT:WARNING: 01000: page verification > failed, calculated checksum 35967 but expected 7881 > 29075 2014-05-06 23:29:51.471 PDT:CONTEXT: xlog redo Delete list pages A-ha. The WAL record of list page deletion doesn't create a full-page images of the deleted pages. That's pretty sensible, as a deleted page doesn't contain any data that needs to be retained. However, if a full-page image is not taken, then the page should be completely recreated from scratch at replay. It's not doing that. Thanks for the testing! I'll have a stab at fixing that tomorrow. Basically, ginRedoDeleteListPages() needs to re-initialize the deleted pages. - Heikki
On Wed, May 7, 2014 at 1:40 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
On 05/07/2014 10:35 AM, Jeff Janes wrote:A-ha. The WAL record of list page deletion doesn't create a full-page images of the deleted pages. That's pretty sensible, as a deleted page doesn't contain any data that needs to be retained. However, if a full-page image is not taken, then the page should be completely recreated from scratch at replay. It's not doing that.When recovering from a crash (with injection of a partial page write at
time of crash) against 7c7b1f4ae5ea3b1b113682d4d I get a checksum
verification failure.
16396 is a gin index.
If I have it ignore checksum failures, there is no apparent misbehavior.
I'm trying to bisect it, but it could take a while and I thought someone
might have some theories based on the log:
29075 2014-05-06 23:29:51.411 PDT:LOG: 00000: database system was not
properly shut down; automatic recovery in progress
29075 2014-05-06 23:29:51.411 PDT:LOCATION: StartupXLOG, xlog.c:6361
29075 2014-05-06 23:29:51.412 PDT:LOG: 00000: redo starts at 11/323FE1C0
29075 2014-05-06 23:29:51.412 PDT:LOCATION: StartupXLOG, xlog.c:6600
29075 2014-05-06 23:29:51.471 PDT:WARNING: 01000: page verification
failed, calculated checksum 35967 but expected 7881
29075 2014-05-06 23:29:51.471 PDT:CONTEXT: xlog redo Delete list pages
Thanks for the testing! I'll have a stab at fixing that tomorrow. Basically, ginRedoDeleteListPages() needs to re-initialize the deleted pages.
It looks like it is solved now.
Thanks,
Jeff