Re: 9.4 checksum errors in recovery with gin index - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: 9.4 checksum errors in recovery with gin index
Msg-id CAMkU=1weqUAVPW2F+c3Ok5VFfvEjyfJXb4dH8B7v57D5WXftkA@mail.gmail.com
In response to Re: 9.4 checksum errors in recovery with gin index  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: 9.4 checksum errors in recovery with gin index  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On Wed, May 7, 2014 at 12:48 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> Hi,
>
> On 2014-05-07 00:35:35 -0700, Jeff Janes wrote:
> > When recovering from a crash (with injection of a partial page write at
> > time of crash) against 7c7b1f4ae5ea3b1b113682d4d I get a checksum
> > verification failure.
> >
> > 16396 is a gin index.
>
> Over which type? What was the load? make check?

A gin index on text[].  

The load is a variation of the crash recovery tester I've been using for the last few years, this time adapted to use a gin index in a rather unnatural way.  I just increment a counter on a random row repeatedly via a unique key, but for this purpose that unique key is stuffed into a text[] along with a bunch of cruft.  The cruft is text representations of negative integers; the actual key is the text representation of a nonnegative integer.
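
A rough sketch of the shape of that workload, not the actual test harness. The table name foo, the column names keys and count, the index name, and the %s parameter style (as used by common Python DB-API drivers) are all assumptions made for illustration. Because the cruft is always negative, it can never collide with a real key, so a containment lookup on the key still identifies exactly one row.

```python
import random

# Hypothetical schema: a text[] column covered by a gin index, plus a counter.
CREATE_SQL = """
CREATE TABLE foo (keys text[], count integer);
CREATE INDEX foo_keys_gin ON foo USING gin (keys);
"""

# @> is array containment, which a gin index on text[] can serve.
UPDATE_SQL = "UPDATE foo SET count = count + 1 WHERE keys @> ARRAY[%s]::text[]"


def make_keys(key, n_cruft=10, rng=random):
    """Return the text[] contents for one row: the real key plus cruft."""
    if key < 0:
        raise ValueError("the real key must be nonnegative")
    # Cruft: text representations of negative integers.
    cruft = [str(-rng.randint(1, 1_000_000)) for _ in range(n_cruft)]
    # Real key: text representation of a nonnegative integer.
    return [str(key)] + cruft


def pick_update(n_rows, rng=random):
    """Choose a random row by its key and return (sql, params) to bump it."""
    key = rng.randrange(n_rows)
    return UPDATE_SQL, (str(key),)
```

A driver loop would call pick_update() repeatedly and execute the result, with the server being crashed at intervals by the separate crash-inducing patch.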

The test harness (patch to induce crashes, and two driving programs) and a preserved data directory are here:


(role jjanes, database jjanes)

As far as I can tell, this problem goes back to the beginning of page checksums.



> > If I have it ignore checksum failures, there is no apparent misbehavior.
> >  I'm trying to bisect it, but it could take a while and I thought someone
> > might have some theories based on the log:
>
> If you have the WAL a pg_xlogdump grepping for everything referring to
> that block would be helpful.

The only record which mentions block 28486 by name is this one:

rmgr: Gin         len (rec/tot):   1576/  1608, tx:   77882205, lsn: 11/30F4C2C0, prev 11/30F4C290, bkp: 0000, desc: Insert new list page, node: 1663/16384/16396 blkno: 28486

However, I think that that record precedes the recovery start point.
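
A minimal sketch of that kind of filtering, assuming pg_xlogdump output has been saved as text in the same one-line format as the record above. The function names are made up for illustration. It only matches records whose description names the block as "node: .../... blkno: N", as the gin record does; other resource managers describe blocks differently and would need their own patterns. An LSN such as 11/30F4C2C0 is two hexadecimal halves of a 64-bit position, so converting it to an integer makes it easy to check whether a record falls before a given recovery start point.

```python
import re

# Matches the one-line record format shown in the pg_xlogdump output above.
RECORD_RE = re.compile(
    r"rmgr:\s+(?P<rmgr>\S+)\s+"
    r"len \(rec/tot\):\s*(?P<rec>\d+)/\s*(?P<tot>\d+), "
    r"tx:\s*(?P<tx>\d+), "
    r"lsn: (?P<lsn>[0-9A-F]+/[0-9A-F]+), "
    r"prev (?P<prev>[0-9A-F]+/[0-9A-F]+), "
    r"bkp: (?P<bkp>\d+), "
    r"desc: (?P<desc>.*)"
)


def lsn_to_int(lsn):
    """Convert an LSN like '11/30F4C2C0' to a comparable 64-bit integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def parse_record(line):
    """Return the record's fields as a dict, or None if the line doesn't match."""
    m = RECORD_RE.search(line)
    return m.groupdict() if m else None


def mentions_block(rec, relfilenode, blkno):
    """True if the description names this relfilenode and block number."""
    node = re.search(r"node: \d+/\d+/(\d+)", rec["desc"])
    blk = re.search(r"blkno: (\d+)", rec["desc"])
    return bool(
        node and blk
        and int(node.group(1)) == relfilenode
        and int(blk.group(1)) == blkno
    )


def records_for_block(lines, relfilenode, blkno, start_lsn=None):
    """Yield matching records, optionally only those at or after start_lsn."""
    floor = lsn_to_int(start_lsn) if start_lsn else 0
    for line in lines:
        rec = parse_record(line)
        if rec and mentions_block(rec, relfilenode, blkno):
            if lsn_to_int(rec["lsn"]) >= floor:
                yield rec


SAMPLE = (
    "rmgr: Gin         len (rec/tot):   1576/  1608, tx:   77882205, "
    "lsn: 11/30F4C2C0, prev 11/30F4C290, bkp: 0000, "
    "desc: Insert new list page, node: 1663/16384/16396 blkno: 28486"
)
```

Passing the recovery start point as start_lsn would drop the sample record if it really does precede that point, which is the situation described above.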

Cheers,

Jeff
