Re: 9.4 checksum errors in recovery with gin index - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: 9.4 checksum errors in recovery with gin index
Date
Msg-id CAMkU=1y-sTDpkrzvDP3D8Wp96NcFYk4C-FGca5Yna4y6FAehiQ@mail.gmail.com
In response to Re: 9.4 checksum errors in recovery with gin index  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
On Wed, May 7, 2014 at 1:40 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
On 05/07/2014 10:35 AM, Jeff Janes wrote:
When recovering from a crash (with injection of a partial page write at
time of crash) against 7c7b1f4ae5ea3b1b113682d4d I get a checksum
verification failure.

16396 is a gin index.

If I have it ignore checksum failures, there is no apparent misbehavior.
I'm trying to bisect it, but it could take a while and I thought someone
might have some theories based on the log:

29075  2014-05-06 23:29:51.411 PDT:LOG:  00000: database system was not
properly shut down; automatic recovery in progress
29075  2014-05-06 23:29:51.411 PDT:LOCATION:  StartupXLOG, xlog.c:6361
29075  2014-05-06 23:29:51.412 PDT:LOG:  00000: redo starts at 11/323FE1C0
29075  2014-05-06 23:29:51.412 PDT:LOCATION:  StartupXLOG, xlog.c:6600
29075  2014-05-06 23:29:51.471 PDT:WARNING:  01000: page verification
failed, calculated checksum 35967 but expected 7881
29075  2014-05-06 23:29:51.471 PDT:CONTEXT:  xlog redo Delete list pages

A-ha. The WAL record of list page deletion doesn't create full-page images of the deleted pages. That's pretty sensible, as a deleted page doesn't contain any data that needs to be retained. However, if a full-page image is not taken, then the page should be completely recreated from scratch at replay. It's not doing that.

Thanks for the testing! I'll have a stab at fixing that tomorrow. Basically, ginRedoDeleteListPages() needs to re-initialize the deleted pages.
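
To illustrate the reasoning above, here is a toy model in Python. It is not PostgreSQL code: the page layout, the additive checksum, and every function name (new_page, torn_write, redo_modify_in_place, redo_reinit) are assumptions made up for the sketch. It only shows why a replay routine that reads a torn page trips checksum verification, while one that rebuilds the page from scratch never looks at the damaged contents.

```python
PAGE_SIZE = 8192
HEADER = 2  # toy layout: a 2-byte checksum stored at the start of the page


def checksum(page) -> int:
    """Toy 16-bit additive checksum over everything after the checksum field.

    A stand-in only; PostgreSQL's real page checksum is a different algorithm.
    """
    return sum(page[HEADER:]) & 0xFFFF


def seal(page: bytearray) -> None:
    """Store the current checksum in the page header."""
    page[0:HEADER] = checksum(page).to_bytes(HEADER, "little")


def verify(page) -> bool:
    """Return True if the stored checksum matches the page contents."""
    return int.from_bytes(page[0:HEADER], "little") == checksum(page)


def new_page(fill: int) -> bytearray:
    """Build a sealed page whose payload is filled with one byte value."""
    page = bytearray(PAGE_SIZE)
    page[HEADER:] = bytes([fill]) * (PAGE_SIZE - HEADER)
    seal(page)
    return page


def torn_write(old, new, split: int = 4096) -> bytearray:
    """Simulate a partial page write: only the first `split` bytes of the
    new version reached disk before the crash."""
    return bytearray(new[:split] + old[split:])


def redo_modify_in_place(on_disk) -> bytearray:
    """Replay that reads the existing page, so it has to verify it first."""
    if not verify(on_disk):
        raise ValueError("page verification failed")
    page = bytearray(on_disk)
    page[HEADER] = 0xFF  # hypothetical "deleted" marker
    seal(page)
    return page


def redo_reinit(on_disk) -> bytearray:
    """Replay that ignores the on-disk contents and recreates the page."""
    page = bytearray(PAGE_SIZE)
    page[HEADER] = 0xFF  # hypothetical "deleted" marker
    seal(page)
    return page


# A torn page fails verification; re-initializing at replay sidesteps that.
torn = torn_write(new_page(0x11), new_page(0x22))
print(verify(torn))               # False
print(verify(redo_reinit(torn)))  # True
```

In this model a full-page image would play the same role as redo_reinit: in both cases replay overwrites the whole page and never depends on what the interrupted write left behind.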


It looks like it is solved now.

Thanks,

Jeff
