regression test failed when enabling checksum - Mailing list pgsql-hackers

From Jeff Janes
Subject regression test failed when enabling checksum
Date
Msg-id CAMkU=1zt3zd1FvDAyrGp3y=XU00Tp7nfNp69b1JS0BTAYNj13w@mail.gmail.com
In response to Re: regression test failed when enabling checksum  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-hackers
On Mon, Apr 1, 2013 at 10:37 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Tue, Mar 26, 2013 at 4:23 PM, Jeff Davis <pgsql@j-davis.com> wrote:

Patch attached. Only brief testing done, so I might have missed
something. I will look more closely later.

After applying your patch, I could run the stress test described here:


But altered to make use of initdb -k, of course.

Over 10,000 cycles of crash and recovery, I encountered two checksum failures after recovery. For example:
...
 
Unfortunately I already cleaned up the data directory before noticing the problem, so I have nothing to post for forensic analysis.  I'll try to reproduce the problem.


I've reproduced the problem, this time in block 74 of relation base/16384/4931589. A tarball of the data directory is here:


(The table is in database jjanes under role jjanes; the binary was built from commit 9ad27c215362df436f8c.)

What I would probably really want is the data as it existed after the crash but before recovery started, but since the postmaster immediately starts recovery after the crash, I don't know of a good way to capture this.
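One possible way to get such a snapshot, offered as an assumption rather than something tried in this thread: with restart_after_crash = off in postgresql.conf, the postmaster should exit after a backend crash instead of starting recovery, which leaves a window to copy the data directory. A minimal Python sketch of the copy step follows; snapshot_datadir is a hypothetical helper, and treating a leftover postmaster.pid as "possibly still running" is a simplification.

```python
import os
import shutil


def snapshot_datadir(datadir, dest, force=False):
    """Copy a stopped cluster's data directory to `dest` for later analysis.

    Hypothetical helper: assumes the postmaster has already exited and
    that recovery has not yet been started on `datadir`.
    """
    pidfile = os.path.join(datadir, "postmaster.pid")
    if os.path.exists(pidfile) and not force:
        # A pid file usually means the postmaster is still running, or that
        # it died without cleaning up. Copying a live directory would not
        # give a consistent pre-recovery image.
        raise RuntimeError(
            "postmaster.pid present; pass force=True if it is stale")
    # symlinks=True keeps symlinks (e.g. under pg_tblspc) as links
    # instead of following them. `dest` must not exist yet.
    return shutil.copytree(datadir, dest, symlinks=True)
```

The copy has to be taken before the server is started again, since startup is what triggers recovery and overwrites the state of interest.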

I guess one thing to do would be to extract from the WAL the most recent full-page write (FPW) for block 74 of relation base/16384/4931589 (assuming that WAL segment hasn't been recycled already) and see whether it matches what is actually in that block of the data file, but I don't currently know how to do that.
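The data-file half of that comparison can be sketched in Python as below. This is only a sketch under stated assumptions: the default 8192-byte block size, a little-endian machine, and the 24-byte page header layout used by 9.3 (pd_lsn, pd_checksum, pd_flags, pd_lower, pd_upper, pd_special, pd_pagesize_version, pd_prune_xid). The function names are hypothetical. Extracting the page image from the WAL is not shown, and as far as I know a full-page image in the WAL may omit the unused "hole" between pd_lower and pd_upper, so it would need to be expanded back to a full page before being compared.

```python
import struct

BLCKSZ = 8192  # default PostgreSQL page size; an assumption about this build

# Assumed 9.3 page header: pd_lsn as two uint32 (xlogid, xrecoff), then
# pd_checksum, pd_flags, pd_lower, pd_upper, pd_special,
# pd_pagesize_version as uint16, then pd_prune_xid as uint32.
# "<" assumes a little-endian machine such as x86.
PAGE_HEADER = struct.Struct("<IIHHHHHHI")


def read_block(path, blkno, blcksz=BLCKSZ):
    """Return the raw bytes of block `blkno` from a relation segment file."""
    with open(path, "rb") as f:
        f.seek(blkno * blcksz)
        page = f.read(blcksz)
    if len(page) != blcksz:
        raise ValueError("block %d is missing or truncated" % blkno)
    return page


def page_header(page):
    """Decode the fixed 24-byte page header into a dict."""
    (lsn_hi, lsn_lo, checksum, flags, lower, upper,
     special, pagesize_version, prune_xid) = PAGE_HEADER.unpack_from(page)
    return {
        "pd_lsn": "%X/%X" % (lsn_hi, lsn_lo),
        "pd_checksum": checksum,
        "pd_flags": flags,
        "pd_lower": lower,
        "pd_upper": upper,
        "pd_special": special,
        "pd_pagesize_version": pagesize_version,
        "pd_prune_xid": prune_xid,
    }


def diff_ranges(a, b):
    """Return (start, end) byte ranges, end exclusive, where two pages differ."""
    ranges, start = [], None
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y and start is None:
            start = i
        elif x == y and start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, min(len(a), len(b))))
    return ranges
```

For the case here, read_block("base/16384/4931589", 74) run from the data directory would return the on-disk page, page_header() would show the stored pd_checksum and pd_lsn, and diff_ranges() against a reconstructed WAL image would show which byte ranges disagree.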

11500 SELECT 2013-04-01 12:01:56.926 PDT:WARNING:  page verification failed, calculated checksum 54570 but expected 34212
11500 SELECT 2013-04-01 12:01:56.926 PDT:ERROR:  invalid page in block 74 of relation base/16384/4931589
11500 SELECT 2013-04-01 12:01:56.926 PDT:STATEMENT:  select sum(count) from foo

Cheers,

Jeff
