On Mon, 2013-04-01 at 19:51 -0700, Jeff Janes wrote:
> I've reproduced the problem, this time in block 74 of relation
> base/16384/4931589, and a tarball of the data directory is here:
>
>
> https://docs.google.com/file/d/0Bzqrh1SO9FcELS1majlFcTZsR0k/edit?usp=sharing
>
>
>
> (the table is in database jjanes under role jjanes, the binary is
> commit 9ad27c215362df436f8c)
>
>
> What I would probably really want is the data as it existed after the
> crash but before recovery started, but since the postmaster
> immediately starts recovery after the crash, I don't know of a good
> way to capture this.
Can you just turn off the restart_after_crash GUC? I had a chance to
look at this, and seeing the block before and after recovery would be
nice. I didn't see a log file in the data directory, but it didn't go
through recovery, so I assume it already did that.
The block is corrupt as far as I can tell. The first third is written,
and the remainder is all zeros. The header looks like this:
(These numbers are mostly from pageinspect. The checksum value that
comes from pageinspect needed to be cast back to an unsigned short to
match the error message above -- should that be changed in
pageinspect?).
lsn: 7/252E4080
checksum: 34212
flags: 1
lower: 1188
upper: 5952
special: 8192
pagesize: 8192
version: 4
prune_xid: 156833911
So the header looks good, but most of the page data is missing. I tried
with pg_filedump (the 9.2.0 version, which should be fine), and it
agrees with me that the page is corrupt.
Interestingly, that doesn't result in a user-visible error when
ignore_checksum_failure=on. That's because the item pointers appear to
all be either not normal or they point to the zeroed region. Testing the
visibility of a zeroed tuple header is always false, so no problem.
You'd still think this would cause incorrect results, but I think what's
happening is that this is a new page (otherwise it would have been
written with something other than zeroes before). So, the tuples that
are supposed to be there may be uncommitted anyway.
So, the page may be corrupt without checksums as well, but it just
happens to be hidden for the same reason. Can you try to reproduce it
without -k? And on the checkin right before checksums were added?
Without checksums, you'll need to use pg_filedump (or similar) to find
whether an error has happened.
To start speculating about the root cause: something is violating the
WAL before data rule, or not writing a FPI when it's supposed to, or not
properly restoring the FPI during recovery, or something sets the wrong
LSN. This could still be caused by the checksums patch, but it seems a
little less likely now. The reason I say that is because it's a new page
with tuples on it, so that means something in the insert/update path
ended up not writing the FPI before writing the page.
Regards,Jeff Davis