On Wed, 2013-04-03 at 15:57 -0700, Jeff Janes wrote:
> You don't know that the cluster is in the bad state until after it
> goes through recovery because most crashes recover perfectly fine. So
> it would have to make a side-copy of the cluster after the crash, then
> recover the original and see how things go, then either retain or
> delete the side-copy. Unfortunately my testing harness can't do this
> at the moment, because the perl script storing the consistency info
> needs to survive over the crash and recovery. It might take
> me awhile to figure out how to make it do this.
Hmm... you're right, that is difficult to integrate into a test.
Another approach is to expand PageHeaderIsValid in a pre-checksums
version to do some extra checks (perhaps the same checks as pg_filedump
does). It might be able to at least detect simple cases, like all zeros
between pd_upper and pd_special (when pd_upper < pd_special). Still not
easy, but maybe more practical than saving the directory like that. We
may even want a GUC for that to aid future debugging (obviously it would
have a performance impact).
> The other 3 files in it constitute the harness. It is a quite a mess,
> with some hard-coded paths. The just-posted fix for wal_keep_segments
> will also have to be applied.
I'm still trying to reproduce the problem myself. I keep trying simpler
versions of your test and I don't see the problem. It's a little awkward
to try to run the full version of your test (and I'm jumping between
code inspection and various other ideas).
> I'll work on it, but it will take awhile to figure out exactly how to
> use pg_filedump to do that, and how to automate that process and
> incorporate it into the harness.
It might be more productive to try to expand PageHeaderIsValid as I
suggested above.
> In the meantime, I tested the checksum commit itself (96ef3b8ff1c) and
> the problem occurs there, so if the problem is not the checksums
> themselves (and I agree it probably isn't) it must have been
> introduced before that rather than after.
Thank you for all of your work on this. I have also attached another
test patch that disables most of the complex areas of the checksums
patch (VM bits, hint bits, etc.). If you apply different portions of the
patch (or the whole thing), it will help eliminate possibilities. In
particular, eliminating the calls to PageSetAllVisible+visibilitymap_set
could tell us a lot.
Also, the first data directory you posted had a problem with a page that
appeared to be a new page (otherwise I would have expected either old or
new data at the end). Do you think this might be a problem only for new
pages? Or do you think your test just makes it likely that many writes
will happen on new pages?
I did find one potential problem in the checksums code (aside from my
previous patch involving wal_level=archive): visibilitymap_set is called
sometimes before the heap page is marked dirty. I will examine a little
more closely, but I expect that to require another patch. Not sure if
that could explain this problem or not.
Regards,
Jeff Davis