Re: Enabling Checksums - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: Enabling Checksums
Date
Msg-id 1353350145.10198.130.camel@jdavis-laptop
Whole thread Raw
In response to Re: Enabling Checksums  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: Enabling Checksums  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On Mon, 2012-11-19 at 18:30 +0100, Andres Freund wrote:
> Yes, definitely.

OK. I suppose that makes sense for large writes.

> > If that is not true, then I'm concerned about replicating corruption, or
> > backing up corrupt blocks over good ones. How do we prevent that? It
> > seems like a pretty major hole if we can't, because it means the only
> > safe replication is streaming replication; a base-backup is essentially
> > unsafe. And it means that even an online background checking utility
> > would be quite hard to do properly.
> 
> I am not sure I see the danger in the base backup case here? Why would
> we have corrupted backup blocks? While postgres is running we won't see
> such torn pages because its all done under proper locks...

Yes, the blocks written *after* the checkpoint might have a bad checksum
that will be fixed during recovery. But the blocks written *before* the
checkpoint should have a valid checksum, but if they don't, then
recovery doesn't know about them.

So, we can't verify the checksums in the base backup because it's
expected that some blocks will fail the check, and they can be fixed
during recovery. That gives us no protection for blocks that were truly
corrupted and written long before the last checkpoint.

I suppose if we could somehow differentiate the blocks, that might work.
Maybe look at the LSN and only validate blocks written before the
checkpoint? But of course, that's a problem because a corrupt block
might have the wrong LSN (in fact, it's likely, because garbage is more
likely to make the LSN too high than too low).

Regards,Jeff Davis




pgsql-hackers by date:

Previous
From: Atri Sharma
Date:
Subject: Re: Do we need so many hint bits?
Next
From: Tomas Vondra
Date:
Subject: Re: too much pgbench init output