Re: Page Checksums - Mailing list pgsql-hackers

From Kevin Grittner
Subject Re: Page Checksums
Msg-id 4EEF70E80200002500043E37@gw.wicourts.gov
In response to Re: Page Checksums  (Greg Smith <greg@2ndQuadrant.com>)
Responses Re: Page Checksums  (Greg Smith <greg@2ndQuadrant.com>)
List pgsql-hackers
Greg Smith <greg@2ndQuadrant.com> wrote:
> But if you need all that infrastructure just to get the feature 
> launched, that's a bit hard to stomach.
Triggering a vacuum or some hypothetical "scrubbing" feature?
> Also, as someone who follows Murphy's Law as my chosen religion,
If you don't think I pay attention to Murphy's Law, I should recap
our backup procedures -- which involve three separate forms of
backup, each to multiple servers in different buildings, real-time,
plus idle-time comparison of the databases of origin to all replicas
with reporting of any discrepancies.  And off-line "snapshot"
backups on disk at a records center controlled by a different
department.  That's in addition to RAID redundancy and hardware
health and performance monitoring.  Some people think I border on
the paranoid on this issue.
> I would expect this situation could be exactly how flaky hardware
> would first manifest itself:  server crash and a bad CRC on the
> last thing written out.  And in that case, the last thing you want
> to do is assume things are fine, then kick off a VACUUM that might
> overwrite more good data with bad.
Are you arguing that autovacuum should be disabled after crash
recovery?  I guess if you are arguing that a database VACUUM might
destroy recoverable data when hardware starts to fail, I can't
argue.  And certainly there are way too many people who don't ensure
that they have a good backup before firing up PostgreSQL after a
failure, so I can see not making autovacuum more aggressive than
usual, and perhaps even disabling it until there is some sort of
confirmation (I have no idea how) that a backup has been made.  That
said, a database VACUUM would be one of my first steps after
ensuring that I had a copy of the data directory tree, personally.
I guess I could even live with that as recommended procedure rather
than something triggered through autovacuum and not feel that the
rest of my posts on this are too far off track.
> The main way I expect to validate this sort of thing is with an as
> yet unwritten function to grab information about a data block from
> a standby server for this purpose, something like this:
> 
> Master:  Computed CRC A, Stored CRC B; error raised because A!=B
> Standby:  Computed CRC C, Stored CRC D
> 
> If C==D && A==C, the corruption is probably overwritten bits of
> the CRC B.
Are you arguing we need *that* infrastructure to get the feature
launched?
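
As a rough illustration only, the standby comparison Greg describes
could be sketched in Python like this.  The function name
diagnose_block and the returned strings are hypothetical and not part
of PostgreSQL; the only rule taken from the quoted message is the
"C==D && A==C" case, and every other combination is deliberately left
undetermined.

```python
def diagnose_block(a, b, c, d):
    """Hypothetical sketch of the master/standby CRC comparison.

    a: CRC computed from the block on the master
    b: CRC stored in the block on the master
    c: CRC computed from the block on the standby
    d: CRC stored in the block on the standby
    """
    if a == b:
        # The master would not have raised an error in the first place.
        return "no mismatch on master"
    if c == d and a == c:
        # Standby is self-consistent and the master's block contents hash
        # to the same value, so the damage is probably in stored CRC B.
        return "master stored CRC probably corrupted"
    # The quoted message gives no rule for the remaining combinations.
    return "undetermined"


if __name__ == "__main__":
    print(diagnose_block(0xAAAA, 0xAAAB, 0xAAAA, 0xAAAA))
```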
-Kevin