Re: Checkpoint cost, looks like it is WAL/CRC - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Checkpoint cost, looks like it is WAL/CRC
Msg-id 1121069858.3970.20.camel@localhost.localdomain
In response to Re: Checkpoint cost, looks like it is WAL/CRC  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Fri, 2005-07-08 at 14:45 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > I don't think we should care too much about indexes. We can rebuild
> > them...but losing heap sectors means *data loss*.
> 
> If you're so concerned about *data loss* then none of this will be
> acceptable to you at all.  We are talking about going from a system
> that can actually survive torn-page cases to one that can only tell
> you whether you've lost data to such a case.  Arguing about the
> probability with which we can detect the loss seems beside the point.

In all of this, I see turning off full page images as an option, and one
that would default to "yes, take page images".

PITR was originally discussed (in 2002, see the archives) as a mechanism
that would allow full page images to be avoided. Since we now have PITR,
we can discuss taking that option more sensibly. If there are some
circumstances where we don't know the state of the server and need to
recover, that is OK, as long as we *can* recover, BUT only if we have a
fairly low chance of needing to do so.

(Rebuilding an index is preferable to a full system recovery.)

So I am interested in the probability of us knowing whether the system
is damaged or not. If that probability is high enough, turning off full
page images may become an acceptable risk for a production system to
take in order to gain 50% performance. To that end,
I am willing to consider various heuristics that would allow us to
reduce the risk. I have suggested some, but am happy to hear others (or,
as you say, corrections to them) to make that idea more viable.

ISTM that Recovery could tell us:
1. Fully recovered, provably correct state of all data blocks
2. Fully recovered, unknown data correctness of some data blocks
3. Fully recovered, provably incorrect state of some data blocks

as well as:
a) no indexes require rebuilding
b) the following indexes require an immediate REINDEX...

Results:
1a requires no further action
1b requires some index rebuild after system becomes operational

Results 2 and 3 would require some form of system recovery

Since currently there are no tests performed to show correctness, we
won't ever know we're in state 1 and so would need to presume we are in
state 2 and recover.
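
As a minimal sketch of how the outcomes above could map to follow-up
actions (Python is used purely for illustration; `DataState` and
`required_action` are hypothetical names, not anything in the PostgreSQL
source, and passing `None` is my assumption for modelling "no correctness
tests were performed"):

```python
from enum import Enum
from typing import Optional, Sequence


class DataState(Enum):
    """Hypothetical labels for recovery results 1, 2 and 3."""
    PROVABLY_CORRECT = 1     # result 1: all data blocks provably correct
    UNKNOWN = 2              # result 2: correctness of some blocks unknown
    PROVABLY_INCORRECT = 3   # result 3: some blocks provably incorrect


def required_action(state: Optional[DataState],
                    indexes_to_rebuild: Sequence[str] = ()) -> str:
    """Return the follow-up action for a recovery outcome.

    state=None models the case where no correctness tests were run,
    so we have to presume result 2.
    """
    if state is None:
        state = DataState.UNKNOWN
    if state is not DataState.PROVABLY_CORRECT:
        # Results 2 and 3: some form of system recovery is needed.
        return "system recovery"
    if indexes_to_rebuild:
        # Result 1b: data is fine, but the listed indexes need a rebuild.
        return "reindex: " + ", ".join(indexes_to_rebuild)
    # Result 1a: nothing further to do.
    return "no further action"
```

The point of the sketch is only that index damage matters solely in
state 1; in states 2 and 3 a system recovery is needed regardless.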

My view is that if enough heuristics can be found to increase the
potential for ending a recovery in state 1 then turning off full page
images may become viable as a realistic cost/benefit trade-off, though I
would never suggest that full page images be disabled by default.

Best Regards, Simon Riggs


