On Fri, 2008-09-26 at 11:20 +0100, Simon Riggs wrote:
> > After reading this for awhile, I realized that there is a rather
> > fundamental problem with it: it switches into "consistent recovery"
> > mode as soon as it's read WAL beyond ControlFile->minRecoveryPoint.
> > In a crash recovery situation that typically is before the last
> > checkpoint (if indeed it's not still zero), and what that means is
> > that this patch will activate the bgwriter and start letting in
> > backends instantaneously after a crash, long before we can have any
> > certainty that the DB state really is consistent.
> >
> > In a normal crash recovery situation this would be easily fixed by
> > simply not letting it go to "consistent recovery" state at all, but
> > what about recovery from a restartpoint? We don't want a slave that's
> > crashed once to never let backends in again. But I don't see how to
> > determine that we're far enough past the restartpoint to be consistent
> > again. In crash recovery we assume (without proof ;-)) that we're
> > consistent once we reach the end of valid-looking WAL, but that rule
> > doesn't help for a slave that's following a continuing WAL sequence.
> >
> > Perhaps something could be done based on noting when we have to pull in
> > a WAL segment from the recovery_command, but it sounds like a pretty
> > fragile assumption.
>
> Seems like we just say we only signal the postmaster if
> InArchiveRecovery. Archive recovery from a restartpoint is still archive
> recovery, so this shouldn't be a problem in the way you mention. The
> presence of recovery.conf overrides all other cases.
Anticipating your possible responses, I would also add this:
There has long been an annoying hole in the PITR scheme: support for
recovery starting from a crashed database. That exists to support
split-mirror snapshots, but it creates a loophole in which we don't know
the minimum recovery location, circumventing the care we (you!) took to
put start/stop backup in place.
I think we need to add a parameter to recovery.conf that people can use
to specify a minRecoveryPoint iff there is no backup label file. They
can work out what it should be by following this procedure, which we
should document:
* split the mirror, so you have an offline copy of the crashed database
* copy the database away to backup
* go to the running database and run pg_current_xlog_insert_location()
* use the returned value to specify recovery_min_location
If they don't specify this, the bgwriter will not start and they cannot
run in Hot Standby mode. That is their choice, so we then need not worry
about the loophole any more.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support