Re: Hot standby, recovery infra - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Hot standby, recovery infra
Date
Msg-id 4989CA75.4030206@enterprisedb.com
Whole thread Raw
In response to Re: Hot standby, recovery infra  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: Hot standby, recovery infra  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
Simon Riggs wrote:
> * I think we are now renaming the recovery.conf file too early. The
> comment says "We have already restored all the WAL segments we need from
> the archive, and we trust that they are not going to go away even if we
> crash." We have, but the files overwrite each other as they arrive, so
> if the last restartpoint is not in the last restored WAL file then it
> will only exist in the archive. The recovery.conf is the only place
> where we store the information on where the archive is and how to access
> it, so by renaming it out of the way we will be unable to crash recover
> until the first checkpoint is complete. So the way this was in the
> original patch is the correct way to go, AFAICS.

I can see what you mean now. In fact we're not safe even when the last 
restartpoint is in the last restored WAL file, because we always restore 
segments from the archive to a temporary filename.

Yes, renaming recovery.conf at the first checkpoint avoids that problem. 
However, it means that we'll re-enter archive recovery if we crash 
before that checkpoint is finished. Won't that cause havoc if more files 
have appeared to the archive since the crash, and we restore those even 
though we already started up from an earlier point? How do the timelines 
work in that case?

We could avoid that by performing a good old startup checkpoint, but I 
quite like the fast failover time we get without it.

> * my original intention was to deprecate log_restartpoints and would
> still like to do so. log_checkpoints does just as well for that. Even
> less code than before...

Ok, done.

> * comment on BgWriterShmemInit() refers to CHECKPOINT_IS_STARTUP, but
> the actual define is CHECKPOINT_STARTUP. Would prefer the "is" version
> because it matches the IS_SHUTDOWN naming.

Fixed.

> * In CreateCheckpoint() the if test on TruncateSubtrans() has been
> removed, but the comment has not been changed (to explain why).

Thanks, comment updated. (we now call CreateCheckPoint() only after 
recovery is finished)

> We should continue to measure performance of recovery in the light of
> these changes. I still feel that fsyncing the control file on each
> XLogFileRead() will give a noticeable performance penalty, mostly
> because we know doing exactly the same thing in normal running caused a
> performance penalty. But that is easily changed and cannot be done with
> any certainty without wider feedback, so no reason to delay code commit.

I've changed the way minRecoveryPoint is updated now anyway, so it no 
longer happens every XLogFileRead().

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Teodor Sigaev
Date:
Subject: Re: [PATCHES] GIN improvements
Next
From: Tom Lane
Date:
Subject: Re: add_path optimization