Re: Hot standby, recovery infra - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Hot standby, recovery infra |
Date | |
Msg-id | 1233771931.4500.325.camel@ebony.2ndQuadrant Whole thread Raw |
In response to | Re: Hot standby, recovery infra (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>) |
Responses |
Re: Hot standby, recovery infra
|
List | pgsql-hackers |
On Wed, 2009-02-04 at 19:03 +0200, Heikki Linnakangas wrote: > Simon Riggs wrote: > > * I think we are now renaming the recovery.conf file too early. The > > comment says "We have already restored all the WAL segments we need from > > the archive, and we trust that they are not going to go away even if we > > crash." We have, but the files overwrite each other as they arrive, so > > if the last restartpoint is not in the last restored WAL file then it > > will only exist in the archive. The recovery.conf is the only place > > where we store the information on where the archive is and how to access > > it, so by renaming it out of the way we will be unable to crash recover > > until the first checkpoint is complete. So the way this was in the > > original patch is the correct way to go, AFAICS. > > I can see what you mean now. In fact we're not safe even when the last > restartpoint is in the last restored WAL file, because we always restore > segments from the archive to a temporary filename. > > Yes, renaming recovery.conf at the first checkpoint avoids that problem. > However, it means that we'll re-enter archive recovery if we crash > before that checkpoint is finished. Won't that cause havoc if more files > have appeared to the archive since the crash, and we restore those even > though we already started up from an earlier point? How do the timelines > work in that case? More archive files being added to archive is possible, but unlikely. What *is* a definite problem is that if we ended recovery by manual command (possible with HS patch) or recovery target then we would have no record of which point it was that we finished at. It is then possible that the recovery.conf has been re-edited, causing re-recovery to end at a different place. That seems like a bad thing. > We could avoid that by performing a good old startup checkpoint, but I > quite like the fast failover time we get without it. ISTM it's either slow failover or (fast failover, but restart archive recovery if crashes). I would suggest that at end of recovery we write the last LSN to the control file, so if we crash recover then we will always end archive recovery at the same place again should we re-enter it. So we would have a recovery_target_lsn that overrides recovery_target_xid etc.. Given where we are, I would suggest we go for the slow failover option in this release. Doing otherwise is a risky decision with little gain. BGwriter-in-recovery is a good thing of itself and we have other code to review yet with a higher importance level, and the rest of HS is code I'm actually much happier with. I've spent weeks trying to get the transition between recovery and normal running clean, but it seems like time to say I didn't get it right *enough* to be absolutely certain. (Just the fast failover part of patch!). Thanks for the review. > > We should continue to measure performance of recovery in the light > of > > these changes. I still feel that fsyncing the control file on each > > XLogFileRead() will give a noticeable performance penalty, mostly > > because we know doing exactly the same thing in normal running > caused a > > performance penalty. But that is easily changed and cannot be done > with > > any certainty without wider feedback, so no reason to delay code > commit. > > I've changed the way minRecoveryPoint is updated now anyway, so it no > longer happens every XLogFileRead(). Care to elucidate? -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
pgsql-hackers by date: