Re: Hot standby, recovery infra - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Hot standby, recovery infra
Date
Msg-id 1233146554.2327.2343.camel@ebony.2ndQuadrant
Whole thread Raw
In response to Hot standby, recovery infra  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses Re: Hot standby, recovery infra  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
List pgsql-hackers
On Wed, 2009-01-28 at 12:04 +0200, Heikki Linnakangas wrote:
> I've been reviewing and massaging the so called recovery infra patch.

Thanks.

> I feel quite good about this patch now. Given the amount of code
> churn, it requires testing, and I'll read it through one more time
> after sleeping over it. 

There's nothing major I feel we should discuss.

The way restartpoints happen is a useful improvement, thanks.

> Simon, do you see anything wrong with this?

Few minor points

* I think we are now renaming the recovery.conf file too early. The
comment says "We have already restored all the WAL segments we need from
the archive, and we trust that they are not going to go away even if we
crash." We have, but the files overwrite each other as they arrive, so
if the last restartpoint is not in the last restored WAL file then it
will only exist in the archive. The recovery.conf is the only place
where we store the information on where the archive is and how to access
it, so by renaming it out of the way we will be unable to crash recover
until the first checkpoint is complete. So the way this was in the
original patch is the correct way to go, AFAICS.

* my original intention was to deprecate log_restartpoints and would
still like to do so. log_checkpoints does just as well for that. Even
less code than before...

* comment on BgWriterShmemInit() refers to CHECKPOINT_IS_STARTUP, but
the actual define is CHECKPOINT_STARTUP. Would prefer the "is" version
because it matches the IS_SHUTDOWN naming.

* In CreateCheckpoint() the if test on TruncateSubtrans() has been
removed, but the comment has not been changed (to explain why).

* PG_CONTROL_VERSION bump should be just one increment, to 844. I
deliberately had it higher to help spot mismatches earlier, and to avoid
needless patch conflicts.

So it looks pretty much ready for commit very soon.

We should continue to measure performance of recovery in the light of
these changes. I still feel that fsyncing the control file on each
XLogFileRead() will give a noticeable performance penalty, mostly
because we know doing exactly the same thing in normal running caused a
performance penalty. But that is easily changed and cannot be done with
any certainty without wider feedback, so no reason to delay code commit.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: pg_upgrade project status
Next
From: Gregory Stark
Date:
Subject: Re: posix_fadvise v22