Thread: [HACKERS] Fast promotion not used when doing a recovery_target PITR restore?

From
Andres Freund
Date:
Hi,

When doing a PITR style recovery, with recovery target set, we're
currently not doing a fast promotion, in contrast to the handling when
doing a pg_ctl or trigger file based promotion. That can prolong making
the server available for writes.

I can't really see a reason for this?

Greetings,

Andres Freund



Re: [HACKERS] Fast promotion not used when doing a recovery_target PITR restore?

From
Michael Paquier
Date:
On Thu, Jun 22, 2017 at 3:04 AM, Andres Freund <andres@anarazel.de> wrote:
> When doing a PITR style recovery, with recovery target set, we're
> currently not doing a fast promotion, in contrast to the handling when
> doing a pg_ctl or trigger file based promotion. That can prolong making
> the server available for writes.
>
> I can't really see a reason for this?

Yes, you are right. I see no reason either why this cannot be done.
Why not just switch fast_promote to true when using
RECOVERY_TARGET_ACTION_PROMOTE? That's a bug, though not a critical
one.
-- 
Michael
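
Editor's sketch: a minimal model, in C, of the behavior discussed so far. The enum, its values, and both function names are hypothetical and are not taken from the PostgreSQL source; only the mapping from promotion trigger to promotion mode follows what the thread describes.

```c
#include <stdbool.h>

/* Hypothetical names throughout; this is NOT the PostgreSQL source. */
typedef enum PromotionTrigger
{
    TRIGGER_PG_CTL_PROMOTE,          /* promotion requested via pg_ctl */
    TRIGGER_FILE,                    /* promotion requested via trigger file */
    TRIGGER_RECOVERY_TARGET_PROMOTE  /* recovery target reached, action is promote */
} PromotionTrigger;

/* Behavior as described in the thread: PITR promotion is not fast. */
bool
use_fast_promotion_current(PromotionTrigger trigger)
{
    switch (trigger)
    {
        case TRIGGER_PG_CTL_PROMOTE:
        case TRIGGER_FILE:
            return true;
        case TRIGGER_RECOVERY_TARGET_PROMOTE:
            return false;
    }
    return false;               /* unknown trigger: assume the slow path */
}

/* Behavior as proposed: every promotion trigger uses fast promotion. */
bool
use_fast_promotion_proposed(PromotionTrigger trigger)
{
    (void) trigger;             /* the trigger no longer affects the choice */
    return true;
}
```

The two functions differ only for the recovery-target case, which is the one the proposal would change.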

On 2017-06-22 14:04:42 +0900, Michael Paquier wrote:
> On Thu, Jun 22, 2017 at 3:04 AM, Andres Freund <andres@anarazel.de> wrote:
> > When doing a PITR style recovery, with recovery target set, we're
> > currently not doing a fast promotion, in contrast to the handling when
> > doing a pg_ctl or trigger file based promotion. That can prolong making
> > the server available for writes.
> >
> > I can't really see a reason for this?
> 
> Yes, you are right. I see no reason either why this cannot be done.
> Why not just switch fast_promote to true when using
> RECOVERY_TARGET_ACTION_PROMOTE? That's a bug, though not a critical
> one.

I don't think it's really a bug - just a missed optimization.  I'd
personally not be in favor of backpatching this - it'll have some chance
of screwing things up, even if I hope that chance is fairly small.

As a wider discussion, I wonder if we should keep non-fast promotion for
anything but actual crash recovery?  And even there it might actually be
a pretty good idea to not force a full checkpoint - getting up fast
after a crash is kinda important..

Andres Freund



Re: [HACKERS] Fast promotion not used when doing a recovery_target PITR restore?

From
Michael Paquier
Date:
On Fri, Jun 23, 2017 at 2:34 AM, Andres Freund <andres@anarazel.de> wrote:
> I don't think it's really a bug - just a missed optimization.  I'd
> personally not be in favor of backpatching this - it'll have some chance
> of screwing things up, even if I hope that chance is fairly small.

It would be better to wait until the branch for PG11 opens then.

> As a wider discussion, I wonder if we should keep non-fast promotion for
> anything but actual crash recovery?

Yes, I would push a bit further and remove fallback_promote.

> And even there it might actually be
> a pretty good idea to not force a full checkpoint - getting up fast
> after a crash is kinda important..

But not that. Crash recovery is designed to be simple and robust, with
only the postmaster and the startup process running while it happens.
Not having the startup process perform checkpoints by itself would
require the bgwriter, which increases the likelihood of
bugs. In short, I don't think performance is what matters
for crash recovery; robustness and simplicity are.
-- 
Michael



On 2017-06-23 10:56:07 +0900, Michael Paquier wrote:
> > And even there it might actually be
> > a pretty good idea to not force a full checkpoint - getting up fast
> > after a crash is kinda important..
> 
> But not that. Crash recovery is designed to be simple and robust, with
> only the postmaster and the startup process running while it happens.
> Not having the startup process perform checkpoints by itself would
> require the bgwriter, which increases the likelihood of
> bugs. In short, I don't think performance is what matters
> for crash recovery; robustness and simplicity are.

I'm far from convinced by this.  By now WAL replay with checkpointer,
bgwriter, etc. active is actually *more* tested than the cases without
it. The likelihood of bugs is higher in the less frequently exercised
paths, and given that replication exercises the situation with all those
processes active on a continuous basis, I'm fairly unconvinced by your
argument.

- Andres



Re: [HACKERS] Fast promotion not used when doing a recovery_target PITR restore?

From
Michael Paquier
Date:
On Wed, Jun 28, 2017 at 3:44 AM, Andres Freund <andres@anarazel.de> wrote:
> I'm far from convinced by this.  By now WAL replay with checkpointer,
> bgwriter, etc. active is actually *more* tested than the cases without
> it. The likelihood of bugs is higher in the less frequently exercised
> paths, and given that replication exercises the situation with all those
> processes active on a continuous basis, I'm fairly unconvinced by your
> argument.

Crash recovery is the last place where failures should happen.
Don't you think that it should remain as simple as it was originally
designed? It seems to me that keeping things simple
has higher priority than being able to reconnect sooner by
delaying the checkpoint.
-- 
Michael



On 2017-06-28 06:04:23 +0900, Michael Paquier wrote:
> On Wed, Jun 28, 2017 at 3:44 AM, Andres Freund <andres@anarazel.de> wrote:
> > I'm far from convinced by this.  By now WAL replay with checkpointer,
> > bgwriter, etc. active is actually *more* tested than the cases without
> > it. The likelihood of bugs is higher in the less frequently exercised
> > paths, and given that replication exercises the situation with all those
> > processes active on a continuous basis, I'm fairly unconvinced by your
> > argument.
> 
> Crash recovery is the last place where failures should happen.
> Don't you think that it should remain as simple as it was originally
> designed? It seems to me that keeping things simple
> has higher priority than being able to reconnect sooner by
> delaying the checkpoint.

You seem to be arguing completely past my point that the replication path
is *more* robust by now?  And there are plenty of scenarios where a faster
startup is quite crucial for performance. The difference between an
immediate shutdown + recovery without checkpoint and a fast shutdown can
be very large, and that matters a lot for faster postgres updates etc.

Andres



Re: [HACKERS] Fast promotion not used when doing a recovery_target PITR restore?

From
Michael Paquier
Date:
On Wed, Jun 28, 2017 at 6:13 AM, Andres Freund <andres@anarazel.de> wrote:
> You seem to be arguing completely past my point that the replication path
> is *more* robust by now?  And there are plenty of scenarios where a faster
> startup is quite crucial for performance. The difference between an
> immediate shutdown + recovery without checkpoint and a fast shutdown can
> be very large, and that matters a lot for faster postgres updates etc.

If you go that way, it seems safer to me to give users some control
with a switch, defaulting to the previous behavior. A complete
switch to the newer behavior could be done later on, depending on what
is found.
-- 
Michael