Thread: [HACKERS] archive_timeout ignored directly after promotion

[HACKERS] archive_timeout ignored directly after promotion

From

Andres Freund

Date:

21 June 2017, 20:58:38

Hi,

When promoting a standby without using the "fast promotion" logic,
e.g. by using recovery targets (which doesn't use fast promotion for
unbeknownst to me reasons), checkpointer doesn't necessarily respect
archive timeout for the first cycle after promotion.  The reason for
that is that it determines the sleep times like    if (XLogArchiveTimeout > 0 && !RecoveryInProgress())    {
elapsed_secs= now - last_xlog_switch_time;        if (elapsed_secs >= XLogArchiveTimeout)            continue;
/*no sleep for us ... */        cur_timeout = Min(cur_timeout, XLogArchiveTimeout - elapsed_secs);    }
 
and RecoveryInProgress() will still return true.  We just fudge that
when creating the end-of-recovery checkpoint:        /*         * The end-of-recovery checkpoint is a real checkpoint
that's        * performed while we're still in recovery.         */        if (flags & CHECKPOINT_END_OF_RECOVERY)
     do_restartpoint = false;
 

One easy way to fix that would be to just wakeup the checkpointer from
the startup process once at the end of recovery, but it'd not be
pretty.   I think it'd be better to change the        do_restartpoint = RecoveryInProgress();
        /*         * The end-of-recovery checkpoint is a real checkpoint that's         * performed while we're still
inrecovery.         */        if (flags & CHECKPOINT_END_OF_RECOVERY)            do_restartpoint = false;
 

into having a per-loop 'local_in_recovery' variable, that we turn off
once a CHECKPOINT_END_OF_RECOVERY checkpoint is requested.

Comments?

Greetings,

Andres Freund

Re: [HACKERS] archive_timeout ignored directly after promotion

From

Michael Paquier

Date:

22 June 2017, 08:55:30

On Thu, Jun 22, 2017 at 2:58 AM, Andres Freund <andres@anarazel.de> wrote:
> One easy way to fix that would be to just wakeup the checkpointer from
> the startup process once at the end of recovery, but it'd not be
> pretty.   I think it'd be better to change the
>                         do_restartpoint = RecoveryInProgress();
>
>                         /*
>                          * The end-of-recovery checkpoint is a real checkpoint that's
>                          * performed while we're still in recovery.
>                          */
>                         if (flags & CHECKPOINT_END_OF_RECOVERY)
>                                 do_restartpoint = false;
>
> into having a per-loop 'local_in_recovery' variable, that we turn off
> once a CHECKPOINT_END_OF_RECOVERY checkpoint is requested.
>
> Comments?

By Initializing this flag out of the for(;;) loop once, this could put
more control into the checkpointer logic so as no new restart points
can be generated after the end-of-recovery checkpoint. This way we
could replace the hack looking for DB_IN_ARCHIVE_RECOVERY in
CreateRestartPoint() by an error handling.
-- 
Michael