Re: [BUG] non archived WAL removed during production crash recovery - Mailing list pgsql-bugs

From Jehan-Guillaume de Rorthais
Subject Re: [BUG] non archived WAL removed during production crash recovery
Date
Msg-id 20200401181735.11100908@firost
Whole thread Raw
In response to Re: [BUG] non archived WAL removed during production crash recovery  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Responses Re: [BUG] non archived WAL removed during production crash recovery
List pgsql-bugs
On Wed, 1 Apr 2020 17:27:22 +0900
Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
[...]
> >    bool inRecovery = RecoveryInProgress();
> >    
> >    /*
> >     * The file is always deletable if archive_mode is "off".  On standbys
> >     * archiving is disabled if archive_mode is "on", and enabled with
> >     * "always".  On a primary, archiving is enabled if archive_mode is "on"
> >     * or "always".
> >     */
> >    if (!((XLogArchivingActive() && !inRecovery) ||
> >          (XLogArchivingAlways() && inRecovery)))
> >        return true;
> > 
> > Please find in attachment a patch that fix this issue using the following
> > test instead:
> > 
> >    if (!((XLogArchivingActive() && !StandbyModeRequested) ||
> >          (XLogArchivingAlways() && inRecovery)))
> >        return true;
> > 
> > I'm not sure if we should rely on StandbyModeRequested for the second part
> > of the test as well thought. What was the point to rely on
> > RecoveryInProgress() to get the recovery status from shared mem?  
> 
> Since StandbyModeRequested is the startup process-local variable,
> it basically cannot be used in XLogArchiveCheckDone() that other process
> may call.

Ok, you answered my wondering about using recovery status from shared mem. This
was obvious. Thanks for your help!

I was wondering if we could use "ControlFile->state != DB_IN_CRASH_RECOVERY".
It seems fine during crash recovery as the control file is updated before the
checkpoint, but it doesn't feel right for other code paths where the control
file might not be up-to-date on filesystem, right ?

> So another approach would be necessary... One straight idea is to
> add new shmem flag indicating whether we are in standby mode or not.

I was thinking about setting XLogCtlData->SharedRecoveryInProgress as an enum
using:

  enum RecoveryState
  {
    NOT_IN_RECOVERY = 0
    IN_CRASH_RECOVERY,
    IN_ARCHIVE_RECOVERY
  }

Please, find in attachment a patch implementing this.

Plus, I added a second commit to add one test in regard with this bug.

> Another is to make the startup process remove .ready file if necessary.

I'm not sure to understand this one.

Regards,

Attachment

pgsql-bugs by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: BUG #16332: The xmltable function returns 'comment()' as unusablexml appended together and having no xml tags
Next
From: Julien Rouhaud
Date:
Subject: Re: BUG #16331: segfault in checkpointer with full disk