Re: [BUG] non archived WAL removed during production crash recovery - Mailing list pgsql-bugs

From Andres Freund
Subject Re: [BUG] non archived WAL removed during production crash recovery
Date
Msg-id 20200424014856.f3jvvcxxmt5p7k5z@alap3.anarazel.de
Whole thread Raw
In response to Re: [BUG] non archived WAL removed during production crash recovery  (Michael Paquier <michael@paquier.xyz>)
Responses Re: [BUG] non archived WAL removed during production crash recovery
List pgsql-bugs
Hi,

On 2020-04-24 08:54:02 +0900, Michael Paquier wrote:
> On Thu, Apr 23, 2020 at 06:59:53PM +0200, Jehan-Guillaume de Rorthais wrote:
> > Please, find in v11 for version 9.5, 9.6 and 10.
>
> I have worked more on that using your v11, tweaked few comments in the
> new test parts for 9.5~10, and applied the whole.  Thanks all for your
> work.  I am keeping now an eye on the buildfarm.

I am confused by this commit. You added shared memory to differentiate
between crash recovery and standby mode/archive recovery, correct? You
write:

>    This commit fixes the regression by tracking in shared memory if a live
>    cluster is either in crash recovery or archive recovery as the handling
>    of WAL segments ready to be archived is different in both cases (those
>    WAL segments should not be removed during crash recovery), and by using
>    this new shared memory state to decide if a segment can be recycled or
>    not.

But don't we pretty much know this already from the state of the system?
During crash recovery there's nothing running RemoveOldXLogFiles() but
the startup process. Right? And in DB_IN_ARCHIVE_RECOVERY it should only
happen as part of restartpoints (i.e. the checkpointer).

Did you add the new shared state to avoid deducing things from the
"environment"? If so, it should really be mentioned in the commit
message & code. Because:

>    Previously, it was not possible to know if a cluster was in crash
>    recovery or archive recovery as the shared state was able to track only
>    if recovery was happening or not, leading to the problem.

really doesn't make that obvious.

Greetings,

Andres Freund



pgsql-bugs by date:

Previous
From: M Fysh
Date:
Subject: Re: BUG #16364: ICACLS error when installing under system context "NTAUTHORITY\SYSTEM" ie installing with SCCM
Next
From: Tom Lane
Date:
Subject: Re: [BUG] non archived WAL removed during production crash recovery