Re: [BUG] non archived WAL removed during production crash recovery - Mailing list pgsql-bugs

From Jehan-Guillaume de Rorthais
Subject Re: [BUG] non archived WAL removed during production crash recovery
Date
Msg-id 20200416234308.3e5f64bf@firost
Whole thread Raw
In response to Re: [BUG] non archived WAL removed during production crash recovery  (Michael Paquier <michael@paquier.xyz>)
Responses Re: [BUG] non archived WAL removed during production crash recovery
Re: [BUG] non archived WAL removed during production crash recovery
List pgsql-bugs
On Thu, 16 Apr 2020 17:11:00 +0900
Michael Paquier <michael@paquier.xyz> wrote:

> On Tue, Apr 14, 2020 at 06:03:19PM +0200, Jehan-Guillaume de Rorthais wrote:
> 
> Thanks for the new version.

Thank you for v6.

> > On Mon, 13 Apr 2020 16:14:14 +0900 Michael Paquier <michael@paquier.xyz>
> > wrote:  
> >> It seems to me that the initial value of SharedRecoveryState should be
> >> CRASH_RECOVERING all the time no?  StartupXLOG() is a code path taken
> >> even if starting cleanly, and the flag would be reset correctly at the
> >> end to NOT_IN_RECOVERY.  
> > 
> > Previous version of the patch was setting CRASH_RECOVERING. Fujii-san
> > reported (the 4 Apr 2020 02:49:50 +0900) that it could be useful to expose
> > a better value until relevant code is reached in StartupXLOG() so
> > GetRecoveryState() returns a safer value for futur use.
> >
> > As I answered upthread, I'm not sure who would need this information before
> > the WAL machinery is up though. Note that ARCHIVE_RECOVERING and
> > CRASH_RECOVERING are compatible with the previous behavior.  
> 
> I am not sure either if we need this information until the startup
> process is up, but even if we need it I'd rather keep the code
> consistent with the past practice, which was that
> SharedRecoveryInProgress got set to true, the equivalent of crash
> recovery as that's more generic than the archive recovery switch.

OK.

> > Maybe the solution would be to init with CRASH_RECOVERING and add some
> > comment in GetRecoveryState() to warn the state is "enforced" after the
> > XLOG machinery is started and is init'ed to RECOVERING in the meantime?
> > 
> > I initialized it to CRASH_RECOVERING in the new v5 patch and added a comment
> > to GetRecoveryState().  
> 
> Not sure that the comment is worth it.  Your description of the state
> looks enough, and the code is clear that we have just an initial
> shared memory state in this case.

OK. Will remove later after your review of the tests.

> >> It would be good to mention that while in crash recovery, files are
> >> not considered as deletable.  
> > 
> > Well, in fact, I am still wondering about this. I was hesitant to add a
> > shortcut like:
> > 
> >   /* no WAL segment cleanup during crash recovery */
> >   if (recoveryState == RECOVERT_STATE_CRASH)
> >       return false;
> > 
> > But, what if for example we crashed for lack of disk space during intensive
> > writes?  During crash recovery, any WAL marked as .done could be removed and
> > allow the system to start again and maybe make even further WAL cleanup by
> > archiving some more WAL without competing with previous high write ratio.
> >
> > When we recover a primary, this behavior seems conform with any value of
> > archive_mode, even if it has been changed after crash and before starting
> > it. On a standby, we might create .ready files, but they will be removed
> > during the first restartpoint if needed.  
> 
> I guess that you mean .ready and not .done here?

No, I meant .done.

> Removing .done files does not matter as the segments related to them are
> already gone.

Not necessarily. There's a time windows between the moment the archiver set
the .done file and when the checkpointer removes the associated WAL file.
So, after a PANIC because of lack of space, WAL associated with .done files but
not removed yet will be removed during the crash recovery.

> Even with that, why should we need to make the backend smarter about
> the removal of .ready files during crash recovery.  It seems to me
> that we should keep them, and an operator could always come by himself
> and do some manual cleanup to free some space in the pg_wal partition.

We are agree on this.

> > I needed a failure so I can test pg_stat_archiver reports it as well. In my
> > mind, using "false" would either trigger a failure because false returns 1
> > or... a failure because the command is not found. In either way, the result
> > is the same.
> > 
> > Using poll_query_until+pg_stat_file, is a good idea, but not enough as
> > archiver reports a failure some moment after the .ready signal file
> > appears.  
> 
> Using an empty string makes the test more portable, but while I looked
> at it I have found out that it leads to delays in the archiver except
> if you force the generation of more segments in the test, causing the
> logic to get more complicated with the manipulation of the .ready and
> .done files.  And I was then finding myself to add an
> archive_timeout..  Anyway, this reduced the readability of the test so
> I am pretty much giving up on this idea.

Unless I'm wrong, the empty string does not raise an error in pg_stat_archiver,
and I wanted to add a test on this as well.

> >> The end of the tests actually relies on the
> >> fact that archive_command is set to "false" when the cold backup is
> >> taken, before resetting it.  I think that it would be cleaner to
> >> enforce the configuration you want to test before starting each
> >> standby.  It becomes easier to understand the flow of the test for the
> >> reader.  
> > 
> > done as well.  
> 
> I have put my hands on the code, and attached is a cleaned up
> version for the backend part.  Below are some notes.
> 
> + * RECOVERT_STATE_ARCHIVE is set for archive recovery or for a
> standby.
> Typo here that actually comes from my previous email, and that you
> blindly copy-pasted, repeated five times in the tree actually.

Oops...yes, even in a comment with RECOVERT_STATE_NONE :/
Sorry.

> +   RecoveryState   recoveryState = GetRecoveryState();
> +
> +   /* The file is always deletable if archive_mode is "off". */
> +   if (!XLogArchivingActive())
> +       return true;
> There is no point to call GetRecoveryState() is archive_mode = off.

good catch!

> + * There's two different reasons to recover WAL: when standby mode is
> requested
> + * or after a crash to catchup with consistency.
> Nit: s/There's/There are/.  Anyway, I don't see much point in keeping
> this comment as the description of each state value should be enough,
> so I have removed it.

OK

> I am currently in the middle of reviewing the test and there are a
> couple of things that can be improved but I lack of time today, so
> I'll continue tomorrow on it.  There is no need to send two separate
> patches by the way as the code paths touched are different.

OK.

Thanks for your review! Let me know if you want me to add/change/fix some tests.

Regards,



pgsql-bugs by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: BUG #16363: Memory increases for each table accessed untilconnection is closed
Next
From: PG Bug reporting form
Date:
Subject: BUG #16370: pgadmin broken