Re: pgsql: Fix handling of WAL segments ready to be archived during crash r - Mailing list pgsql-committers

From Michael Paquier
Subject Re: pgsql: Fix handling of WAL segments ready to be archived during crash r
Msg-id 20200424005929.GK33034@paquier.xyz
In response to pgsql: Fix handling of WAL segments ready to be archived during crash r  (Michael Paquier <michael@paquier.xyz>)
Responses Re: pgsql: Fix handling of WAL segments ready to be archived during crash r  (Michael Paquier <michael@paquier.xyz>)
List pgsql-committers
On Thu, Apr 23, 2020 at 11:51:16PM +0000, Michael Paquier wrote:
> Fix handling of WAL segments ready to be archived during crash recovery
>
> 78ea8b5 has fixed an issue related to the recycling of WAL segments on
> standbys depending on archive_mode.  However, it has introduced a
> regression with the handling of WAL segments ready to be archived during
> crash recovery, causing those files to be recycled without getting
> archived.
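The decision at stake in the commit message above (may a finished WAL segment be recycled yet?) can be sketched as a small, simplified model. This is not PostgreSQL's actual code: `may_recycle` and `ArchiveMode` are hypothetical names, and the model only looks at the three archive_mode values, whether the server is running as a standby, and the suffix of the segment's archive_status file.

```python
from enum import Enum
from typing import Optional


class ArchiveMode(Enum):
    OFF = "off"
    ON = "on"
    ALWAYS = "always"


def may_recycle(archive_mode: ArchiveMode, in_standby_mode: bool,
                status: Optional[str]) -> bool:
    """Hypothetical, simplified model of the WAL recycling decision.

    `status` is the suffix of the segment's archive_status file:
    "ready", "done", or None if no status file exists.
    """
    if archive_mode is ArchiveMode.OFF:
        # No archiving configured: nothing to wait for.
        return True
    if in_standby_mode and archive_mode is ArchiveMode.ON:
        # A standby with archive_mode=on does not archive segments itself.
        return True
    # Primary, crash recovery, or archive_mode=always on a standby:
    # recycle only once the archiver has marked the segment as done.
    return status == "done"
```

In this model, a server in crash recovery must fall through to the last branch; if it were routed through the standby branch instead, a segment still marked `.ready` would be recycled before being archived, which is the kind of regression the commit message describes.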

The buildfarm is now reporting a couple of failures related to the
stability of the test:
- On REL9_5_STABLE and REL9_6_STABLE, the first round of reports has
been rather stable, even on Windows, except for three animals using
gcc 6.3.
- On 11 and newer branches, crake and piculet have been complaining.

I have not been able to reproduce any failures in my own runs across
all the branches, even on Windows.  However, all the reported failures
come from the following three tests on standbys:
not ok 8 - .ready file for WAL segment 000000010000000000000001 present in backup got removed with archive_mode=on on standby
not ok 10 - .done file for WAL segment 000000010000000000000002 created when archive_mode=on on standby
not ok 12 - .ready file for WAL segment 000000010000000000000002 created with archive_mode=always on standby
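The segment names in these failures use the standard 24-hex-digit layout: 8 digits of timeline, then the high and low parts of the segment number, and each segment's status is tracked by a `.ready` or `.done` file under `archive_status/` (in `pg_xlog` on 9.5 and 9.6, `pg_wal` from 10 onward). A small decoding sketch, assuming the default 16MB segment size; the helper names are hypothetical.

```python
from typing import Tuple

# Assumption: the default 16MB segment size (configurable at initdb
# time since PostgreSQL 11).
WAL_SEG_SIZE = 16 * 1024 * 1024
SEGMENTS_PER_XLOGID = 0x100000000 // WAL_SEG_SIZE  # 256 with 16MB segments


def parse_wal_segment(name: str) -> Tuple[int, int]:
    """Return (timeline, absolute segment number) for a WAL file name."""
    if len(name) != 24:
        raise ValueError(f"not a WAL segment file name: {name!r}")
    timeline = int(name[0:8], 16)
    log_id = int(name[8:16], 16)
    seg_in_log = int(name[16:24], 16)
    return timeline, log_id * SEGMENTS_PER_XLOGID + seg_in_log


def segment_start_lsn(segno: int) -> str:
    """Starting LSN of a segment, in the usual X/X hexadecimal notation."""
    lsn = segno * WAL_SEG_SIZE
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


def status_file(name: str, suffix: str, wal_dir: str = "pg_wal") -> str:
    """Relative path of a segment's .ready or .done archive status file."""
    return f"{wal_dir}/archive_status/{name}.{suffix}"
```

For example, `parse_wal_segment("000000010000000000000002")` gives timeline 1, segment 2, whose WAL starts at LSN `0/2000000`; on 9.6 its status file would be `status_file(name, "done", "pg_xlog")`.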

This visibly comes down to the fact that we do not take enough care
of the timing between the restartpoints, the recycling work done by
the startup process, and the archiver.  The rest of the test relies
on the reports of pg_stat_archiver, as published by the archiver
process, as points to wait on.  So there are two things we could do
here:
1) Just remove the unstable parts of the tests (the three listed
above), and keep the coverage based on everything we have using
pg_stat_archiver.
2) Remove the test entirely, though I would rather keep some coverage,
particularly for primaries, as that is where this got broken.
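The pg_stat_archiver-based checks that option 1) would keep follow a poll-until-true pattern (in the Perl TAP framework this is PostgresNode's `poll_query_until`). A minimal Python sketch of that pattern, assuming a hypothetical `run_query` callable in place of a real database connection; `poll_until` and `wait_for_archived` are illustrative names, not part of any PostgreSQL API.

```python
import time
from typing import Callable, Optional


def poll_until(check: Callable[[], bool], timeout: float = 180.0,
               interval: float = 0.1) -> bool:
    """Re-run check() until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def wait_for_archived(run_query: Callable[[str], Optional[str]],
                      segment: str, timeout: float = 180.0) -> bool:
    """Wait until pg_stat_archiver shows `segment` (or a later one) archived.

    `run_query` is a hypothetical callable that runs SQL on the node and
    returns the single result value as text, or None for SQL NULL.
    """
    query = "SELECT last_archived_wal FROM pg_stat_archiver"

    def archived() -> bool:
        last = run_query(query) or ""
        # Assumes a single timeline: plain segment names then sort in
        # WAL order, so >= also accepts an archiver that moved past it.
        return last >= segment

    return poll_until(archived, timeout)
```

Waiting on what the archiver itself publishes, rather than checking for `.ready` or `.done` files at one instant, avoids depending on when restartpoints and the startup process happen to run.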

I'd rather do 1), any thoughts?
--
Michael
