Re: [BUG] non archived WAL removed during production crash recovery - Mailing list pgsql-bugs

From Michael Paquier
Subject Re: [BUG] non archived WAL removed during production crash recovery
Msg-id 20200424034248.GL33034@paquier.xyz
In response to Re: [BUG] non archived WAL removed during production crash recovery  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [BUG] non archived WAL removed during production crash recovery
List pgsql-bugs
On Thu, Apr 23, 2020 at 10:21:15PM -0400, Tom Lane wrote:
> Looks like the news is not good :-(

Yes, I have been looking at that for the last couple of hours, and I
just pushed a change that brings the buildfarm back to a green state
for now (based on the first results, things seem stable) by removing
the defective subset of tests.

> I see that my own florican is one of the failing critters, though
> it failed only on HEAD which seems odd.  Any suggestions what to
> look for?

The issue comes from the parts of the test where we expect some .ready
files to exist (or not) after triggering a restartpoint to force some
segments to be recycled.  Looking more at it, I suspect that the
actual issue is that the test does not make sure that the standbys it
starts have replayed up to the segment switch record triggered on the
primary (the one within generate_series(10,20)), so the follow-up
restartpoint does not actually recycle the segments we expect it to
recycle.  That is more likely to be a problem on slower machines, as
the window between the moment the standbys reach their consistency
point and the moment the switch record is replayed gets wider.
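The race can be illustrated with a small Python sketch. It is not the change that was pushed and does not use PostgreSQL's own test framework. The helpers `parse_lsn`, `wal_file_name` and `wait_for_replay` are hypothetical, 16 MB WAL segments are assumed, and `fetch_replay_lsn` is a stand-in for something like running `SELECT pg_last_wal_replay_lsn()` on the standby.

```python
import time
from typing import Callable, Optional

# Assumption: default 16 MB WAL segment size.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(text: str) -> int:
    """Convert a pg_lsn string such as '0/3000060' into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_file_name(tli: int, lsn: int, seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Name of the WAL segment that contains byte position `lsn` on timeline `tli`.

    Its archive status file would be pg_wal/archive_status/<name>.ready.
    """
    segno = lsn // seg_size
    segs_per_xlogid = 0x100000000 // seg_size
    return "%08X%08X%08X" % (tli, segno // segs_per_xlogid, segno % segs_per_xlogid)


def wait_for_replay(fetch_replay_lsn: Callable[[], Optional[str]], target: str,
                    timeout: float = 180.0, interval: float = 0.1) -> bool:
    """Poll until the standby's replay LSN reaches `target`.

    Returns True once replay has caught up, False if `timeout` seconds pass first.
    A None result from the callback (no replay position yet) counts as not caught up.
    """
    goal = parse_lsn(target)
    deadline = time.monotonic() + timeout
    while True:
        current = fetch_replay_lsn()
        if current is not None and parse_lsn(current) >= goal:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # Simulated standby whose replay position advances on each poll
    # (illustrative values only, not taken from a real run).
    progress = iter(["0/2000000", "0/2FFFFF0", "0/3000060"])
    ok = wait_for_replay(lambda: next(progress), "0/3000060", timeout=5.0, interval=0.0)
    print(ok, wal_file_name(1, parse_lsn("0/3000060")) + ".ready")
```

What matters is the ordering. Until the standby has replayed past the switch record, a restartpoint cannot be expected to recycle the earlier segment, so any check on that segment's .ready file made before the wait completes remains timing-dependent.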
--
Michael
