Home > mailing lists

Re: failures in t/031_recovery_conflict.pl on CI - Mailing list pgsql-hackers

From	Andres Freund
Subject	Re: failures in t/031_recovery_conflict.pl on CI
Date	April 13, 2022 03:26:26
Msg-id	20220413002626.udl7lll7f3o7nre7@alap3.anarazel.de Whole thread Raw
In response to	Re: failures in t/031_recovery_conflict.pl on CI (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: failures in t/031_recovery_conflict.pl on CI
List	pgsql-hackers

Tree view

Hi,

On 2022-04-12 15:05:22 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2022-04-09 19:34:26 -0400, Tom Lane wrote:
> >> +1.  This is probably more feasible given the latch infrastructure
> >> than it was when that code was first written.
> 
> > What do you think about just reordering the disable_all_timeouts() to be
> > before the got_standby_deadlock_timeout check in the back branches? I think
> > that should close at least the most obvious hole.  And fix it properly in
> > HEAD?
> 
> I don't have much faith in that, and I don't see why we can't fix it
> properly.

It's not too hard, agreed.

It's somewhat hard to test that it works though. The new recovery conflict
tests test all recovery conflicts except for deadlock ones, because they're
not easy to trigger... But I think I nearly got it working reliably.

It's probably worth backpatching the tests, after stripping them of the stats
checks?

Three questions:

- For HEAD we have to replace the disable_all_timeouts() calls, it breaks the
  replay progress reporting. Is there a reason to keep them in the
  backbranches? Hard to see how an extension or something could rely on it,
  but ...?

- I named the variable set by the signal handler got_standby_delay_timeout,
  because just got_standby_timeout looked weird besides the _deadlock_. Which
  makes me think that we should rename STANDBY_TIMEOUT to
  STANDBY_DELAY_TIMEOUT too?

- There's the following comment in ResolveRecoveryConflictWithBufferPin():

  "We assume that only UnpinBuffer() and the timeout requests established
   above can wake us up here."

  That bogus afaict? There's plenty other things that can cause MyProc->latch
  to be set. Is it worth doing something about this at the same time? Right
  now we seem to call ResolveRecoveryConflictWithBufferPin() in rapid
  succession initially.

Greetings,

Andres Freund

pgsql-hackers by date:

From: Peter Geoghegan
Date: 13 April 2022, 03:22:04
Subject: Re: Improving the "Routine Vacuuming" docs

From: "David G. Johnston"
Date: 13 April 2022, 05:20:59
Subject: Re: Improving the "Routine Vacuuming" docs

Re: failures in t/031_recovery_conflict.pl on CI - Mailing list pgsql-hackers

Previous

Next