Re: Race conditions in 019_replslot_limit.pl - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Race conditions in 019_replslot_limit.pl
Msg-id 3566584.1645067490@sss.pgh.pa.us
In response to Re: Race conditions in 019_replslot_limit.pl  (Andres Freund <andres@anarazel.de>)
Responses Re: Race conditions in 019_replslot_limit.pl  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Andres Freund <andres@anarazel.de> writes:
> On 2022-02-16 20:22:23 -0500, Tom Lane wrote:
>> There's no disconnection log entry for either, which I suppose means
>> that somebody didn't bother logging disconnection for walsenders ...

> The thing is, we actually *do* log disconnection for walsenders:

Ah, my mistake, now I do see a disconnection entry for the other walsender
launched by the basebackup.

> Starting a node in recovery and having it connect to the primary seems like a
> mighty long time for a process to exit, unless it's stuck behind something.

Fair point.  Also, 019_replslot_limit.pl hasn't been changed in any
material way in months, but *something's* changed recently, because
this just started.  I scraped the buildfarm for instances of
"Failed test 'have walsender pid" going back 6 months, and what I find is:

   sysname    | branch |      snapshot       |     stage     |                      l
--------------+--------+---------------------+---------------+---------------------------------------------
 desmoxytes   | HEAD   | 2022-02-15 04:42:05 | recoveryCheck | #   Failed test 'have walsender pid 1685516
 idiacanthus  | HEAD   | 2022-02-15 07:24:05 | recoveryCheck | #   Failed test 'have walsender pid 2758549
 serinus      | HEAD   | 2022-02-15 11:00:08 | recoveryCheck | #   Failed test 'have walsender pid 3682154
 desmoxytes   | HEAD   | 2022-02-15 11:04:05 | recoveryCheck | #   Failed test 'have walsender pid 3775359
 flaviventris | HEAD   | 2022-02-15 18:03:48 | recoveryCheck | #   Failed test 'have walsender pid 1517077
 idiacanthus  | HEAD   | 2022-02-15 22:48:05 | recoveryCheck | #   Failed test 'have walsender pid 2494972
 desmoxytes   | HEAD   | 2022-02-15 23:48:04 | recoveryCheck | #   Failed test 'have walsender pid 3055399
 desmoxytes   | HEAD   | 2022-02-16 10:48:05 | recoveryCheck | #   Failed test 'have walsender pid 1593461
 komodoensis  | HEAD   | 2022-02-16 21:16:04 | recoveryCheck | #   Failed test 'have walsender pid 3726703
 serinus      | HEAD   | 2022-02-17 01:18:17 | recoveryCheck | #   Failed test 'have walsender pid 208363

So (a) it broke around 48 hours ago, which is already a useful
bit of info, and (b) your animals seem far more susceptible than
anyone else's.  Why do you suppose that is?
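
To make the two observations above easy to re-check, here is a minimal
sketch that tallies the failures per animal and measures the span between
the first and last snapshot.  The rows are copied from the query output
above; the names FAILURES and summarize are made up for illustration and
are not part of any buildfarm tooling.

```python
from collections import Counter
from datetime import datetime

# (sysname, snapshot) pairs copied by hand from the query output above.
FAILURES = [
    ("desmoxytes", "2022-02-15 04:42:05"),
    ("idiacanthus", "2022-02-15 07:24:05"),
    ("serinus", "2022-02-15 11:00:08"),
    ("desmoxytes", "2022-02-15 11:04:05"),
    ("flaviventris", "2022-02-15 18:03:48"),
    ("idiacanthus", "2022-02-15 22:48:05"),
    ("desmoxytes", "2022-02-15 23:48:04"),
    ("desmoxytes", "2022-02-16 10:48:05"),
    ("komodoensis", "2022-02-16 21:16:04"),
    ("serinus", "2022-02-17 01:18:17"),
]


def summarize(rows):
    """Return (failures per animal, earliest snapshot, latest snapshot)."""
    per_animal = Counter(name for name, _ in rows)
    stamps = [datetime.strptime(ts, "%Y-%m-%d %H:%M:%S") for _, ts in rows]
    return per_animal, min(stamps), max(stamps)


if __name__ == "__main__":
    per_animal, first, last = summarize(FAILURES)
    # Animals with the most failures come first.
    for name, count in per_animal.most_common():
        print(f"{name:<13} {count}")
    print("first failure:", first)
    print("span of failures:", last - first)
```

This only restates what is in the table; it says nothing about runs that
passed, so it cannot by itself give a failure rate per animal.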

            regards, tom lane