Re: Fix 035_standby_logical_decoding.pl race conditions - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Fix 035_standby_logical_decoding.pl race conditions
Date
Msg-id CAA4eK1Ko9eGgEBMXpBdHwu5MgV1PwBZkXDJ_dajoRGTZiNFYAg@mail.gmail.com
Whole thread Raw
In response to Re: Fix 035_standby_logical_decoding.pl race conditions  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
List pgsql-hackers
On Thu, Apr 3, 2025 at 12:29 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> On Thu, Apr 03, 2025 at 05:34:10AM +0000, Hayato Kuroda (Fujitsu) wrote:
> > Dear Bertrand, Amit,
> >
> > > > I do prefer v5-PG17-2 as it is "closer" to HEAD. That said, I think that we should
> > > > keep the slots active and only avoid doing the checks for them (they are
> > > invalidated
> > > > that's fine, they are not that's fine too).
> > > >
> > >
> > > I don't mind doing that, but there is no benefit in making slots
> > > active unless we can validate them. And we will end up adding some
> > > more checks, as in function check_slots_conflict_reason without any
> > > advantage.
>
> I think that there is advantage. The pros are:
>
> - the test would be closer to HEAD from a behavioural point of view
> - it's very rare to hit the corner cases: so the test would behave the same
> as on HEAD most of the time (and when it does not that would not hurt as the
> checks are nor done)
> - Kuroda-San's patch removes "or die "replication slot stats of vacuum_full_activeslot not updated"
> while keeping the slot active is able to keep it (should the slot being invalidated
> or not). But more on that in the comment === 1 below.
>
> > I feel Kuroda-San's second patch is simple, and we have
> > > fewer chances to make mistakes and easy to maintain in the future as
> > > well.
>
> Yeah maybe but the price to pay is to discard the pros above. That said, I'm also
> fine with Kuroda-San's patch if both of you feel that it's better.
>
> > I have concerns for Bertrand's patch that it could introduce another timing
> > issue. E.g., if the activated slots are not invalidated, dropping slots is keep
> > being activated so the dropping might be fail.
>
> Yeah, but the drop is done with "$node_standby->psql" so that the test does not
> produce an error. It would produce an error should we use "$node_standby->safe_psql"
> instead.
>
> > I did not reproduce this but
> > something like this can happen if we activate slots.
>
> You can see it that way (+ reproducer.txt):
>
> "
> +       my $bdt = $node_standby->safe_psql('postgres', qq[SELECT * from pg_replication_slots]);
> +       note "BDT: $bdt";
> +
>         $node_standby->psql('postgres',
>                 qq[SELECT pg_drop_replication_slot('$inactive_slot')]);
> "
>
> You'd see the slot being active and the "$node_standby->psql" not reporting
> any error.
>

Hmm, but adding some additional smarts also makes this test less easy
to backpatch. I see your points related to the benefits, but I still
mildly prefer to go with the lesser changes approach for backbranches
patch. Normally, we don't enhance backbranches code without making
equivalent changes in HEAD, so adding some new bugs only in
backbranches has a lesser chance.

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: Some codes refer slot()->{'slot_name'} but it is not defined
Next
From: Amit Kapila
Date:
Subject: Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.