Re: Fix 035_standby_logical_decoding.pl race conditions - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Fix 035_standby_logical_decoding.pl race conditions |
Date | |
Msg-id | CAA4eK1Ku9MdGd1L74N=YKjSvB_9-2eQLsXz-LdPS1Cfnrs8TRw@mail.gmail.com Whole thread Raw |
In response to | Re: Fix 035_standby_logical_decoding.pl race conditions (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: Fix 035_standby_logical_decoding.pl race conditions
|
List | pgsql-hackers |
On Thu, Apr 3, 2025 at 3:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Apr 3, 2025 at 12:29 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > On Thu, Apr 03, 2025 at 05:34:10AM +0000, Hayato Kuroda (Fujitsu) wrote: > > > Dear Bertrand, Amit, > > > > > > > > I do prefer v5-PG17-2 as it is "closer" to HEAD. That said, I think that we should > > > > > keep the slots active and only avoid doing the checks for them (they are > > > > invalidated > > > > > that's fine, they are not that's fine too). > > > > > > > > > > > > > I don't mind doing that, but there is no benefit in making slots > > > > active unless we can validate them. And we will end up adding some > > > > more checks, as in function check_slots_conflict_reason without any > > > > advantage. > > > > I think that there is advantage. The pros are: > > > > - the test would be closer to HEAD from a behavioural point of view > > - it's very rare to hit the corner cases: so the test would behave the same > > as on HEAD most of the time (and when it does not that would not hurt as the > > checks are nor done) > > - Kuroda-San's patch removes "or die "replication slot stats of vacuum_full_activeslot not updated" > > while keeping the slot active is able to keep it (should the slot being invalidated > > or not). But more on that in the comment === 1 below. > > > > > I feel Kuroda-San's second patch is simple, and we have > > > > fewer chances to make mistakes and easy to maintain in the future as > > > > well. > > > > Yeah maybe but the price to pay is to discard the pros above. That said, I'm also > > fine with Kuroda-San's patch if both of you feel that it's better. > > > > > I have concerns for Bertrand's patch that it could introduce another timing > > > issue. E.g., if the activated slots are not invalidated, dropping slots is keep > > > being activated so the dropping might be fail. > > > > Yeah, but the drop is done with "$node_standby->psql" so that the test does not > > produce an error. It would produce an error should we use "$node_standby->safe_psql" > > instead. > > > > > I did not reproduce this but > > > something like this can happen if we activate slots. > > > > You can see it that way (+ reproducer.txt): > > > > " > > + my $bdt = $node_standby->safe_psql('postgres', qq[SELECT * from pg_replication_slots]); > > + note "BDT: $bdt"; > > + > > $node_standby->psql('postgres', > > qq[SELECT pg_drop_replication_slot('$inactive_slot')]); > > " > > > > You'd see the slot being active and the "$node_standby->psql" not reporting > > any error. > > > > Hmm, but adding some additional smarts also makes this test less easy > to backpatch. I see your points related to the benefits, but I still > mildly prefer to go with the lesser changes approach for backbranches > patch. Normally, we don't enhance backbranches code without making > equivalent changes in HEAD, so adding some new bugs only in > backbranches has a lesser chance. > Bertrand, do you agree with the fewer changes approach (where active slots won't be tested) for backbranches? I think now that we have established that the vacuum full test is also prone to failure due to race condition in the test, this is the only remaining open point. -- With Regards, Amit Kapila.
pgsql-hackers by date: