Re: Replication slot drop message is sent after pgstats shutdown. - Mailing list pgsql-hackers
From | Noah Misch |
---|---|
Subject | Re: Replication slot drop message is sent after pgstats shutdown. |
Date | |
Msg-id | 20220319022629.GA2822749@rfd.leadboat.com Whole thread Raw |
In response to | Re: Replication slot drop message is sent after pgstats shutdown. (Andres Freund <andres@anarazel.de>) |
List | pgsql-hackers |
On Fri, Mar 18, 2022 at 01:24:15PM -0700, Andres Freund wrote: > On 2022-03-18 00:28:37 -0700, Noah Misch wrote: > > === > > diff -U3 /export/home/nm/farm/studio64v12_6/HEAD/pgsql/contrib/test_decoding/expected/slot_creation_error.out /export/home/nm/farm/studio64v12_6/HEAD/pgsql.build/contrib/test_decoding/output_iso/results/slot_creation_error.out > > --- /export/home/nm/farm/studio64v12_6/HEAD/pgsql/contrib/test_decoding/expected/slot_creation_error.out Tue Feb 1506:58:14 2022 > > +++ /export/home/nm/farm/studio64v12_6/HEAD/pgsql.build/contrib/test_decoding/output_iso/results/slot_creation_error.out TueFeb 15 11:38:14 2022 > > @@ -29,16 +29,17 @@ > > t > > (1 row) > > > > -step s2_init: <... completed> > > -ERROR: canceling statement due to user request > > step s1_view_slot: > > SELECT slot_name, slot_type, active FROM pg_replication_slots WHERE slot_name = 'slot_creation_error' > > > > -slot_name|slot_type|active > > ----------+---------+------ > > -(0 rows) > > +slot_name |slot_type|active > > +-------------------+---------+------ > > +slot_creation_error|logical |t > > +(1 row) > > > > step s1_c: COMMIT; > > +step s2_init: <... completed> > > +ERROR: canceling statement due to user request > > > > starting permutation: s1_b s1_xid s2_init s1_c s1_view_slot s1_drop_slot > > step s1_b: BEGIN; > > === > > > > I can make it fail that way by injecting a 1s delay here: > > > > --- a/src/backend/tcop/postgres.c > > +++ b/src/backend/tcop/postgres.c > > @@ -3339,6 +3339,7 @@ ProcessInterrupts(void) > > */ > > if (!DoingCommandRead) > > { > > + pg_usleep(1 * 1000 * 1000); > > LockErrorCleanup(); > > ereport(ERROR, > > (errcode(ERRCODE_QUERY_CANCELED), > > So isolationtester still sees the blocking condition from before the cancel > processing is finished and thus proceeds to the next statement - which it > normally should only do once the other running step is detected as still > blocking? Essentially. It called s1_view_slot too early. s2_init can remain blocked arbitrarily long after pg_cancel_backend returns. Writing s1_cancel_s2(s2_init) directs the isolationtester to send the cancel, then wait for s2_init to finish, then wait for the cancel to finish. > I wonder if we should emit <waiting> everytime a step is detected anew as > being blocked to make it easier to understand issues like this. Good question. > > diff --git a/contrib/test_decoding/specs/slot_creation_error.spec b/contrib/test_decoding/specs/slot_creation_error.spec > > index 6816696..d1e35bf 100644 > > --- a/contrib/test_decoding/specs/slot_creation_error.spec > > +++ b/contrib/test_decoding/specs/slot_creation_error.spec > > @@ -35,7 +35,7 @@ step s2_init { > > # The tests first start a transaction with an xid assigned in s1, then create > > # a slot in s2. The slot creation waits for s1's transaction to end. Instead > > # we cancel / terminate s2. > > -permutation s1_b s1_xid s2_init s1_view_slot s1_cancel_s2 s1_view_slot s1_c > > +permutation s1_b s1_xid s2_init s1_view_slot s1_cancel_s2(s2_init) s1_view_slot s1_c > > permutation s1_b s1_xid s2_init s1_c s1_view_slot s1_drop_slot # check slot creation still works > > -permutation s1_b s1_xid s2_init s1_terminate_s2 s1_c s1_view_slot > > +permutation s1_b s1_xid s2_init s1_terminate_s2(s2_init) s1_c s1_view_slot > > # can't run tests after this, due to s2's connection failure > > That looks good to me. Pushed. Kyotaro Horiguchi had posted a patch that also changed the pg_terminate_backend call in temp-schema-cleanup.spec. I think that one is fine as-is, because it does pg_terminate_backend(pg_backend_pid()). There's no way for a backend running that particular command to report that it's ready for another query, so the problem doesn't arise.
pgsql-hackers by date: