On Mon, Feb 1, 2021 at 11:23 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Mon, Feb 1, 2021 at 3:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Feb 1, 2021 at 9:38 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > > I think this is true only when the user specifically requested it by
> > > > the use of "ALTER SUBSCRIPTION ... SET (slot_name = NONE)", right?
> > > > Otherwise, we give an error on a broken connection. Also, if that is
> > > > true then is there a reason to pass missing_ok as true while dropping
> > > > tablesync slots?
> > > >
> > >
> > > AFAIK there is always a potential race with DropSubscription dropping
> > > slots. The DropSubscription might be running at exactly the same time
> > > the apply worker has just dropped the very same tablesync slot.
> > >
> >
> > We stopped the workers before getting a list of NotReady relations and
> > then we try to drop the corresponding slots. So, how can such a race
> > condition happen? Note, because we have a lock on pg_subscription,
> > there is no chance that the workers can restart until the transaction
> > ends.
>
> OK. I think I was forgetting that logicalrep_worker_stop would also go
> into a loop waiting for the worker process to die. So even if the
> tablesync worker does simultaneously drop its own slot, I think it
> will certainly at least be in SYNCDONE state before DropSubscription
> does anything else with that worker.
>
How is that ensured? We don't have anything like HOLD_INTERRUPTS
between the time we drop the slot and the time we update the rel state
to SYNCDONE. So, isn't it possible that after we have dropped the slot
but before we update the state, a SIGTERM arrives and makes the worker
exit?
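
To make the window concrete, here is a toy sketch in Python. It is not
PostgreSQL code: every name in it (Catalog, finish_tablesync,
drop_subscription_cleanup, NOT_DONE) is made up for illustration, and it
only assumes the ordering discussed above, i.e. the slot is dropped first
and the state is updated afterwards, with nothing holding off interrupts
in between.

```python
from enum import Enum


class RelState(Enum):
    NOT_DONE = "not done"   # hypothetical stand-in for any pre-SYNCDONE state
    SYNCDONE = "syncdone"


class WorkerTerminated(Exception):
    """Stands in for the tablesync worker exiting on SIGTERM."""


class Catalog:
    """Toy model of what survives the worker: the slot and the rel state."""

    def __init__(self):
        self.slot_exists = True
        self.rel_state = RelState.NOT_DONE


def finish_tablesync(catalog, sigterm_after_drop=False):
    # Step 1: the worker drops its own tablesync slot.
    catalog.slot_exists = False
    # Window: nothing prevents the worker from being interrupted here.
    if sigterm_after_drop:
        raise WorkerTerminated()
    # Step 2: only now is the rel state recorded as SYNCDONE.
    catalog.rel_state = RelState.SYNCDONE


def drop_subscription_cleanup(catalog, missing_ok):
    # Cleanup only looks at relations that have not reached SYNCDONE.
    if catalog.rel_state is RelState.SYNCDONE:
        return "nothing to drop"
    if not catalog.slot_exists:
        if missing_ok:
            return "slot already gone, ignored"
        raise RuntimeError("slot does not exist")
    catalog.slot_exists = False
    return "slot dropped"
```

If the worker is terminated inside the window, the model is left with no
slot but a state that is still not SYNCDONE, so the cleanup step only
succeeds when it tolerates a missing slot.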
-- 
With Regards,
Amit Kapila.