Re: Single transaction in the tablesync worker? - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Single transaction in the tablesync worker?
Msg-id CAA4eK1+C4kiFAb-bYpBgcZ7VSaZihhhJvKoDYB8JOhfqZAnnHQ@mail.gmail.com
In response to Re: Single transaction in the tablesync worker?  (Peter Smith <smithpb2250@gmail.com>)
Responses Re: Single transaction in the tablesync worker?  (Peter Smith <smithpb2250@gmail.com>)
List pgsql-hackers
On Mon, Feb 1, 2021 at 11:23 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Mon, Feb 1, 2021 at 3:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Feb 1, 2021 at 9:38 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > > I think this is true only when the user specifically requested it by
> > > > the use of "ALTER SUBSCRIPTION ... SET (slot_name = NONE)", right?
> > > > Otherwise, we give an error on a broken connection. Also, if that is
> > > > true then is there a reason to pass missing_ok as true while dropping
> > > > tablesync slots?
> > > >
> > >
> > > AFAIK there is always a potential race with DropSubscription dropping
> > > slots. The DropSubscription might be running at exactly the same time
> > > the apply worker has just dropped the very same tablesync slot.
> > >
> >
> > We stop the workers before getting the list of NotReady relations and
> > then we try to drop the corresponding slots. So, how can such a race
> > condition happen? Note, because we have a lock on pg_subscription,
> > there is no chance that the workers can restart till the transaction
> > ends.
>
> OK. I think I was forgetting the logicalrep_worker_stop would also go
> into a loop waiting for the worker process to die. So even if the
> tablesync worker does simultaneously drop its own slot, I think it
> will certainly at least be in SYNCDONE state before DropSubscription
> does anything else with that worker.
>

How is that ensured? We don't have anything like HOLD_INTERRUPTS
between the time we drop the slot and the time we update the rel
state to SYNCDONE. So, isn't it possible that after we drop the slot
and before we update the state, a SIGTERM arrives and leads to the
worker exiting?
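
To make the window concrete, here is a toy model in plain C. It is not
PostgreSQL code, and every name in it (ToyWorker, finish_sync,
hold_interrupts and so on) is hypothetical. It only assumes the usual
pattern where the SIGTERM handler sets a flag and the worker acts on it at
its next interrupt check, unless a holdoff counter is non-zero.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Toy model only: these names are hypothetical, not PostgreSQL's. */
typedef enum { STATE_CATCHUP, STATE_SYNCDONE } RelState;

typedef struct
{
    bool     slot_exists;   /* does the tablesync slot still exist? */
    RelState state;         /* recorded relation state */
    bool     exited;        /* did the worker exit on SIGTERM? */
} ToyWorker;

static volatile sig_atomic_t term_pending = 0;
static int holdoff_count = 0;

/* The handler only records the request; it is acted on at a check point. */
static void on_sigterm(int signo) { (void) signo; term_pending = 1; }

static void hold_interrupts(void) { holdoff_count++; }

static void resume_interrupts(void)
{
    assert(holdoff_count > 0);
    holdoff_count--;
}

/* Returns true when a pending termination request may be serviced now. */
static bool check_for_interrupts(void)
{
    if (term_pending && holdoff_count == 0)
    {
        term_pending = 0;
        return true;
    }
    return false;
}

/*
 * Simulate the end of table sync with a SIGTERM arriving right after the
 * slot is dropped and before the state is recorded.
 */
ToyWorker finish_sync(bool use_holdoff)
{
    ToyWorker w = { true, STATE_CATCHUP, false };

    term_pending = 0;
    holdoff_count = 0;
    signal(SIGTERM, on_sigterm);

    if (use_holdoff)
        hold_interrupts();

    w.slot_exists = false;          /* step 1: drop the slot */

    raise(SIGTERM);                 /* stop request lands in the window */

    if (check_for_interrupts())     /* an interrupt check inside the window */
    {
        w.exited = true;            /* exit before the state is recorded */
        return w;
    }

    w.state = STATE_SYNCDONE;       /* step 2: record SYNCDONE */

    if (use_holdoff)
    {
        resume_interrupts();
        if (check_for_interrupts()) /* deferred request is serviced here */
            w.exited = true;
    }
    return w;
}
```

Calling finish_sync(false) leaves the slot dropped while the state is still
STATE_CATCHUP, which is the case I am asking about. Calling
finish_sync(true) defers the exit until after the state is recorded. This
only illustrates the question; it is not a claim about what the patch
currently does or how it should be fixed.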

-- 
With Regards,
Amit Kapila.


