Re: Single transaction in the tablesync worker? - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Single transaction in the tablesync worker?
Date
Msg-id CAA4eK1JS2x6CKzduvZNJM1v6=tEnYrQdBLQGEzpeX94eod9pAw@mail.gmail.com
Whole thread Raw
In response to Re: Single transaction in the tablesync worker?  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Wed, Jan 13, 2021 at 5:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jan 12, 2021 at 6:17 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > On Mon, Jan 11, 2021 at 3:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > The workers for removed tables are now immediately stopped (like
> > DropSubscription does). Although I did include the AccessExclusiveLock
> > as (a) suggested, AFAIK this was actually ineffective at preventing
> > the workers relaunching.
> >
>
> The reason why it was ineffective is that you are locking
> SubscriptionRelationId which is to protect relaunch of apply workers
> not tablesync workers. But in current form even acquiring
> SubscriptionRelRelationId lock won't serve the purpose because
> process_syncing_tables_for_apply() doesn't always acquire it before
> relaunching the tablesync workers. However, if we acquire
> SubscriptionRelRelationId in process_syncing_tables_for_apply() then
> it would prevent relaunch of workers but not sure if that is a good
> idea. Can you think of some other way?
>
> > Instead, I am using
> > logicalrep_worker_stop_at_commit to do this - testing shows it as
> > working ok. Please see the code and latest test logs [v14] for
> > details.
> >
>
> There is still a window where it can relaunch. Basically, after you
> stop the worker in AlterSubscription_refresh and till the commit
> happens apply worker can relaunch the tablesync workers. I don't see
> code-wise how we can protect that. And if the tablesync workers are
> restarted after we stopped them, the purpose won't be achieved because
> it can recreate or try to reuse the slot which we have dropped.
>
> The other issue with the current code could be that after we drop the
> slot and origin what if the transaction (in which we are doing Alter
> Subscription) is rolledback? Basically, the workers will be relaunched
> and it would assume that slot should be there but the slot won't be
> present. I have thought of dropping the slot at commit time after we
> stop the workers but again not sure if that is a good idea because at
> that point we don't want to establish the connection with the
> publisher.
>
> I think this needs some more thought.
>

I have another idea to solve this problem. Instead of Alter
Subscription drop the slot/origin, we can let tablesync worker do it.
Basically, we need to register SignalHandlerForShutdownRequest as
SIGTERM handler and then later need to check ShutdownRequestPending
flag in the tablesync worker. If the flag is set, then we can drop the
slot/origin and allow the process to exit cleanly.

This will obviate the need to take the lock and all sort of rollback
problems. If this works out well then I think we can use this for
DropSubscription as well but that is a matter of separate patch.

Thoughts?

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: proposal: schema variables
Next
From: Fujii Masao
Date:
Subject: Re: [PATCH] Feature improvement for CLOSE, FETCH, MOVE tab completion