On Thu, Aug 10, 2023 at 10:15 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Aug 9, 2023 at 8:28 AM Zhijie Hou (Fujitsu)
> <houzj.fnst@fujitsu.com> wrote:
> >
> > On Thursday, August 3, 2023 7:30 PM Melih Mutlu <m.melihmutlu@gmail.com> wrote:
> >
> > > Right. I attached the v26 as you asked.
> >
> > Thanks for posting the patches.
> >
> > While reviewing the patch, I noticed one rare case that it's possible that there
> > are two table sync worker for the same table in the same time.
> >
> > The patch relies on LogicalRepWorkerLock to prevent concurrent access, but the
> > apply worker will start a new worker after releasing the lock. So, at the point[1]
> > where the lock is released and the new table sync worker has not been started,
> > it seems possible that another old table sync worker will be reused for the
> > same table.
> >
> > /* Now safe to release the LWLock */
> > LWLockRelease(LogicalRepWorkerLock);
> > *[1]
> > /*
> > * If there are free sync worker slot(s), start a new sync
> > * worker for the table.
> > */
> > if (nsyncworkers < max_sync_workers_per_subscription)
> > ...
> > logicalrep_worker_launch(MyLogicalRepWorker->dbid,
> >
>
> Yeah, this is a problem. I think one idea to solve this is by
> extending the lock duration till we launch the tablesync worker but we
> should also consider changing this locking scheme such that there is a
> better way to indicate that for a particular rel, tablesync is in
> progress. Currently, the code in TablesyncWorkerMain() also acquires
> the lock in exclusive mode even though the tablesync for a rel is in
> progress which I guess could easily heart us for larger values of
> max_logical_replication_workers. So, that could be another motivation
> to think for a different locking scheme.
>
Yet another problem is that currently apply worker maintains a hash
table for 'last_start_times' to avoid restarting the tablesync worker
immediately upon error. The same functionality is missing while
reusing the table sync worker. One possibility is to use a shared hash
table to remember start times but I think it may depend on what we
decide to solve the previous problem reported by Hou-San.
--
With Regards,
Amit Kapila.