Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Date
Msg-id CAA4eK1+BmkT3jjUnejOqNFW_78D5PiRHNNAwLmfBSb7cj8QSrg@mail.gmail.com
Whole thread Raw
In response to RE: [PATCH] Reuse Workers and Replication Slots during Logical Replication  ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
Responses Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
List pgsql-hackers
On Wed, Aug 9, 2023 at 8:28 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> On Thursday, August 3, 2023 7:30 PM Melih Mutlu <m.melihmutlu@gmail.com>  wrote:
>
> > Right. I attached the v26 as you asked.
>
> Thanks for posting the patches.
>
> While reviewing the patch, I noticed one rare case that it's possible that there
> are two table sync worker for the same table in the same time.
>
> The patch relies on LogicalRepWorkerLock to prevent concurrent access, but the
> apply worker will start a new worker after releasing the lock. So, at the point[1]
> where the lock is released and the new table sync worker has not been started,
> it seems possible that another old table sync worker will be reused for the
> same table.
>
>                                 /* Now safe to release the LWLock */
>                                 LWLockRelease(LogicalRepWorkerLock);
> *[1]
>                                 /*
>                                  * If there are free sync worker slot(s), start a new sync
>                                  * worker for the table.
>                                  */
>                                 if (nsyncworkers < max_sync_workers_per_subscription)
>                                 ...
>                                                 logicalrep_worker_launch(MyLogicalRepWorker->dbid,
>

Yeah, this is a problem. I think one idea to solve this is by
extending the lock duration till we launch the tablesync worker but we
should also consider changing this locking scheme such that there is a
better way to indicate that for a particular rel, tablesync is in
progress. Currently, the code in TablesyncWorkerMain() also acquires
the lock in exclusive mode even though the tablesync for a rel is in
progress which I guess could easily heart us for larger values of
max_logical_replication_workers. So, that could be another motivation
to think for a different locking scheme.

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Incorrect handling of OOM in WAL replay leading to data loss
Next
From: Pavel Luzanov
Date:
Subject: Re: PG 16 draft release notes ready