On Thursday, August 3, 2023 7:30 PM Melih Mutlu <m.melihmutlu@gmail.com> wrote:
> Right. I attached the v26 as you asked.
Thanks for posting the patches.
While reviewing the patch, I noticed one rare case that it's possible that there
are two table sync worker for the same table in the same time.
The patch relies on LogicalRepWorkerLock to prevent concurrent access, but the
apply worker will start a new worker after releasing the lock. So, at the point[1]
where the lock is released and the new table sync worker has not been started,
it seems possible that another old table sync worker will be reused for the
same table.
/* Now safe to release the LWLock */
LWLockRelease(LogicalRepWorkerLock);
*[1]
/*
* If there are free sync worker slot(s), start a new sync
* worker for the table.
*/
if (nsyncworkers < max_sync_workers_per_subscription)
...
logicalrep_worker_launch(MyLogicalRepWorker->dbid,
I can reproduce it by using gdb.
Steps:
1. set max_sync_workers_per_subscription to 1 and setup pub/sub which publishes
two tables(table A and B).
2. when the table sync worker for the table A started, use gdb to block it
before being reused for another table.
3. set max_sync_workers_per_subscription to 2 and use gdb to block the apply
worker at the point after releasing the LogicalRepWorkerLock and before
starting another table sync worker for table B.
4. release the blocked table sync worker, then we can see the table sync worker
is also reused for table B.
5. release the apply worker, then we can see the apply worker will start
another table sync worker for the same table(B).
I think it would be better to prevent this case from happening as this case
will give some unexpected ERROR or LOG. Note that I haven't checked if it would
cause worse problems like duplicate copy or others.
Best Regards,
Hou zj