Thread: Questions about logicalrep_worker_launch()

Questions about logicalrep_worker_launch()

From
Fujii Masao
Date:
Hi,

While reading the code of logicalrep_worker_launch(), I had two questions:

(1)
When the sync worker limit per subscription is reached, logicalrep_worker_launch()
runs garbage collection to try to free up slots before checking the limit again.
That makes sense.

But should we do the same when the parallel apply worker limit is reached?
Currently, if we've hit the parallel apply worker limit but not the sync worker limit
and we find an unused worker slot, garbage collection doesn't run. Would it
make sense to also run garbage collection in that case?

(2)
If garbage collection removes at least one worker, logicalrep_worker_launch()
scans all worker slots again to look for a free one. But since we know at least one
slot was freed, this retry might be unnecessary. We could just reuse the freed
slot directly. Is that correct?


The attached patch addresses both points. Since logicalrep_worker_launch()
isn't a performance-critical path, this might not be a high-priority change.
But if my understanding is correct, I'm a bit tempted to apply it as a refactoring.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment

Re: Questions about logicalrep_worker_launch()

From
Masahiko Sawada
Date:
Hi,

On Fri, Apr 25, 2025 at 9:10 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
> Hi,
>
> While reading the code of logicalrep_worker_launch(), I had two questions:
>
> (1)
> When the sync worker limit per subscription is reached, logicalrep_worker_launch()
> runs garbage collection to try to free up slots before checking the limit again.
> That makes sense.
>
> But should we do the same when the parallel apply worker limit is reached?
> Currently, if we've hit the parallel apply worker limit but not the sync worker limit
> and we find an unused worker slot, garbage collection doesn't run. Would it
> make sense to also run garbage collection in that case?

Good point. In that case, we would end up not being able to launch
parallel apply workers for the subscription until either we used up
all worker slots or reached the sync worker limit, which is bad.

>
> (2)
> If garbage collection removes at least one worker, logicalrep_worker_launch()
> scans all worker slots again to look for a free one. But since we know at least one
> slot was freed, this retry might be unnecessary. We could just reuse the freed
> slot directly. Is that correct?

Agreed. Since these codes are protected by LogicalRepWorkerLock in an
exclusive mode, we should be able to use the worker slot that we just
cleaned up.

> The attached patch addresses both points. Since logicalrep_worker_launch()
> isn't a performance-critical path, this might not be a high-priority change.
> But if my understanding is correct, I'm a bit tempted to apply it as a refactoring.

I agree with these changes.

I think that while the changes for (2) should be for v19, the changes
for (1) might be treated as a bug fix?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com