RE: Perform streaming logical transactions by background workers and parallel apply - Mailing list pgsql-hackers
From: houzj.fnst@fujitsu.com
Subject: RE: Perform streaming logical transactions by background workers and parallel apply
Date:
Msg-id: OS0PR01MB5716E7E5798625AE9437CD6F94439@OS0PR01MB5716.jpnprd01.prod.outlook.com
In response to: Re: Perform streaming logical transactions by background workers and parallel apply (Peter Smith <smithpb2250@gmail.com>)
Responses: Re: Perform streaming logical transactions by background workers and parallel apply
List: pgsql-hackers
On Friday, September 9, 2022 3:02 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Here are my review comments for the v28-0001 patch:
>
> (There may be some overlap with other people's review comments and/or
> some fixes already made).

Thanks for the comments.

> 3.
>
> max_logical_replication_workers (integer)
>     Specifies maximum number of logical replication workers. This
>     includes apply leader workers, parallel apply workers, and table
>     synchronization workers.
>     Logical replication workers are taken from the pool defined by
>     max_worker_processes.
>     The default value is 4. This parameter can only be set at server start.
>
> ~
>
> I did not really understand why the default is 4. Because the default
> tablesync workers is 2, and the default parallel workers is 2, but
> what about accounting for the apply worker? Therefore, shouldn't
> max_logical_replication_workers default be 5 instead of 4?

Parallel apply is disabled by default, so it is not necessary to increase
this global default value, as discussed in [1].

[1] https://www.postgresql.org/message-id/CAD21AoCwaU8SqjmC7UkKWNjDg3Uz4FDGurMpis3zw5SEC%2B27jQ%40mail.gmail.com

> 6. src/backend/replication/logical/applyparallelworker.c
>
> Is there a reason why this file is called applyparallelworker.c
> instead of parallelapplyworker.c? Now this name is out of step with
> names of all the new typedefs etc.

The name was suggested for consistency with "vacuumparallel.c", but I am
fine with either name. I can change it if more people think
parallelapplyworker.c is better.

> 16. src/backend/replication/logical/applyparallelworker.c -
> parallel_apply_find_worker
>
> + /* Return the cached parallel apply worker if valid. */
> + if (stream_apply_worker != NULL)
> +     return stream_apply_worker;
>
> Perhaps 'cur_stream_parallel_apply_winfo' is a better name for this var?

That looks a bit long to me.

> /* Now wait until it attaches. */
> - WaitForReplicationWorkerAttach(worker, generation, bgw_handle);
> + return WaitForReplicationWorkerAttach(worker, generation, bgw_handle);
>
> The comment feels a tiny bit misleading, because there is a chance
> that this might not attach at all and return false if something goes
> wrong.

I feel it might be better to fix this in a separate patch.

> Now that there is an apply_action enum I felt it is better for this
> code to be using a switch instead of all the if/else. Furthermore, it
> might be better to put the switch cases in a logical order (e.g. same
> as the suggested enum value order of #29a).

I'm not sure a switch is better than if/else here, but I can change it if
more people prefer that.

> 23. src/backend/replication/logical/launcher.c - logicalrep_worker_launch
>
> + nparallelapplyworkers = logicalrep_parallel_apply_worker_count(subid);
> +
> + /*
> +  * Return silently if the number of parallel apply workers reached the
> +  * limit per subscription.
> +  */
> + if (is_subworker && nparallelapplyworkers >=
> +     max_parallel_apply_workers_per_subscription)
> + {
> +     LWLockRelease(LogicalRepWorkerLock);
> +     return false;
> + }
>
> I'm not sure if it is a good idea to be so silent. How will the user
> know if they should increase the GUC parameter or not if it never
> tells them that the value is too low?

This matches what we do for table sync workers. Besides, I think a user is
likely to intentionally limit the number of parallel apply workers to leave
free workers for other purposes, and we do report a WARNING later if there
are no free worker slots: errmsg("out of logical replication worker slots").

> 41. src/backend/replication/logical/worker.c - store_flush_position
>
> + /* Skip if not the leader apply worker */
> + if (am_parallel_apply_worker())
> +     return;
> +
> Code might be better to implement/use a new function so it can check
> something like !am_leader_apply_worker()

Based on the existing code, both the leader and the table sync worker can
enter this function. Using !am_leader_apply_worker() would disallow the
table sync worker from entering it, which might not be good.

> 47. src/backend/storage/ipc/procsignal.c - procsignal_sigusr1_handler
>
> @@ -657,6 +658,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
>     if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
>         HandleLogMemoryContextInterrupt();
>
> +   if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
> +       HandleParallelApplyMessageInterrupt();
> +
> I wasn't sure about the placement of this new code because those
> CheckProcSignal don't seem to have any particular order. I think this
> belongs adjacent to the PROCSIG_PARALLEL_MESSAGE since it has the most
> in common with that one.

I'm not very sure; I just followed the convention used when adding a new
signal reason (add the new reason at the end, but before the recovery
conflict reasons). And parallel apply is not very similar to parallel query
in detail.

> I thought it might be worthwhile to also add another function like
> am_leader_apply_worker(). I noticed at least one place in this patch
> where it could have been called.

It seems a bit unnecessary to introduce a new macro where we can already
use am_parallel_apply_worker() to check.

Best regards,
Hou zj