RE: Perform streaming logical transactions by background workers and parallel apply - Mailing list pgsql-hackers

From houzj.fnst@fujitsu.com
Subject RE: Perform streaming logical transactions by background workers and parallel apply
Date
Msg-id OS0PR01MB5716E7E5798625AE9437CD6F94439@OS0PR01MB5716.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Perform streaming logical transactions by background workers and parallel apply  (Peter Smith <smithpb2250@gmail.com>)
Responses Re: Perform streaming logical transactions by background workers and parallel apply
List pgsql-hackers
On Friday, September 9, 2022 3:02 PM Peter Smith <smithpb2250@gmail.com> wrote:
> 
> Here are my review comments for the v28-0001 patch:
> 
> (There may be some overlap with other people's review comments and/or
> some fixes already made).
> 

Thanks for the comments.


> 3.
> 
> max_logical_replication_workers (integer)
>     Specifies maximum number of logical replication workers. This
> includes apply leader workers, parallel apply workers, and table
> synchronization workers.
>     Logical replication workers are taken from the pool defined by
> max_worker_processes.
>     The default value is 4. This parameter can only be set at server start.
> 
> ~
> 
> I did not really understand why the default is 4. Because the  default
> tablesync workers is 2, and the default parallel workers is 2, but
> what about accounting for the apply worker? Therefore, shouldn't
> max_logical_replication_workers default be 5 instead of 4?

The parallel apply is disabled by default, so it's not a must to increase this
global default value as discussed[1]

[1] https://www.postgresql.org/message-id/CAD21AoCwaU8SqjmC7UkKWNjDg3Uz4FDGurMpis3zw5SEC%2B27jQ%40mail.gmail.com


> 6. src/backend/replication/logical/applyparallelworker.c
> 
> Is there a reason why this file is called applyparallelworker.c
> instead of parallelapplyworker.c? Now this name is out of step with
> names of all the new typedefs etc.

It was suggested which is consistent with the "vacuumparallel.c", but I am fine
with either name. I can change this if more people think parallelapplyworker.c
is better.


> 16. src/backend/replication/logical/applyparallelworker.c -
> parallel_apply_find_worker
> 
> + /* Return the cached parallel apply worker if valid. */
> + if (stream_apply_worker != NULL)
> + return stream_apply_worker;
> 
> Perhaps 'cur_stream_parallel_apply_winfo' is a better name for this var?

This looks a bit long to me.

>   /* Now wait until it attaches. */
> - WaitForReplicationWorkerAttach(worker, generation, bgw_handle);
> + return WaitForReplicationWorkerAttach(worker, generation, bgw_handle);
> 
> The comment feels a tiny bit misleading, because there is a chance
> that this might not attach at all and return false if something goes
> wrong.

I feel it might be better to fix this via a separate patch.


> Now that there is an apply_action enum I felt it is better for this
> code to be using a switch instead of all the if/else. Furthermore, it
> might be better to put the switch case in a logical order (e.g. same
> as the suggested enums value order of #29a).

I'm not sure whether switch case is better than if/else here. But if more
people prefer, I can change this.


> 23. src/backend/replication/logical/launcher.c - logicalrep_worker_launch
> 
> + nparallelapplyworkers = logicalrep_parallel_apply_worker_count(subid);
> +
> + /*
> + * Return silently if the number of parallel apply workers reached the
> + * limit per subscription.
> + */
> + if (is_subworker && nparallelapplyworkers >=
> max_parallel_apply_workers_per_subscription)
> + {
> + LWLockRelease(LogicalRepWorkerLock);
> + return false;
>   }
> I’m not sure if this is a good idea to be so silent. How will the user
> know if they should increase the GUC parameter or not if it never
> tells them that the value is too low ?

It's like what we do for table sync worker. Besides, I think user is
likely to intentionally limit the parallel apply worker number to leave free
workers for other purposes. And we do report a WARNING later if there is no
free worker slots errmsg("out of logical replication worker slots").


> 41. src/backend/replication/logical/worker.c - store_flush_position
> 
> + /* Skip if not the leader apply worker */
> + if (am_parallel_apply_worker())
> + return;
> +
> 
> Code might be better to implement/use a new function so it can check
> something like !am_leader_apply_worker()

Based on the existing code, both leader and table sync worker could enter this
function. Using !am_leader_apply_worker() seems will disallow table sync worker
to enter this function which might be not good although .


> 47. src/backend/storage/ipc/procsignal.c - procsignal_sigusr1_handler
> 
> @@ -657,6 +658,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
>   if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
>   HandleLogMemoryContextInterrupt();
> 
> + if (CheckProcSignal(PROCSIG_PARALLEL_APPLY_MESSAGE))
> + HandleParallelApplyMessageInterrupt();
> +
> 
> I wasn’t sure about the placement of this new code because those
> CheckProcSignal don’t seem to have any particular order. I think this
> belongs adjacent to the PROCSIG_PARALLEL_MESSAGE since it has the most
> in common with that one.

I'm not very sure, I just followed the way we used to add new SignalReason
(e.g. add the new reason at the last but before the Recovery conflict reasons).
And the parallel apply is not very similar to parallel query in detail.


> I thought it might be worthwhile to also add another function like
> am_leader_apply_worker(). I noticed at least one place in this patch
> where it could have been called.

It seems a bit unnecessary to introduce a new macro where we already can use
am_parallel_apply_worker to check.


Best regards,
Hou zj

pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: proposal: possibility to read dumped table's name from file
Next
From: Amit Kapila
Date:
Subject: Re: why can't a table be part of the same publication as its schema