RE: Perform streaming logical transactions by background workers and parallel apply - Mailing list pgsql-hackers

From houzj.fnst@fujitsu.com
Subject RE: Perform streaming logical transactions by background workers and parallel apply
Date
Msg-id OS0PR01MB5716B802A1733548A99761AE94129@OS0PR01MB5716.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Perform streaming logical transactions by background workers and parallel apply  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Perform streaming logical transactions by background workers and parallel apply
Re: Perform streaming logical transactions by background workers and parallel apply
List pgsql-hackers
On Mon, November 28, 2022 20:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Sun, Nov 27, 2022 at 9:43 AM houzj.fnst@fujitsu.com 
> <houzj.fnst@fujitsu.com> wrote:
> >
> > Attach the new version patch which addressed all comments so far.
> >
> 
> Few comments on v52-0001*
> ========================
> 1.
> pa_free_worker()
> {
> ...
> + /* Free the worker information if the worker exited cleanly. */ if 
> + (!winfo->error_mq_handle) { pa_free_worker_info(winfo);
> +
> + if (winfo->in_use &&
> + !hash_search(ParallelApplyWorkersHash, &xid, HASH_REMOVE, NULL)) 
> + elog(ERROR, "hash table corrupted");
> 
> pa_free_worker_info() pfrees the winfo, so how is it legal to
> winfo->in_use in the above check?
> 
> Also, why is this check (!winfo->error_mq_handle) required in the 
> first place in the patch? The worker exits cleanly only when the 
> leader apply worker sends a SIGINT signal and in that case, we already 
> detach from the error queue and clean up other worker information.

It was intended for the case when a user send a signal, but it seems not standard way to do that.
So, I removed this check (!winfo->error_mq_handle).

> 2.
> +HandleParallelApplyMessages(void)
> +{
> ...
> ...
> + foreach(lc, ParallelApplyWorkersList) { shm_mq_result res; Size 
> + nbytes;
> + void    *data;
> + ParallelApplyWorkerInfo *winfo = (ParallelApplyWorkerInfo *) 
> + lfirst(lc);
> +
> + if (!winfo->error_mq_handle)
> + continue;
> 
> Similar to the previous comment, it is not clear whether we need this 
> check. If required, can we add a comment to indicate the case where it 
> happens to be true?
> Note, there is a similar check for winfo->error_mq_handle in 
> pa_wait_for_xact_state(). Please add some comments if that is 
> required.

Removed this check in these two functions.

> 3. Why is there apply_worker_clean_exit() at the end of 
> ParallelApplyWorkerMain()? Normally either the leader worker stops 
> parallel apply, or parallel apply gets stopped because of a parameter 
> change, or exits because of error, and in none of those cases it can 
> hit this code path unless I am missing something.
> 
> Additionally, I think in LogicalParallelApplyLoop, we will never 
> receive zero-length messages so that is also wrong and should be 
> converted to elog(ERROR,..).

Agreed and changed. 

> 4. I think in logicalrep_worker_detach(), we should detach from the 
> shm error queue so that the parallel apply worker won't try to send a 
> termination message back to the leader worker.

Agreed and changed.

> 5.
> pa_send_data()
> {
> ...
> + if (startTime == 0)
> + startTime = GetCurrentTimestamp();
> ...
> 
> What is the use of getting the current timestamp before waitlatch 
> logic, if it is not used before that? It seems that is for the time 
> logic to look correct. We can probably reduce the 10s interval to 9s 
> for that.

Changed.

> In this function, we need to add some comments to indicate why the 
> current logic is used, and also probably we can refer to the comments 
> atop this file.

Added some comments.

> 6. I think it will be better if we keep stream_apply_worker local to 
> applyparallelworker.c by exposing functions to cache/resetting the 
> required info.

Agree. Added a new function to set the stream_apply_worker.

> 7. Apart from the above, I have made a few changes in the comments and 
> some miscellaneous cosmetic changes in the attached. Kindly include 
> these in the next version unless you see a problem with any change.

Thanks, I have checked and merge them.

Attach the new version patch which addressed all comments.

Best regards,
Hou zj

Attachment

pgsql-hackers by date:

Previous
From: Ajin Cherian
Date:
Subject: Re: Support logical replication of DDLs
Next
From: Jeff Davis
Date:
Subject: Re: Collation version tracking for macOS