On Wednesday, November 23, 2022 9:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Nov 22, 2022 at 7:30 AM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
>
> Few minor comments and questions:
> ============================
> 1.
> +static void
> +LogicalParallelApplyLoop(shm_mq_handle *mqh)
> +{
> +    for (;;)
> +    {
> +        void       *data;
> +        Size        len;
> +
> +        ProcessParallelApplyInterrupts();
> ...
> ...
> +        if (rc & WL_LATCH_SET)
> +        {
> +            ResetLatch(MyLatch);
> +            ProcessParallelApplyInterrupts();
> +        }
> ...
> +}
>
> Why ProcessParallelApplyInterrupts() is called twice in
> LogicalParallelApplyLoop()?
I think the second call is unnecessary, so I removed it.
> 2.
> + * This scenario is similar to the first case but TX-1 and TX-2 are
> + * executed by two parallel apply workers (PA-1 and PA-2 respectively). In
> + * this scenario, PA-2 is waiting for PA-1 to complete its transaction
> + * while PA-1 is waiting for subsequent input from LA. Also, LA is waiting
> + * for PA-2 to complete its transaction in order to preserve the commit
> + * order. There is a deadlock among three processes.
> + *
> ...
> ...
> + *
> + * LA (waiting to acquire the local transaction lock) -> PA-1 (waiting to
> + * acquire the lock on the unique index) -> PA-2 (waiting to acquire the
> + * lock on the remote transaction) -> LA
> + *
>
> Isn't the order of PA-1 and PA-2 different in the second paragraph as compared
> to the first one?
Fixed.
> 3.
> + * Deadlock-detection
> + * ------------------
>
> It may be better to keep the title of this section as Locking Considerations.
>
> 4. In the section mentioned in Point 3, it would be better to separately explain
> why we need session-level locks instead of transaction level.
Added.
> 5. Add the below comments in the code:
> diff --git a/src/backend/replication/logical/applyparallelworker.c
> b/src/backend/replication/logical/applyparallelworker.c
> index 9385afb6d2..56f00defcf 100644
> --- a/src/backend/replication/logical/applyparallelworker.c
> +++ b/src/backend/replication/logical/applyparallelworker.c
> @@ -431,6 +431,9 @@ pa_free_worker_info(ParallelApplyWorkerInfo *winfo)
> if (winfo->dsm_seg != NULL)
> dsm_detach(winfo->dsm_seg);
>
> + /*
> +  * Ensure this worker information won't be reused during worker
> +  * allocation.
> +  */
> ,
>
> winfo);
>
> @@ -762,6 +765,10 @@ HandleParallelApplyMessage(ParallelApplyWorkerInfo *winfo, StringInfo msg)
>                   */
>                  error_context_stack = apply_error_context_stack;
>
> +                /*
> +                 * The actual error must be already reported by parallel
> +                 * apply worker.
> +                 */
>                  ereport(ERROR,
>                          (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>                           errmsg("parallel apply worker exited abnormally"),
Added.
Attached is the new version of the patch, which addresses all the comments so far.
Besides, I let the PA send a different message to the LA when it exits due to a
subscription information change. The LA will report a more meaningful message
and restart replication after catching the new message, which prevents the LA
from sending messages to an already-exited PA.
Best regards,
Hou zj