Re: Perform streaming logical transactions by background workers and parallel apply - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Perform streaming logical transactions by background workers and parallel apply
Date
Msg-id CAA4eK1J2UAXsLTbSx1s407W9rHh2RdfF99atMr1khEHou6SgQw@mail.gmail.com
Whole thread Raw
In response to RE: Perform streaming logical transactions by background workers and parallel apply  ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
Responses RE: Perform streaming logical transactions by background workers and parallel apply
List pgsql-hackers
On Thu, Sep 1, 2022 at 4:53 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>

Review of v27-0001*:
================
1. I feel the usage of in_remote_transaction and in_use flags is
slightly complex. IIUC, the patch uses in_use flag to ensure commit
ordering by waiting for it to become false before proceeding in
transaction finish commands in leader apply worker. If so, I think it
is better to name it in_parallel_apply_xact and set it to true only
when we start applying xact in parallel apply worker and set it to
false when we finish the xact in parallel apply worker. It can be
initialized to false while setting up DSM. Also, accordingly change
the function parallel_apply_wait_for_free() to
parallel_apply_wait_for_xact_finish and parallel_apply_set_idle to
parallel_apply_set_xact_finish. We can change the name of the
in_remote_transaction flag to in_use.

Please explain about these flags in the struct where they are declared.

2. The worker_id in ParallelApplyWorkerShared struct could have wrong
information after the worker is reused from the pool. Because we could
have removed some other worker from the ParallelApplyWorkersList which
will make the value of worker_id wrong. For error/debug messages, we
can probably use LSN if available or can oid of subscription if
required. I thought of using xid as well but I think it is better to
avoid that in messages as it can wraparound. See, if the patch uses
xid in other messages, it is better to either use it along with LSN or
try to use only LSN.

3.
elog(ERROR, "[Parallel Apply Worker #%u] unexpected message \"%c\"",
+ shared->worker_id, c);

Also, I am not sure whether the above style (use of []) of messages is
good. Did you follow the usage from some other place?

4.
apply_handle_stream_stop(StringInfo s)
{
...
+ if (apply_action == TA_APPLY_IN_PARALLEL_WORKER)
+ {
+ elog(DEBUG1, "[Parallel Apply Worker #%u] ended processing streaming chunk, "
+ "waiting on shm_mq_receive", MyParallelShared->worker_id);
...

I don't understand the relevance of "waiting on shm_mq_receive" in the
above message because AFAICS, here we are not waiting on any receive
call.

5. I suggest you please go through all the ERROR/LOG/DEBUG messages in
the patch and try to improve them based on the above comments.

6.
+ * The dynamic shared memory segment will contain (1) a shm_mq that can be used
+ * to send errors (and other messages reported via elog/ereport) from the
+ * parallel apply worker to leader apply worker (2) another shm_mq that can be
+ * used to send changes in the transaction from leader apply worker to parallel
+ * apply worker

Here, it would be better to switch (1) and (2). I feel it is better to
explain first about how the main apply information is exchanged among
workers.

7.
+ /* Try to get a free parallel apply worker. */
+ foreach(lc, ParallelApplyWorkersList)
+ {
+ ParallelApplyWorkerInfo *tmp_winfo;
+
+ tmp_winfo = (ParallelApplyWorkerInfo *) lfirst(lc);
+
+ if (tmp_winfo->error_mq_handle == NULL)
+ {
+ /*
+ * Release the worker information and try next one if the parallel
+ * apply worker exited cleanly.
+ */
+ ParallelApplyWorkersList =
foreach_delete_current(ParallelApplyWorkersList, lc);
+ shm_mq_detach(tmp_winfo->mq_handle);
+ dsm_detach(tmp_winfo->dsm_seg);
+ pfree(tmp_winfo);
+
+ continue;
+ }
+
+ if (!tmp_winfo->in_remote_transaction)
+ {
+ winfo = tmp_winfo;
+ break;
+ }
+ }

Can we write it as if ... else if? If so, then we don't need to
continue in the first loop. And, can we add some more comments to
explain these cases?

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Dilip Kumar
Date:
Subject: Re: test_decoding assertion failure for the loss of top-sub transaction relationship
Next
From: Thomas Munro
Date:
Subject: Re: pg15b3: recovery fails with wal prefetch enabled