Re: Perform streaming logical transactions by background workers and parallel apply - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Perform streaming logical transactions by background workers and parallel apply
Date
Msg-id CAA4eK1KjGNA8T8O77rRhkv6bRT6OsdQaEy--2hNrJFCc80bN0A@mail.gmail.com
Whole thread Raw
In response to Re: Perform streaming logical transactions by background workers and parallel apply  (Amit Kapila <amit.kapila16@gmail.com>)
Responses RE: Perform streaming logical transactions by background workers and parallel apply
List pgsql-hackers
On Thu, Sep 22, 2022 at 3:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Sep 22, 2022 at 8:59 AM wangw.fnst@fujitsu.com
> <wangw.fnst@fujitsu.com> wrote:
> >
>
> Few comments on v33-0001
> =======================
>

Some more comments on v33-0001
=============================
1.
+ /* Information from the corresponding LogicalRepWorker slot. */
+ uint16 logicalrep_worker_generation;
+
+ int logicalrep_worker_slot_no;
+} ParallelApplyWorkerShared;

Both these variables are read/changed by leader/parallel workers
without using any lock (mutex). It seems currently there is no problem
because of the way the patch is using in_parallel_apply_xact but I
think it won't be a good idea to rely on it. I suggest using mutex to
operate on these variables and also check if the slot_no is in a valid
range after reading it in parallel_apply_free_worker, otherwise error
out using elog.

2.
 static void
 apply_handle_stream_stop(StringInfo s)
 {
- if (!in_streamed_transaction)
+ ParallelApplyWorkerInfo *winfo = NULL;
+ TransApplyAction apply_action;
+
+ if (!am_parallel_apply_worker() &&
+ (!in_streamed_transaction && !stream_apply_worker))
  ereport(ERROR,
  (errcode(ERRCODE_PROTOCOL_VIOLATION),
  errmsg_internal("STREAM STOP message without STREAM START")));

This check won't be able to detect missing stream start messages for
parallel apply workers apart from the first pair of start/stop. I
thought of adding in_remote_transaction check along with
am_parallel_apply_worker() to detect the same but that also won't work
because the parallel worker doesn't reset it at the stop message.
Another possibility is to introduce yet another variable for this but
that doesn't seem worth it. I would like to keep this check simple.
Can you think of any better way?

3. I think we can skip sending start/stop messages from the leader to
the parallel worker because unlike apply worker it will process only
one transaction-at-a-time. However, it is not clear whether that is
worth the effort because it is sent after logical_decoding_work_mem
changes. For now, I have added a comment for this in the attached
patch but let me if I am missing something or if I am wrong.

4.
postgres=# select pid, leader_pid, application_name, backend_type from
pg_stat_activity;
  pid  | leader_pid | application_name |         backend_type
-------+------------+------------------+------------------------------
 27624 |            |                  | logical replication launcher
 17336 |            | psql             | client backend
 26312 |            |                  | logical replication worker
 26376 |            | psql             | client backend
 14004 |            |                  | logical replication worker

Here, the second worker entry is for the parallel worker. Isn't it
better if we distinguish this by keeping type as a logical replication
parallel worker? I think for this you need to change bgw_type in
logicalrep_worker_launch().

5. Can we name parallel_apply_subxact_info_add() as
parallel_apply_start_subtrans()?

Apart from the above, I have added/edited a few comments and made a
few other cosmetic changes in the attached.

-- 
With Regards,
Amit Kapila.

Attachment

pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
Next
From: Himanshu Upadhyaya
Date:
Subject: Re: HOT chain validation in verify_heapam()