Re: Perform streaming logical transactions by background workers and parallel apply - Mailing list pgsql-hackers

From: Amit Kapila
Subject: Re: Perform streaming logical transactions by background workers and parallel apply
Msg-id: CAA4eK1+ELh2vPre3JHyoeKV0A9_V7aXQD0QBPv86WEn7P_rK-g@mail.gmail.com
In response to: RE: Perform streaming logical transactions by background workers and parallel apply ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
Responses: RE: Perform streaming logical transactions by background workers and parallel apply ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
List: pgsql-hackers
On Mon, Dec 5, 2022 at 9:59 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> Attach a new version patch set which fixed a testcase failure on CFbot.
>

Few comments:
============
1.
+ /*
+ * Break the loop if the parallel apply worker has finished applying
+ * the transaction. The parallel apply worker should have closed the
+ * file before committing.
+ */
+ if (am_parallel_apply_worker() &&
+ MyParallelShared->xact_state == PARALLEL_TRANS_FINISHED)
+ goto done;

This looks hackish to me because ideally this API should exit only after
reading and applying all the messages in the spool file. The check is
based purely on the knowledge that once we reach a certain state, the
file won't have any more data. I think it would be better to explicitly
ensure that all the data in the file has been read and applied.
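
Just to illustrate the direction I have in mind (an untested sketch, not
from the patch; spooled_messages_exhausted() is a made-up name, while
am_parallel_apply_worker(), MyParallelShared, and PARALLEL_TRANS_FINISHED
are the ones from the patch), the exit condition could be made explicit
in one place, something like:

/*
 * Untested sketch: fold the exit condition into one predicate so the
 * read loop visibly terminates only once all spooled messages have been
 * read and applied.
 */
static bool
spooled_messages_exhausted(size_t nbytes_read)
{
    /* Not at EOF yet, keep reading. */
    if (nbytes_read != 0)
        return false;

    /*
     * EOF is final only if nothing more can be appended to the file:
     * either we are not a parallel apply worker, or the leader has
     * already marked the transaction as finished.
     */
    return !am_parallel_apply_worker() ||
        MyParallelShared->xact_state == PARALLEL_TRANS_FINISHED;
}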

2.
+ /*
+ * No need to output the DEBUG message here in the parallel apply
+ * worker as similar messages will be output when handling STREAM_STOP
+ * message.
+ */
+ if (!am_parallel_apply_worker() && nchanges % 1000 == 0)
  elog(DEBUG1, "replayed %d changes from file \"%s\"",
  nchanges, path);
  }

This check looks a bit ugly to me. I think it is okay to get a similar
DEBUG message in the other place (on stream_stop) because (a) this one is
logged every 1000 changes whereas stream_stop can come after many more
changes, so there doesn't appear to be a direct correlation; and (b)
keeping both lets us identify whether the changes came from spooled
messages or from direct apply. Ideally, we could use a different DEBUG
message to differentiate the two, but this doesn't appear bad to me.
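
In other words (just spelling out the suggestion, untested), the simpler
unconditional form would be something like:

    /* Log progress every 1000 changes, for both the leader and parallel
     * apply workers. */
    if (nchanges % 1000 == 0)
        elog(DEBUG1, "replayed %d changes from file \"%s\"",
             nchanges, path);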

3. The function names for serialize_stream_start(),
serialize_stream_stop(), and serialize_stream_abort() don't seem to
match the functionality they provide because none of these
write/serialize changes to the file. Can we rename these? Some
possible options could be stream_start_internal or stream_start_guts.
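
For example (an illustrative sketch only, not an actual hunk; the split
shown here just demonstrates the naming), the message handler could
delegate to the renamed common routine:

/*
 * Common part of STREAM START handling, shared by all apply paths; note
 * that it does not write anything to a file, hence not "serialize_...".
 */
static void
stream_start_internal(TransactionId xid, bool first_segment)
{
    /* common STREAM START handling, no file I/O */
}

static void
apply_handle_stream_start(StringInfo s)
{
    bool        first_segment;
    TransactionId xid = logicalrep_read_stream_start(s, &first_segment);

    /*
     * Decide whether to spool, apply directly, or hand off to a parallel
     * apply worker, then do the common part.
     */
    stream_start_internal(xid, first_segment);
}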

-- 
With Regards,
Amit Kapila.


