RE: Perform streaming logical transactions by background workers and parallel apply - Mailing list pgsql-hackers

From houzj.fnst@fujitsu.com
Subject RE: Perform streaming logical transactions by background workers and parallel apply
Date
Msg-id OS0PR01MB57166A377638E57823219E32941A9@OS0PR01MB5716.jpnprd01.prod.outlook.com
In response to Re: Perform streaming logical transactions by background workers and parallel apply  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Perform streaming logical transactions by background workers and parallel apply  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Tuesday, December 6, 2022 3:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Mon, Dec 5, 2022 at 9:59 AM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> > Attach a new version patch set which fixed a testcase failure on CFbot.
> >
> 
> Few comments:
> ============
> 1.
> + /*
> + * Break the loop if the parallel apply worker has finished applying
> + * the transaction. The parallel apply worker should have closed the
> + * file before committing.
> + */
> + if (am_parallel_apply_worker() &&
> + MyParallelShared->xact_state == PARALLEL_TRANS_FINISHED)
> + goto done;
> 
> This looks hackish to me because ideally, this API should exit after reading and
> applying all the messages in the spool file. This check is primarily based on the
> knowledge that once we reach some state, the file won't have more data. I
> think it would be better to explicitly ensure the same.

I added a function to ensure that no messages are left in the spool file after
the transaction has been committed.


> 2.
> + /*
> + * No need to output the DEBUG message here in the parallel apply
> + * worker as similar messages will be output when handling STREAM_STOP
> + * message.
> + */
> + if (!am_parallel_apply_worker() && nchanges % 1000 == 0)
>   elog(DEBUG1, "replayed %d changes from file \"%s\"",
>   nchanges, path);
>   }
> 
> I think this check appeared a bit ugly to me. I think it is okay to get a similar
> DEBUG message at another place (on stream_stop) because
> (a) this is logged every 1000 messages whereas stream_stop can be after many
> more messages, so there doesn't appear to be a direct correlation; (b) due to
> this, we can identify whether it is due to spooled messages or due to direct
> apply; ideally we can use another DEBUG message to differentiate but this
> doesn't appear bad to me.

OK, I removed this check.

> 3. The function names for serialize_stream_start(), serialize_stream_stop(), and
> serialize_stream_abort() don't seem to match the functionality they provide
> because none of these write/serialize changes to the file. Can we rename
> these? Some possible options could be stream_start_internal or
> stream_start_guts.

Renamed to stream_start_internal().

Attached is a new version of the patch set which addresses the comments above.
I have also attached a new patch to force streaming of changes (provided by Shi-san) and
another that introduces a GUC stream_serialize_threshold (provided by
Kuroda-san and Shi-san), both of which can help in testing the patch set.

Besides, I fixed a bug where messages could still be left in the memory
queue after the parallel apply worker had started applying the spooled messages.

Best regards,
Hou zj

