RE: Perform streaming logical transactions by background workers and parallel apply - Mailing list pgsql-hackers

From houzj.fnst@fujitsu.com
Subject RE: Perform streaming logical transactions by background workers and parallel apply
Date
Msg-id OS0PR01MB571673E3EE2A65A3B8B7713B942D9@OS0PR01MB5716.jpnprd01.prod.outlook.com
Whole thread Raw
In response to RE: Perform streaming logical transactions by background workers and parallel apply  ("kuroda.hayato@fujitsu.com" <kuroda.hayato@fujitsu.com>)
List pgsql-hackers
On Wednesday, October 19, 2022 8:50 PM Kuroda, Hayato/黒田 隼人 <kuroda.hayato@fujitsu.com> wrote:
> 
> ===
> 01. applyparallelworker.c - SIZE_STATS_MESSAGE
> 
> ```
> /*
>  * There are three fields in each message received by the parallel apply
>  * worker: start_lsn, end_lsn and send_time. Because we have updated these
>  * statistics in the leader apply worker, we can ignore these fields in the
>  * parallel apply worker (see function LogicalRepApplyLoop).
>  */
> #define SIZE_STATS_MESSAGE (2 * sizeof(XLogRecPtr) + sizeof(TimestampTz))
> ```
> 
> According to other comment styles, it seems that the first sentence of the
> comment should represent the datatype and usage, not the detailed reason.
> For example, about ParallelApplyWorkersList, you said "A list ...". How about
> adding like following message:
> The message size that can be skipped by parallel apply worker

Thanks for the comments, but the current description seems enough to me.

> ~~~
> 02. applyparallelworker.c - parallel_apply_start_subtrans
> 
> ```
>     if (current_xid != top_xid &&
>         !list_member_xid(subxactlist, current_xid)) ```
> 
> A macro TransactionIdEquals is defined in access/transam.h. Should we use it,
> or is it too trivial?

I checked the existing codes, it seems both style are being used.
Maybe we can post a separate patch to replace them later.

> ~~~
> 08. worker.c - apply_handle_prepare_internal
> 
> Same as above.
> 
> 
> ~~~
> 09. worker.c - maybe_reread_subscription
> 
> ```
>     /*
>      * Exit if any parameter that affects the remote connection was
> changed.
>      * The launcher will start a new worker.
>      */
>     if (strcmp(newsub->conninfo, MySubscription->conninfo) != 0 ||
>         strcmp(newsub->name, MySubscription->name) != 0 ||
>         strcmp(newsub->slotname, MySubscription->slotname) != 0 ||
>         newsub->binary != MySubscription->binary ||
>         newsub->stream != MySubscription->stream ||
>         strcmp(newsub->origin, MySubscription->origin) != 0 ||
>         newsub->owner != MySubscription->owner ||
>         !equal(newsub->publications, MySubscription->publications))
>     {
>         ereport(LOG,
>                 (errmsg("logical replication apply worker for
> subscription \"%s\" will restart because of a parameter change",
>                         MySubscription->name)));
> 
>         proc_exit(0);
>     }
> ```
> 
> When the parallel apply worker has been launched and then the subscription
> option has been modified, the same message will appear twice.
> But if the option "streaming" is changed from "parallel" to "on", one of them
> will not restart again.
> Should we modify message?

Thanks, it seems a timing problem, if the leader catch the change first and stop
the parallel workers, the message will only appear once. But I agree we'd
better make the message clear. I changed the message in parallel apply worker.
While on it, I also adjusted some other message to include "parallel apply
worker" if they are in parallel apply worker.

Best regards,
Hou zj


pgsql-hackers by date:

Previous
From: Peter Smith
Date:
Subject: Re: Data is copied twice when specifying both child and parent table in publication
Next
From: Alvaro Herrera
Date:
Subject: parse partition strategy string in gram.y