Thread: Re: Tablesync early exit

Re: Tablesync early exit

From

Peter Smith

Date:

30 August 2021, 03:20:30

Patch v2 is the same; it only needed re-basing to the latest HEAD.

----
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

v2-0001-Tablesync-early-exit.patch

Re: Tablesync early exit

From

Amit Kapila

Date:

16 March 2022, 05:06:48

On Mon, Aug 30, 2021 at 8:50 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Patch v2 is the same; it only needed re-basing to the latest HEAD.
>

Why do you think it is correct to exit before trying to receive any
message? How will we ensure whether the apply worker has processed any
message? At the beginning of function LogicalRepApplyLoop(),
last_received is the LSN where the copy has finished in the case of
tablesync worker. I think we need to receive the message before trying
to ensure whether we have synced with the apply worker or not.

-- 
With Regards,
Amit Kapila.

Re: Tablesync early exit

From

Peter Smith

Date:

01 April 2022, 08:22:20

On Wed, Mar 16, 2022 at 4:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Aug 30, 2021 at 8:50 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Patch v2 is the same; it only needed re-basing to the latest HEAD.
> >
>
> Why do you think it is correct to exit before trying to receive any
> message?

I think the STATE_CATCHUP state guarantees the apply worker must have
received (or tried to receive) a message. See the next answer.

> How will we ensure whether the apply worker has processed any
> message?

All this patch code does is call process_syncing_tables, which
delegates to process_syncing_tables_for_sync (because the call is from
a tablesync worker). This function code can’t do anything unless the
tablesync worker is in STATE_CATCHUP state, and that cannot happen
unless it was explicitly set to that state by the apply worker.

On the other side of the coin, the apply worker can only set
syncworker->relstate = SUBREL_STATE_CATCHUP from within function
process_syncing_tables_for_apply, and AFAIK that function is only
called after the apply worker has either handled a message or found
that walrcv_receive in LogicalRepApplyLoop returned nothing.

So I think the STATE_CATCHUP mechanism itself ensures the apply worker
*must* have already processed a message (or there was no message to
process).
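
The handshake argued above can be sketched as a small standalone model.
This is only an illustration, not the PostgreSQL source: the names
SyncWorker, model_apply_side and model_sync_side are hypothetical, the
state enum is a simplified stand-in for the SUBREL_STATE_* values, and
locking, catalog updates and worker exit are left out.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* stand-in for PostgreSQL's LSN type */

/* Simplified stand-ins for a few SUBREL_STATE_* values. */
typedef enum { STATE_SYNCWAIT, STATE_CATCHUP, STATE_SYNCDONE } RelState;

/* Hypothetical, reduced view of the shared per-worker state. */
typedef struct
{
    RelState    relstate;
    XLogRecPtr  relstate_lsn;
} SyncWorker;

/*
 * Models the apply worker's side. By assumption this runs only after the
 * apply worker has handled a message or its receive returned nothing.
 * It is the only place that moves the worker into STATE_CATCHUP.
 */
static void
model_apply_side(SyncWorker *w, XLogRecPtr apply_lsn)
{
    if (w->relstate == STATE_SYNCWAIT)
    {
        w->relstate = STATE_CATCHUP;
        if (apply_lsn > w->relstate_lsn)
            w->relstate_lsn = apply_lsn;
    }
}

/*
 * Models the tablesync worker's side. It does nothing unless the apply
 * side has already set STATE_CATCHUP. Returns true when the worker may
 * finish because it has reached the apply worker's position.
 */
static bool
model_sync_side(SyncWorker *w, XLogRecPtr current_lsn)
{
    if (w->relstate == STATE_CATCHUP && current_lsn >= w->relstate_lsn)
    {
        w->relstate = STATE_SYNCDONE;
        return true;
    }
    return false;
}
```

In this model, calling model_sync_side before model_apply_side has run
always returns false, which is the property the argument relies on.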

> At the beginning of function LogicalRepApplyLoop(),
> last_received is the LSN where the copy has finished in the case of
> tablesync worker. I think we need to receive the message before trying
> to ensure whether we have synced with the apply worker or not.
>

I think the STATE_CATCHUP guarantees the apply worker must have
received (or tried to receive) a message. See the previous answer.

~~~

AFAIK this patch is OK, but since it is not particularly urgent I've
bumped this to the next CommitFest [1] instead of trying to jam it
into PG15 at the last minute.

BTW - There were some useful logfiles I captured a very long time ago
[2]. They show the behaviour without/with this patch.

------
[1] https://commitfest.postgresql.org/37/3062/
[2] https://www.postgresql.org/message-id/CAHut+Ptjk-Qgd3R1a1_tr62CmiswcYphuv0pLmVA-+2s8r0Bkw@mail.gmail.com

Kind Regards,
Peter Smith
Fujitsu Australia

Re: Tablesync early exit

From

Amit Kapila

Date:

02 April 2022, 06:17:14

On Fri, Apr 1, 2022 at 1:52 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Wed, Mar 16, 2022 at 4:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> I think the STATE_CATCHUP guarantees the apply worker must have
> received (or tried to receive) a message. See the previous answer.
>

Sorry, I intended to say until the sync worker has received any message.
The point is that the LSN up to which the copy has finished might
actually be later than some of the in-progress transactions on the
server. It may not be a good idea to blindly skip those changes if the
apply worker has already received them (say via a 'streaming' mode).
Today, all such changes are written to a file and applied at commit
time, but tomorrow we could have an implementation that applies such
changes (via some background worker) while skipping changes related to
the table for which the table-sync worker is in progress. In such a
scenario, unless we allow the table-sync worker to process more
messages, we will end up losing some changes for that particular table.

As per my understanding, this is safe with the current code, but that
can't be guaranteed for future implementations, and the extra work the
patch avoids is only receiving the messages for one transaction. I
still don't think that it is a good idea to pursue this patch.
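
The concern can be illustrated with a toy model. Everything in it is an
assumption made for illustration: TableChange and changes_applied are
hypothetical names, and the "skipped_by_apply" flag stands for the
possible future behaviour described above, not for anything in the
current code.

```c
#include <stdbool.h>
#include <stddef.h>

/* One change to the table that is still being synchronised. */
typedef struct
{
    bool    in_copy;            /* already covered by the initial copy */
    bool    skipped_by_apply;   /* hypothetical future apply path skipped
                                 * it because tablesync was in progress */
} TableChange;

/*
 * Counts how many changes end up on the subscriber. sync_receives says
 * whether the tablesync worker receives further messages before it
 * exits (true) or exits immediately after the copy (false).
 */
static size_t
changes_applied(const TableChange *changes, size_t n, bool sync_receives)
{
    size_t      applied = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (changes[i].in_copy)
            applied++;          /* present via the copy */
        else if (!changes[i].skipped_by_apply)
            applied++;          /* handled by the apply worker */
        else if (sync_receives)
            applied++;          /* picked up by tablesync during catch-up */
        /* otherwise nobody applies it and the change is lost */
    }
    return applied;
}
```

With one change covered by the copy and one skipped by the apply path,
the model counts both changes when the tablesync worker keeps receiving,
and only one when it exits early.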

-- 
With Regards,
Amit Kapila.

Re: Tablesync early exit

From

Peter Smith

Date:

05 April 2022, 04:07:16

On Sat, Apr 2, 2022 at 5:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Apr 1, 2022 at 1:52 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > On Wed, Mar 16, 2022 at 4:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > I think the STATE_CATCHUP guarantees the apply worker must have
> > received (or tried to receive) a message. See the previous answer.
> >
>
> Sorry, I intended to say until the sync worker has received any message.
> The point is that the LSN up to which the copy has finished might
> actually be later than some of the in-progress transactions on the
> server. It may not be a good idea to blindly skip those changes if the
> apply worker has already received them (say via a 'streaming' mode).
> Today, all such changes are written to a file and applied at commit
> time, but tomorrow we could have an implementation that applies such
> changes (via some background worker) while skipping changes related to
> the table for which the table-sync worker is in progress. In such a
> scenario, unless we allow the table-sync worker to process more
> messages, we will end up losing some changes for that particular table.
>
> As per my understanding, this is safe with the current code, but that
> can't be guaranteed for future implementations, and the extra work the
> patch avoids is only receiving the messages for one transaction. I
> still don't think that it is a good idea to pursue this patch.

IIUC you are saying that my patch is good today, but it may cause
problems in a hypothetical future if the rest of the replication logic
is implemented differently.

Anyway, it seems there is no chance of this getting committed, so it
is time for me to stop flogging this dead horse.

I will remove this from the CF.

------
Kind Regards,
Peter Smith
Fujitsu Australia

Re: Tablesync early exit

From

Amit Kapila

Date:

05 April 2022, 05:55:32

On Tue, Apr 5, 2022 at 9:37 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Sat, Apr 2, 2022 at 5:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Apr 1, 2022 at 1:52 PM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > On Wed, Mar 16, 2022 at 4:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > I think the STATE_CATCHUP guarantees the apply worker must have
> > > received (or tried to receive) a message. See the previous answer.
> > >
> >
> > Sorry, I intended to say until the sync worker has received any message.
> > The point is that the LSN up to which the copy has finished might
> > actually be later than some of the in-progress transactions on the
> > server. It may not be a good idea to blindly skip those changes if the
> > apply worker has already received them (say via a 'streaming' mode).
> > Today, all such changes are written to a file and applied at commit
> > time, but tomorrow we could have an implementation that applies such
> > changes (via some background worker) while skipping changes related to
> > the table for which the table-sync worker is in progress. In such a
> > scenario, unless we allow the table-sync worker to process more
> > messages, we will end up losing some changes for that particular table.
> >
> > As per my understanding, this is safe with the current code, but that
> > can't be guaranteed for future implementations, and the extra work the
> > patch avoids is only receiving the messages for one transaction. I
> > still don't think that it is a good idea to pursue this patch.
>
> IIUC you are saying that my patch is good today, but it may cause
> problems in a hypothetical future if the rest of the replication logic
> is implemented differently.
>

The approach I alluded to above was already proposed earlier on
-hackers [1] to make streaming transactions perform better. So, it is
not completely hypothetical.

[1] - https://www.postgresql.org/message-id/8eda5118-2dd0-79a1-4fe9-eec7e334de17%40postgrespro.ru

-- 
With Regards,
Amit Kapila.