Re: Time delayed LR (WAS Re: logical replication restrictions) - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Time delayed LR (WAS Re: logical replication restrictions)
Msg-id CAA4eK1+2tbyjk12rpG=OYYUdhiBauHXJ-u1WZtpQ2viMCPBMdA@mail.gmail.com
In response to RE: Time delayed LR (WAS Re: logical replication restrictions)  ("Takamichi Osumi (Fujitsu)" <osumi.takamichi@fujitsu.com>)
Responses RE: Time delayed LR (WAS Re: logical replication restrictions)  ("Takamichi Osumi (Fujitsu)" <osumi.takamichi@fujitsu.com>)
RE: Time delayed LR (WAS Re: logical replication restrictions)  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
List pgsql-hackers
On Tue, Dec 6, 2022 at 5:44 PM Takamichi Osumi (Fujitsu)
<osumi.takamichi@fujitsu.com> wrote:
>
> On Friday, December 2, 2022 4:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Tue, Nov 15, 2022 at 12:33 PM Amit Kapila <amit.kapila16@gmail.com>
> > wrote:
> > > One more thing I would like you to consider is the point raised by me
> > > related to this patch's interaction with the parallel apply feature as
> > > mentioned in the email [1]. I am not sure the idea proposed in that
> > > email [1] is a good one because delaying after applying commit may not
> > > be good as we want to delay the apply of the transaction(s) on
> > > subscribers by this feature. I feel this needs more thought.
> > >
> >
> > I have thought a bit more about this and we have the following options to
> > choose the delay point from. (a) apply delay just before committing a
> > transaction. As mentioned in comments in the patch this can lead to bloat and
> > locks held for a long time. (b) apply delay before starting to apply changes for a
> > transaction but here the problem is which time to consider. In some cases, like
> > for streaming transactions, we don't receive the commit/prepare xact time in
> > the start message. (c) use (b) but use the previous transaction's commit time.
> > (d) apply delay after committing a transaction by using the xact's commit time.
> >
> > At this stage, among above, I feel any one of (c) or (d) is worth considering. Now,
> > the difference between (c) and (d) is that if after commit the next xact's data is
> > already delayed by more than min_apply_delay time then we don't need to kick
> > the new logic of apply delay.
> >
> > The other thing to consider is whether we need to process any keepalive
> > messages during the delay because otherwise, walsender may think that the
> > subscriber is not available and time out. This may not be a problem for
> > synchronous replication but otherwise, it could be a problem.
> >
> > Thoughts?
> Hi,
>
>
> Thank you for your comments!
> Below is some analysis of the major points above.
>
> (1) About the timing to apply the delay
>
> One approach of (b) would be best. The idea is to delay all types of transaction's application
> based on the time when one transaction arrives at the subscriber node.
>

But I think this will add unnecessary delay when the publisher is
late in sending the transaction (say, because the publisher was busy
handling other workloads or there was a temporary network break
between publisher and subscriber). This is probably why physical
replication (via recovery_min_apply_delay) uses the commit time of
the sending side.
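
To make the difference concrete, here is a minimal sketch comparing a
delay measured from the arrival time at the subscriber with one measured
from the commit time on the publisher. The function name
remaining_delay and all timestamps are hypothetical and chosen only for
illustration; this is not code from the patch.

```python
from datetime import datetime, timedelta

def remaining_delay(reference_time, now, min_apply_delay):
    """Return how long the apply side still has to wait (never negative)."""
    return max(reference_time + min_apply_delay - now, timedelta(0))

min_apply_delay = timedelta(seconds=60)

# Hypothetical timeline: the transaction commits on the publisher, but
# reaches the subscriber 45 seconds later (busy publisher, network break).
commit_time = datetime(2022, 12, 6, 12, 0, 0)
arrival_time = commit_time + timedelta(seconds=45)
now = arrival_time

# Delay counted from arrival: the full 60 seconds are still owed, so the
# transaction ends up applied 105 seconds after it committed.
wait_from_arrival = remaining_delay(arrival_time, now, min_apply_delay)

# Delay counted from the publisher's commit time: only the remaining
# 15 seconds are owed, so the total lag stays at 60 seconds.
wait_from_commit = remaining_delay(commit_time, now, min_apply_delay)

print(wait_from_arrival.total_seconds(), wait_from_commit.total_seconds())
```

With the commit time as the reference, time already lost in transit
counts toward the delay instead of being added on top of it.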

> One advantage of this approach over (c) and (d) is that this can avoid the case
> where we might apply a transaction immediately without waiting,
> if there are two transactions sequentially and the time in between exceeds the min_apply_delay time.
>

I am not sure I understand your point. However, I think that even if
the transactions are sequential, if the time between them exceeds
min_apply_delay (say, because the publisher was down), there is no
need to apply additional delay.

> When we receive stream-in-progress transactions, we'll check whether the time for delay
> has passed or not at first in this approach.
>
>
> (2) About the timeout issue
>
> When having a look at the physical replication internals,
> it conducts sending feedback and application of delay separately on different processes.
> OTOH, the logical replication needs to achieve those within one process.
>
> When we want to apply delay and avoid the timeout,
> we should not store all the transactions data into memory.
> So, one approach for this is to serialize the transaction data and after the delay,
> we apply the transactions data.
>

It is not clear to me how this will avoid a timeout.

> However, this means if users adopt this feature,
> then all transaction data that should be delayed would be serialized.
> We are not sure if this sounds like a valid approach or not.
>
> Another approach was to divide the delay time in apply_delay() into smaller intervals,
> use each interval as the WaitLatch timeout, and send the keepalive messages from there.
>

Do we ever send keepalive messages from the apply side? I think we
only send feedback reply messages in response to the publisher's
keepalive messages. So, we need to do something similar here if
you want to follow this approach.
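
A minimal simulation of that idea follows: the delay is split into
bounded waits, and after each wait the apply side replies to any
keepalive that has arrived, without ever sending an unprompted message.
The function delay_with_feedback, its parameters, and the millisecond
values are assumptions for illustration; the real code would wait on a
latch and read from the replication connection instead of using
simulated arrival times.

```python
def delay_with_feedback(total_delay_ms, chunk_ms, keepalive_arrivals_ms):
    """Simulate waiting total_delay_ms in chunks of at most chunk_ms.

    keepalive_arrivals_ms holds simulated times (ms since the delay began)
    at which the publisher's keepalive messages arrive. Returns the
    simulated times at which feedback replies were sent.
    """
    pending = sorted(keepalive_arrivals_ms)
    replies = []
    elapsed = 0
    while elapsed < total_delay_ms:
        # Stand-in for one bounded wait (a WaitLatch call with a timeout).
        elapsed += min(chunk_ms, total_delay_ms - elapsed)
        # Reply only to keepalives that have actually arrived.
        while pending and pending[0] <= elapsed:
            pending.pop(0)
            replies.append(elapsed)
    return replies

# Ten-second delay, woken every second: replies go out soon after each
# keepalive. The keepalive at 12000 ms arrives after the delay has ended.
chunked = delay_with_feedback(10_000, 1_000, [2_500, 7_200, 12_000])

# Same delay as a single uninterrupted wait: both replies are held back
# until the delay is over, which is what risks a walsender timeout.
unchunked = delay_with_feedback(10_000, 10_000, [2_500, 7_200])

print(chunked, unchunked)
```

The chunk length bounds how late a reply can be, so it would need to
stay well below whatever timeout the walsender applies.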

-- 
With Regards,
Amit Kapila.


