Re: Time delayed LR (WAS Re: logical replication restrictions) - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Time delayed LR (WAS Re: logical replication restrictions) |
Date | |
Msg-id | CAA4eK1+2tbyjk12rpG=OYYUdhiBauHXJ-u1WZtpQ2viMCPBMdA@mail.gmail.com |
In response to | RE: Time delayed LR (WAS Re: logical replication restrictions) ("Takamichi Osumi (Fujitsu)" <osumi.takamichi@fujitsu.com>) |
Responses |
RE: Time delayed LR (WAS Re: logical replication restrictions)
RE: Time delayed LR (WAS Re: logical replication restrictions) |
List | pgsql-hackers |
On Tue, Dec 6, 2022 at 5:44 PM Takamichi Osumi (Fujitsu) <osumi.takamichi@fujitsu.com> wrote:
>
> On Friday, December 2, 2022 4:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Tue, Nov 15, 2022 at 12:33 PM Amit Kapila <amit.kapila16@gmail.com>
> > wrote:
> > > One more thing I would like you to consider is the point raised by me
> > > related to this patch's interaction with the parallel apply feature as
> > > mentioned in the email [1]. I am not sure the idea proposed in that
> > > email [1] is a good one because delaying after applying commit may not
> > > be good as we want to delay the apply of the transaction(s) on
> > > subscribers by this feature. I feel this needs more thought.
> > >
> >
> > I have thought a bit more about this and we have the following options to
> > choose the delay point from. (a) apply delay just before committing a
> > transaction. As mentioned in comments in the patch this can lead to bloat and
> > locks held for a long time. (b) apply delay before starting to apply changes for a
> > transaction but here the problem is which time to consider. In some cases, like
> > for streaming transactions, we don't receive the commit/prepare xact time in
> > the start message. (c) use (b) but use the previous transaction's commit time.
> > (d) apply delay after committing a transaction by using the xact's commit time.
> >
> > At this stage, among above, I feel any one of (c) or (d) is worth considering. Now,
> > the difference between (c) and (d) is that if after commit the next xact's data is
> > already delayed by more than min_apply_delay time then we don't need to kick
> > the new logic of apply delay.
> >
> > The other thing to consider whether we need to process any keepalive
> > messages during the delay because otherwise, walsender may think that the
> > subscriber is not available and time out. This may not be a problem for
> > synchronous replication but otherwise, it could be a problem.
> >
> > Thoughts?
> Hi,
>
>
> Thank you for your comments !
> Below are some analysis for the major points above.
>
> (1) About the timing to apply the delay
>
> One approach of (b) would be best. The idea is to delay all types of transaction's application
> based on the time when one transaction arrives at the subscriber node.
>

But I think it will add unnecessary delay when the publisher is late in sending the transaction (say, because the publisher was busy handling other workloads or there was a temporary network break between publisher and subscriber). This is probably why physical replication (via recovery_min_apply_delay) uses the commit time of the sending side.

> One advantage of this approach over (c) and (d) is that this can avoid the case
> where we might apply a transaction immediately without waiting,
> if there are two transactions sequentially and the time in between exceeds the min_apply_delay time.
>

I am not sure I understand your point. However, I think that even when the transactions are sequential, if the time between them exceeds min_apply_delay (say, because the publisher was down), there is no need to apply any additional delay.

> When we receive stream-in-progress transactions, we'll check whether the time for delay
> has passed or not at first in this approach.
>
>
> (2) About the timeout issue
>
> When having a look at the physical replication internals,
> it conducts sending feedback and application of delay separately on different processes.
> OTOH, the logical replication needs to achieve those within one process.
>
> When we want to apply delay and avoid the timeout,
> we should not store all the transactions data into memory.
> So, one approach for this is to serialize the transaction data and after the delay,
> we apply the transactions data.
>

It is not clear to me how this will avoid a timeout.
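To make the commit-time argument under point (1) concrete, here is a minimal Python sketch. It is not PostgreSQL code: `remaining_apply_delay` and its parameters are hypothetical names chosen for illustration. It assumes the delay is measured from the publisher-side commit timestamp, so any time already lost in transit or while the publisher was down counts toward min_apply_delay.

```python
from datetime import datetime, timedelta, timezone


def remaining_apply_delay(commit_time, min_apply_delay, now):
    """Return how long the subscriber would still wait before applying.

    Hypothetical helper: the wait is anchored to the publisher's commit
    time, not to the arrival time on the subscriber, so a transaction that
    is already older than min_apply_delay needs no additional wait.
    """
    remaining = commit_time + min_apply_delay - now
    # Never return a negative wait; zero means "apply immediately".
    return max(remaining, timedelta(0))


if __name__ == "__main__":
    commit = datetime(2022, 12, 6, 12, 0, tzinfo=timezone.utc)
    delay = timedelta(minutes=5)
    # Arrived 2 minutes after commit: 3 minutes of delay remain.
    print(remaining_apply_delay(commit, delay, commit + timedelta(minutes=2)))
    # Arrived 9 minutes after commit: no further delay is needed.
    print(remaining_apply_delay(commit, delay, commit + timedelta(minutes=9)))
```

Anchoring to the arrival time instead would replace `commit_time` with the receive time, which is where the extra, unnecessary delay would come from.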
> However, this means if users adopt this feature,
> then all transaction data that should be delayed would be serialized.
> We are not sure if this sounds a valid approach or not.
>
> One another approach was to divide the time of delay in apply_delay() and
> utilize the divided time for WaitLatch and sends the keepalive messages from there.
>

Do we ever send keepalive messages from the apply side? I think we only send feedback reply messages in response to the publisher's keepalive message. So we need to do something similar here if you want to follow this approach.

--
With Regards,
Amit Kapila.
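The "divide the delay and reply to keepalives" idea discussed above can be sketched roughly as follows. This is a Python illustration only, not the apply worker's actual code: `delay_with_feedback`, `poll_keepalive` and `send_feedback` are hypothetical names, and the sketch assumes the subscriber sends feedback only in response to a keepalive from the publisher.

```python
import time


def delay_with_feedback(total_delay_s, slice_s, poll_keepalive, send_feedback,
                        sleep=time.sleep, clock=time.monotonic):
    """Wait total_delay_s in slices of at most slice_s seconds.

    After each slice, every pending publisher keepalive (as reported by the
    hypothetical poll_keepalive callback) is answered with send_feedback, so
    the sender does not conclude that the subscriber has gone away.
    Returns the number of feedback replies sent.
    """
    deadline = clock() + total_delay_s
    replies = 0
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            return replies
        # Sleep for one slice, but never past the end of the delay.
        sleep(min(slice_s, remaining))
        # Reply only when the publisher asked; nothing is sent unprompted.
        while poll_keepalive():
            send_feedback()
            replies += 1
```

The slice length would presumably need to be well below the sender-side timeout for this to help; that choice is left open here.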