Thread: Re: doc: Mention clock synchronization recommendation for hot_standby_feedback
Re: doc: Mention clock synchronization recommendation for hot_standby_feedback
From
Amit Kapila
Date:
On Thu, Dec 5, 2024 at 3:14 PM Jakub Wartak <jakub.wartak@enterprisedb.com> wrote: > > One of our customers ran into a very odd case, where hot standby feedback backend_xmin propagation stopped working dueto major (hours/days) clock time shifts on hypervisor-managed VMs. This happens (and is fully reproducible) e.g. in scenarioswhere standby connects and its own VM is having time from the future (relative to primary) and then that time goesback to "normal". In such situation "sends hot_standby_feedback xmin" timestamp messages are stopped being transferred,e.g.: > > 2024-12-05 02:02:35 UTC [6002]: db=,user=,app=,client= DEBUG: sending hot standby feedback xmin 1614031 epoch 0 catalog_xmin0 catalog_xmin_epoch 0 > 2024-12-05 02:02:45 UTC [6002]: db=,user=,app=,client= DEBUG: sending write 6/E9015230 flush 6/E9015230 apply 6/E9015230 > 2024-12-05 02:02:45 UTC [6002]: db=,user=,app=,client= DEBUG: sending hot standby feedback xmin 1614031 epoch 0 catalog_xmin0 catalog_xmin_epoch 0 > <-- clock readjustment and no further "sending hot standby feedback" ... > > I can share reproduction steps if anyone is interested. This basically happens due to usage of TimestampDifferenceExceeds()in XLogWalRcvSendHSFeedback(), but I bet there are other similiar scenarios. > We started to use a different mechanism in HEAD. See XLogWalRcvSendHSFeedback(). > What I was kind of surprised about was the lack of recommendation for having primary/standby to have clocks synced whenusing hot_standby_feedback, but such a thing is mentioned for recovery_min_apply_delay. So I would like to add at leastone sentence to hot_standby_feedback to warn about this too, patch attached. > IIUC, this issue doesn't occur because the primary and standby clocks are not synchronized. It happened because the clock on standby moved backward. This is quite unlike the 'recovery_min_apply_delay' where non-synchronization of clocks between primary and standby can lead to unexpected results. This is because we don't compare any time on the primary with the time on standby. If this understanding is correct then the wording proposed by your patch should be changed accordingly. -- With Regards, Amit Kapila.
Re: doc: Mention clock synchronization recommendation for hot_standby_feedback
From
Jakub Wartak
Date:
On Wed, Dec 18, 2024 at 10:33 AM Amit Kapila <amit.kapila16@gmail.com> wrote: Hi Amit! > On Thu, Dec 5, 2024 at 3:14 PM Jakub Wartak > <jakub.wartak@enterprisedb.com> wrote: > > > > One of our customers ran into a very odd case, where hot standby feedback backend_xmin propagation stopped working dueto major (hours/days) clock time shifts on hypervisor-managed VMs. This happens (and is fully reproducible) e.g. in scenarioswhere standby connects and its own VM is having time from the future (relative to primary) and then that time goesback to "normal". In such situation "sends hot_standby_feedback xmin" timestamp messages are stopped being transferred,e.g.: > > > > 2024-12-05 02:02:35 UTC [6002]: db=,user=,app=,client= DEBUG: sending hot standby feedback xmin 1614031 epoch 0 catalog_xmin0 catalog_xmin_epoch 0 > > 2024-12-05 02:02:45 UTC [6002]: db=,user=,app=,client= DEBUG: sending write 6/E9015230 flush 6/E9015230 apply 6/E9015230 > > 2024-12-05 02:02:45 UTC [6002]: db=,user=,app=,client= DEBUG: sending hot standby feedback xmin 1614031 epoch 0 catalog_xmin0 catalog_xmin_epoch 0 > > <-- clock readjustment and no further "sending hot standby feedback" > ... > > > > I can share reproduction steps if anyone is interested. This basically happens due to usage of TimestampDifferenceExceeds()in XLogWalRcvSendHSFeedback(), but I bet there are other similiar scenarios. > > > > We started to use a different mechanism in HEAD. See XLogWalRcvSendHSFeedback(). Yes, you are correct somewhat because I was looking on REL13_STABLE, but I've taken a fresh quick look at 05a7be93558 and tested it too. Sadly, PG17 still maintains the same behavior of lack of proper backend_xmin propagation (it stops sending hot standby feedback once time on standby jumps forward). I believe this is the case because walreceiver schedules next wakeup in far far future (when the clock / now() is way ahead, see WalRcvComputeNextWakeup()), so when the clock is back to normal (resetted back -X hours/days), the next wakeup seems to be +X hours/days ahead. > > What I was kind of surprised about was the lack of recommendation for having primary/standby to have clocks synced whenusing hot_standby_feedback, but such a thing is mentioned for recovery_min_apply_delay. So I would like to add at leastone sentence to hot_standby_feedback to warn about this too, patch attached. > > > > IIUC, this issue doesn't occur because the primary and standby clocks > are not synchronized. It happened because the clock on standby moved > backward. In PG17 it would be because the clock moved way forward too much on the standby. I don't know how it happened to that customer, but it was probably done somehow by the hypervisor in that scenario (so time wasn't slewed improperly by ntpd AFAIR, edge case, I know...) > This is quite unlike the 'recovery_min_apply_delay' where > non-synchronization of clocks between primary and standby can lead to > unexpected results. This is because we don't compare any time on the > primary with the time on standby. If this understanding is correct > then the wording proposed by your patch should be changed accordingly. .. if my understanding is correct, it is both depending on version :^) I was thinking about backpatching docs (of what is the recommended policy here? to just update new-release docs?), so I'm proposing something more generic than earlier, but it takes Your point into account - would something like below be good enough? - <para> - Using this option requires the primary and standby(s) to have system - clocks synchronized, otherwise it may lead to prolonged risk of not - removing dead rows on primary for extended periods of time as the - feedback mechanism is based on timestamps exchanged between primary - and standby(s). - </para> + <para> + Using this option requires the primary and standby(s) to have system + clocks synchronized (without big time jumps), otherwise it may lead to + prolonged risk of not removing dead rows on primary for extended periods + of time as the feedback mechanism implementation is timestamp based. + </para> -J.
Re: doc: Mention clock synchronization recommendation for hot_standby_feedback
From
Amit Kapila
Date:
On Wed, Jan 8, 2025 at 6:20 PM Jakub Wartak <jakub.wartak@enterprisedb.com> wrote: > > On Wed, Dec 18, 2024 at 10:33 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > Hi Amit! > > > On Thu, Dec 5, 2024 at 3:14 PM Jakub Wartak > > <jakub.wartak@enterprisedb.com> wrote: > > > > > > One of our customers ran into a very odd case, where hot standby feedback backend_xmin propagation stopped workingdue to major (hours/days) clock time shifts on hypervisor-managed VMs. This happens (and is fully reproducible) e.g.in scenarios where standby connects and its own VM is having time from the future (relative to primary) and then thattime goes back to "normal". In such situation "sends hot_standby_feedback xmin" timestamp messages are stopped beingtransferred, e.g.: > > > > > > 2024-12-05 02:02:35 UTC [6002]: db=,user=,app=,client= DEBUG: sending hot standby feedback xmin 1614031 epoch 0 catalog_xmin0 catalog_xmin_epoch 0 > > > 2024-12-05 02:02:45 UTC [6002]: db=,user=,app=,client= DEBUG: sending write 6/E9015230 flush 6/E9015230 apply 6/E9015230 > > > 2024-12-05 02:02:45 UTC [6002]: db=,user=,app=,client= DEBUG: sending hot standby feedback xmin 1614031 epoch 0 catalog_xmin0 catalog_xmin_epoch 0 > > > <-- clock readjustment and no further "sending hot standby feedback" > > ... > > > > > > I can share reproduction steps if anyone is interested. This basically happens due to usage of TimestampDifferenceExceeds()in XLogWalRcvSendHSFeedback(), but I bet there are other similiar scenarios. > > > > > > > We started to use a different mechanism in HEAD. See XLogWalRcvSendHSFeedback(). > > Yes, you are correct somewhat because I was looking on REL13_STABLE, > but I've taken a fresh quick look at 05a7be93558 and tested it too. > Sadly, PG17 still maintains the same behavior of lack of proper > backend_xmin propagation (it stops sending hot standby feedback once > time on standby jumps forward). I believe this is the case because > walreceiver schedules next wakeup in far far future (when the clock / > now() is way ahead, see WalRcvComputeNextWakeup()), so when the clock > is back to normal (resetted back -X hours/days), the next wakeup seems > to be +X hours/days ahead. > > > > What I was kind of surprised about was the lack of recommendation for having primary/standby to have clocks syncedwhen using hot_standby_feedback, but such a thing is mentioned for recovery_min_apply_delay. So I would like to addat least one sentence to hot_standby_feedback to warn about this too, patch attached. > > > > > > > IIUC, this issue doesn't occur because the primary and standby clocks > > are not synchronized. It happened because the clock on standby moved > > backward. > > In PG17 it would be because the clock moved way forward too much on > the standby. I don't know how it happened to that customer, but it was > probably done somehow by the hypervisor in that scenario (so time > wasn't slewed improperly by ntpd AFAIR, edge case, I know...) > > > This is quite unlike the 'recovery_min_apply_delay' where > > non-synchronization of clocks between primary and standby can lead to > > unexpected results. This is because we don't compare any time on the > > primary with the time on standby. If this understanding is correct > > then the wording proposed by your patch should be changed accordingly. > > .. if my understanding is correct, it is both depending on version :^) > AFAICS, it doesn't depend on the version. I checked the code of PG13, and it uses a similar implementation. I am referring to the below code in PG13: if (!immed) { /* * Send feedback at most once per wal_receiver_status_interval. */ if (!TimestampDifferenceExceeds(sendTime, now, wal_receiver_status_interval * 1000)) return; sendTime = now; } > I was thinking about backpatching docs (of what is the recommended > policy here? to just update new-release docs?), so I'm proposing > something more generic than earlier, but it takes Your point into > account - would something like below be good enough? > > - <para> > - Using this option requires the primary and standby(s) to have system > - clocks synchronized, otherwise it may lead to prolonged risk of not > - removing dead rows on primary for extended periods of time as the > - feedback mechanism is based on timestamps exchanged between primary > - and standby(s). > - </para> > > + <para> > + Using this option requires the primary and standby(s) to have system > + clocks synchronized (without big time jumps), otherwise it may lead to > + prolonged risk of not removing dead rows on primary for > extended periods > + of time as the feedback mechanism implementation is timestamp based. > + </para> > How about something like: "Note that if the clock on standby is moved ahead or backward, the feedback message may not be sent at the required interval. This can lead to prolonged risk of not removing dead rows on primary for extended periods as the feedback mechanism is based on timestamp." -- With Regards, Amit Kapila.
Re: doc: Mention clock synchronization recommendation for hot_standby_feedback
From
Jakub Wartak
Date:
Hi Amit, On Mon, Mar 3, 2025 at 6:26 AM Amit Kapila <amit.kapila16@gmail.com> wrote: [..] OK, sure. > How about something like: "Note that if the clock on standby is moved > ahead or backward, the feedback message may not be sent at the > required interval. This can lead to prolonged risk of not removing > dead rows on primary for extended periods as the feedback mechanism is > based on timestamp." Sure thing. I've just added '(..) In the extreme cases this can..' as it is pretty rare to hit it. Patch attached. -J.
Attachment
Re: doc: Mention clock synchronization recommendation for hot_standby_feedback
From
Fujii Masao
Date:
On 2025/03/03 16:35, Jakub Wartak wrote: > Hi Amit, > > On Mon, Mar 3, 2025 at 6:26 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > [..] > > OK, sure. > >> How about something like: "Note that if the clock on standby is moved >> ahead or backward, the feedback message may not be sent at the >> required interval. This can lead to prolonged risk of not removing >> dead rows on primary for extended periods as the feedback mechanism is >> based on timestamp." > > Sure thing. I've just added '(..) In the extreme cases this can..' as > it is pretty rare to hit it. Patch attached. When the clock moves forward or backward, couldn't it affect not only the standby but also the primary? I’m wondering because TimestampDifferenceExceeds() seems to be used in several places in addition to hot standby feedback. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: doc: Mention clock synchronization recommendation for hot_standby_feedback
From
Amit Kapila
Date:
On Mon, Mar 3, 2025 at 3:18 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > On 2025/03/03 16:35, Jakub Wartak wrote: > > Hi Amit, > > > > On Mon, Mar 3, 2025 at 6:26 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > [..] > > > > OK, sure. > > > >> How about something like: "Note that if the clock on standby is moved > >> ahead or backward, the feedback message may not be sent at the > >> required interval. This can lead to prolonged risk of not removing > >> dead rows on primary for extended periods as the feedback mechanism is > >> based on timestamp." > > > > Sure thing. I've just added '(..) In the extreme cases this can..' as > > it is pretty rare to hit it. Patch attached. > > When the clock moves forward or backward, couldn't it affect > not only the standby but also the primary? I’m wondering > because TimestampDifferenceExceeds() seems to be used > in several places in addition to hot standby feedback. > Right, it could impact other places as well, like background WAL flush being delayed. So, what should we do about this? Shall we leave this as is, make a general statement, find all cases and make a note about them in docs, do it for the important ones where the impact is more, or something else? -- With Regards, Amit Kapila.