Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From shveta malik
Subject Re: Synchronizing slots from primary to standby
Date
Msg-id CAJpy0uA3=MReUZV2EpsF+mRe=mcu-hhx=kk_tWa4t6CKDui2eg@mail.gmail.com
Whole thread Raw
In response to Re: Synchronizing slots from primary to standby  (Ajin Cherian <itsajin@gmail.com>)
Responses Re: Synchronizing slots from primary to standby
List pgsql-hackers
On Thu, Nov 30, 2023 at 5:37 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> On Wed, Nov 29, 2023 at 8:17 PM Zhijie Hou (Fujitsu)
> <houzj.fnst@fujitsu.com> wrote:
> >
> > This has been fixed.
> >
> > Best Regards,
> > Hou zj
>
> Thanks for addressing my comments. Some comments from my testing of patch v41
>
> 1. In my opinion, the second message "aborting the wait...moving to
> the next slot" does not hold much value. There might not even be a
> "next slot", there might be just one slot. I think the first LOG is
> enough to indicate that the sync-slot is waiting as it repeats this
> log till the slot catches up. I know these messages hold great value
> for debugging but in production,  "waiting..", "aborting the wait.."
> might not be as helpful, maybe change it to debug?
>
> 2023-11-30 05:13:49.811 EST [6115] LOG:  waiting for remote slot
> "sub1" LSN (0/3047A90) and catalog xmin (745) to pass local slot LSN
> (0/3047AC8) and catalog xmin (745)
> 2023-11-30 05:13:57.909 EST [6115] LOG:  aborting the wait for remote
> slot "sub1" and moving to the next slot, will attempt creating it
> again
> 2023-11-30 05:14:07.921 EST [6115] LOG:  waiting for remote slot
> "sub1" LSN (0/3047A90) and catalog xmin (745) to pass local slot LSN
> (0/3047AC8) and catalog xmin (745)
>

Sure, the message can be trimmed down. But I am not very sure if we
should convert it to DEBUG. It might be useful to know what exactly is
happening with this slot through the log file.Curious to know what
others think here?

>
> 2. If a slot on the standby is in the "i" state as it hasn't been
> synced and it was invalidated on the primary, should you continuously
> retry creating this invalidated slot on the standby?
>
> 2023-11-30 06:21:41.844 EST [10563] LOG:  waiting for remote slot
> "sub1" LSN (0/0) and catalog xmin (785) to pass local slot LSN
> (0/EED9330) and catalog xmin (785)
> 2023-11-30 06:21:41.845 EST [10563] WARNING:  slot "sub1" invalidated
> on the primary server, slot creation aborted
> 2023-11-30 06:21:51.892 EST [10563] LOG:  waiting for remote slot
> "sub1" LSN (0/0) and catalog xmin (785) to pass local slot LSN
> (0/EED9330) and catalog xmin (785)
> 2023-11-30 06:21:51.893 EST [10563] WARNING:  slot "sub1" invalidated
> on the primary server, slot creation aborted
>

No, it should not be synced after that. It should be marked as
invalidated and skipped. And perhaps the state should also be moved to
'r' as we are done with it; but since it is invalidated, it will not
be used further even if 'r'.

> 3. If creation of a slot on the standby fails for one slot because a
> slot of the same name exists, then thereafter no new sync slots are
> created on standby. Is this expected? I do see that previously created
> slots are kept up to date, just that no new slots are created after
> that.
>

yes, it is done so as per the suggestion/discussion in [1]. It is done
so that users can catch this issue at the earliest.

[1]: https://www.postgresql.org/message-id/CAA4eK1J5D-Z7dFa89acf7O%2BCa6Y9bygTpi52KAKVCg%2BPE%2BZfog%40mail.gmail.com

thanks
Shveta



pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: optimize atomic exchanges
Next
From: Andres Freund
Date:
Subject: Re: optimize atomic exchanges