Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Synchronizing slots from primary to standby |
Date | |
Msg-id | CAD21AoBjHYK_Zhr_3jjjJzP30LdbnMg0D4VtfZZkzucEZuo3ng@mail.gmail.com |
In response to | Re: Synchronizing slots from primary to standby (Amit Kapila <amit.kapila16@gmail.com>) |
Responses | RE: Synchronizing slots from primary to standby |
List | pgsql-hackers |
On Tue, Feb 6, 2024 at 8:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Feb 6, 2024 at 3:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Feb 6, 2024 at 1:09 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Tue, Feb 6, 2024 at 3:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > I think users can refer to LOGs to see if it has changed since the
> > > > first time it was configured. I tried by existing parameter and see
> > > > the following in LOG:
> > > > LOG: received SIGHUP, reloading configuration files
> > > > 2024-02-06 11:38:59.069 IST [9240] LOG: parameter "autovacuum" changed to "on"
> > > >
> > > > If the user can't confirm then it is better to follow the steps
> > > > mentioned in the patch. Do you want something else to be written in
> > > > docs for this? If so, what?
> > >
> > > IIUC even if a wrong slot name is specified to standby_slot_names or
> > > even standby_slot_names is empty, the standby server might not be
> > > lagging behind the subscribers depending on the timing. But when
> > > checking it the next time, the standby server might lag behind the
> > > subscribers. So what I wanted to know is how the user can confirm if a
> > > failover-enabled subscription is ensured not to go in front of
> > > failover-candidate standbys (i.e., standbys using the slots listed in
> > > standby_slot_names).
> > >
> >
> > But isn't the same explained by two steps ((a) Firstly, on the
> > subscriber node check the last replayed WAL. (b) Next, on the standby
> > server check that the last-received WAL location is ahead of the
> > replayed WAL location on the subscriber identified above.) in the
> > latest *_0004 patch.
> >
>
> Additionally, I would like to add that the users can use the queries
> mentioned in the doc after the primary has failed and before promoting
> the standby. If she wants to do that when both primary and standby are
> available, the value of 'standby_slot_names' on primary should be
> referred. Isn't those two sufficient that there won't be false
> positives?

From a user perspective, I'd like to confirm the following two points:

1. replication slots used by subscribers are synchronized to the standby.
2. it's guaranteed that logical replication doesn't go ahead of
physical replication to the standby.

These checks are necessary at least when building a replication setup
(primary, standby, and subscriber). Otherwise, it's too late if we
find out that no standby is failover-ready when the primary fails and
we're about to do a failover.

As for point 1 above, we can use step 1 described in the doc.

As for point 2, step 2 described in the doc could return true even if
standby_slot_names isn't working. For example, standby_slot_names
might be empty, the user might have changed standby_slot_names but
forgotten to reload the config file, or the walsender might not yet
reflect the standby_slot_names update for some reason. In those cases
it's possible that the standby's last-received WAL location just
happens to be ahead of the replayed WAL location on the subscriber. So
even if the check query returns true once, it could return false when
we check it again, if standby_slot_names is not working. On the other
hand, IIUC if point 2 is ensured, the check query always returns true.
I think it would be good if we could provide a reliable way to check
point 2, ideally via SQL queries (especially for tools).

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
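
To make the concern about point 2 concrete, here is a minimal Python sketch of the comparison that step 2 performs. It is not taken from the patch. The helper names (parse_lsn, standby_is_ahead, all_samples_pass) are hypothetical, the LSN strings are made-up sample values, and treating equal LSNs as passing is an assumption. In practice the standby value would come from something like pg_last_wal_receive_lsn() and the subscriber value from its replay progress.

```python
def parse_lsn(text: str) -> int:
    """Convert a PostgreSQL LSN such as '0/3000060' to a 64-bit integer.

    An LSN is printed as two hexadecimal numbers separated by a slash:
    the high 32 bits, then the low 32 bits.
    """
    hi, lo = text.strip().split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def standby_is_ahead(standby_received_lsn: str, subscriber_replayed_lsn: str) -> bool:
    """Return True if the standby has received at least as much WAL as
    the subscriber has replayed (assumption: equality counts as a pass)."""
    return parse_lsn(standby_received_lsn) >= parse_lsn(subscriber_replayed_lsn)


def all_samples_pass(samples) -> bool:
    """Check several (standby_lsn, subscriber_lsn) snapshots taken over time.

    A single passing snapshot does not show that standby_slot_names is
    in effect; one failing snapshot shows that the ordering is not
    being enforced.
    """
    return all(standby_is_ahead(s, r) for s, r in samples)


if __name__ == "__main__":
    # Made-up sample values: the first snapshot passes, the second fails,
    # which is the situation described when standby_slot_names is not working.
    snapshots = [("0/3000100", "0/3000060"), ("0/3000100", "0/3000200")]
    print(standby_is_ahead(*snapshots[0]))
    print(all_samples_pass(snapshots))
```

Repeated sampling like this can only disprove the guarantee, never prove it, which is why a direct way to confirm that standby_slot_names is in effect would be more reliable for tools.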