Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication - Mailing list pgsql-hackers
| From | Ashutosh Sharma |
|---|---|
| Subject | Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication |
| Date | |
| Msg-id | CAE9k0PnOUth5tjT21wD75QRUsREQ35=z9JgqOFVUdCLrQ62s3g@mail.gmail.com |
| In response to | Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication (shveta malik <shveta.malik@gmail.com>) |
| Responses | Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication |
| List | pgsql-hackers |
Hi,

On Thu, Feb 26, 2026 at 2:15 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Thu, Feb 26, 2026 at 1:54 PM SATYANARAYANA NARLAPURAM
> <satyanarlapuram@gmail.com> wrote:
> >
> > Hi Ashutosh,
> >
> > On Wed, Feb 25, 2026 at 11:42 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> >>
> >>
> >> I don't think we should be comparing "synchronous_standby_names" with
> >> "synchronized_standby_slots", even though they appear similar in
> >> purpose. All values listed in synchronous_standby_names represent
> >> synchronous standbys exclusively, whereas synchronized_standby_slots
> >> can hold values for both synchronous and asynchronous standbys. In
> >> other words, every server referenced by synchronous_standby_names is
> >> of the same type, but that may not be the case with
> >> synchronized_standby_slots.
> >>
> >> If a GUC can hold values of different types (sync vs. async), does it
> >> really make sense to use a qualifier like ANY 1 (val1, val2) when val1
> >> and val2 are different in nature? For example, suppose val1 is a
> >> synchronous standby and val2 is an asynchronous standby, and we
> >> configure ANY 1 (val1, val2). It's possible for val2 to get ahead of
> >> val1 in terms of replication progress, which in turn could mean the
> >> logical replica is also ahead of val1. So if we were to fail over to
> >> val1 (since it's the only synchronous standby), we will not be able to
> >> use the existing logical replication setup.
> >
> >
> > If the failover orchestrator cannot ensure standby1 to not get the
> > quorum committed WAL (from archive or standby2) then the setting
> > ANY 1 (val1, val2) is invalid.
> > This setup also has issues because in your scenario, standby2 is
> > ahead of the new primary (standby1) and standby2 requires now to
> > rewind to be in sync with the new primary. Additionally, it allowed
> > readers to read data that was lost at the end of the failover.
> > We ideally need a mechanism to not send WAL to async replicas before
> > the sync replicas commit (honoring synchronous_standby_names GUC)
> > feature (similar to synchronized_standby_slots). It could be a
> > different thread on its own.
> >
> +1 on the overall idea of the patch.
> I understand the concern raised above that one of the standbys in the
> quorum (synchronized_standby_slots) might lag behind the logical
> replica, and a user could potentially failover to such a standby. But
> I also agree with Amit that configuring failover correctly is
> ultimately the responsibility of failover-solution. And instructions
> in doc should be followed before deciding if a standby is
> failover-ready or not.
>
> As suggested in [1], IMO, it is a reasonably good idea for
> 'synchronized_standby_slots' to DEFAULT to the value of
> 'synchronous_standby_names'. That way, even if the user missed to
> configure 'synchronized_standby_slots' explicitly, we would still have
> reasonable protection in place. At the same time, if a user
> intentionally chooses not to configure it, a NULL/NONE value should
> remain a valid option.
>

AFAIU, not all names listed in "synchronous_standby_names" are
necessarily synchronous standbys. Tools like pg_receivewal, for
example, can establish a replication connection to the primary and
appear in that list. Therefore, deriving "synchronized_standby_slots"
from "synchronous_standby_names" when the user has not set it would
cause logical slots to be synchronized to whatever nodes those names
represent, including a host running pg_receivewal, which is certainly
not something the user would have intended. So I feel this might not
be a good choice.

--
With Regards,
Ashutosh Sharma.
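
The ANY 1 (val1, val2) concern quoted above can be illustrated with a small model. This is only a sketch under stated assumptions: it assumes that ANY k would let logical decoding proceed up to the highest position confirmed by at least k of the listed slots, whereas the current behavior waits for every listed slot. The function names, slot names, and LSN values are hypothetical and are not PostgreSQL code.

```python
from typing import Dict


def quorum_lsn(flushed: Dict[str, int], k: int) -> int:
    """Highest LSN confirmed by at least k of the listed slots (assumed ANY k semantics)."""
    ordered = sorted(flushed.values(), reverse=True)
    return ordered[k - 1]


def all_lsn(flushed: Dict[str, int]) -> int:
    """LSN confirmed by every listed slot (wait-for-all semantics)."""
    return min(flushed.values())


# Hypothetical positions: val1 is the synchronous standby, val2 is async but ahead.
flushed = {"val1": 100, "val2": 150}

logical_limit_any1 = quorum_lsn(flushed, k=1)  # bounded by the most advanced slot
logical_limit_all = all_lsn(flushed)           # bounded by the slowest slot

# Under ANY 1 the logical replica may be sent changes beyond what val1 has,
# so promoting val1 could leave the logical replica ahead of the new primary.
ahead_of_val1 = logical_limit_any1 > flushed["val1"]
```

In this model the wait-for-all limit never exceeds val1's position, which is why the scenario only arises once a quorum qualifier is allowed over slots of mixed sync and async standbys.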