Re: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers - Mailing list pgsql-hackers
From | Jeff Davis |
---|---|
Subject | Re: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers |
Date | |
Msg-id | 04a1555a8b59e4f6fdf1df63e30fef4be3e2336c.camel@j-davis.com |
In response to | Re: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers (Andres Freund <andres@anarazel.de>) |
Responses | Re: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers |
List | pgsql-hackers |
On Fri, 2022-01-07 at 14:54 -0800, Andres Freund wrote:
> > If you only promote the furthest-ahead sync replica (which is what you
> > should be doing if you have quorum commit), why wouldn't that work?
>
> Remove "sync" from the above sentence, and the sentence holds true for
> combinations of sync/async replicas as well.

Technically that's true, but it seems like a bit of a strange use case. I would think people doing that would just include those async replicas in the sync quorum instead.

The main case I can think of for a mix of sync and async replicas is when they are just managed differently. For instance, the sync replica quorum is managed for a core part of the system, strategically allocated on good hardware in different locations to minimize the chance of dependent failures; while the async read replicas are optional for taking load off the primary, and may appear/disappear in whatever location and on whatever hardware is most convenient.

But if an async replica can get ahead of the sync rep quorum, then the most recent transactions can appear in query results, so that means the WAL shouldn't be lost, and the async read replicas become a part of the durability model.

If the async read replica can't be promoted because it's not suitable (due to location, hardware, whatever), then you need to frantically copy the final WAL records out to an instance in the sync rep quorum. That requires extra ceremony for every failover, and might be dubious depending on how safe the WAL on your async read replicas is, and whether there are dependent failure risks.

Yeah, I guess there could be some use case woven amongst those caveats, but I'm not sure if anyone is actually doing that combination of things safely today. If someone is, it would be interesting to know more about that use case.

The proposal in this thread is quite a bit simpler: manage your sync quorum and your async read replicas separately, and keep the sync rep quorum ahead.
> > > To me this just sounds like trying to shoehorn something into syncrep
> > > that it's not made for.
> >
> > What *is* sync rep made for?

This was a sincere question and an answer would be helpful. I think many of the discussions about sync rep get derailed because people have different ideas about when and how it should be used, and the documentation is pretty light.

> This is a especially relevant in cases where synchronous_commit=on vs
> local is used selectively

That's an interesting point. However, it's hard for me to reason about "kinda durable" and "a little more durable" and I'm not sure how many people would care about that distinction.

> I don't see that. This presumes that WAL replicated to async replicas
> is somehow bad.

Simple case: primary and async read replica are in the same server rack. Sync replicas are geographically distributed with quorum commit. Read replica gets the WAL first (because it's closest), starts answering queries that include that WAL, and then the entire rack catches fire. Now you've returned results to the client, but lost the transactions.

Regards,
Jeff Davis
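
The rack-failure scenario in the message above can be sketched as a small model. This is an illustration only, not PostgreSQL code and not part of the original message: the replica names, rack labels, LSN values, and the helpers `parse_lsn`, `promotion_candidate`, and `lost_visible_wal` are all hypothetical. The only PostgreSQL fact it relies on is the textual LSN format, two hexadecimal numbers separated by a slash (high and low 32 bits of the WAL position).

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN such as '0/3000060' into an integer byte position."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


# Hypothetical cluster state: (name, kind, rack, last replayed LSN).
# The async read replica shares rack1 with the primary and is slightly ahead.
REPLICAS = [
    ("sync_a", "sync", "rack2", "0/3000000"),
    ("sync_b", "sync", "rack3", "0/2FFFF00"),
    ("read_1", "async", "rack1", "0/3000060"),
]


def promotion_candidate(replicas, failed_rack):
    """Pick the furthest-ahead surviving sync replica."""
    survivors = [r for r in replicas if r[1] == "sync" and r[2] != failed_rack]
    return max(survivors, key=lambda r: parse_lsn(r[3]))


def lost_visible_wal(replicas, failed_rack):
    """Bytes of WAL that a failed async replica had already replayed
    (and so could have served to queries) beyond the promoted node."""
    promoted = promotion_candidate(replicas, failed_rack)
    exposed = [
        parse_lsn(r[3])
        for r in replicas
        if r[1] == "async" and r[2] == failed_rack
    ]
    if not exposed:
        return 0
    return max(0, max(exposed) - parse_lsn(promoted[3]))


if __name__ == "__main__":
    winner = promotion_candidate(REPLICAS, failed_rack="rack1")
    print("promote:", winner[0])
    print("visible-but-lost WAL bytes:", lost_visible_wal(REPLICAS, "rack1"))
```

With these made-up numbers, the promoted sync replica is behind the lost read replica, so some WAL that was already visible to queries is gone. Keeping the sync rep quorum ahead of the async replicas, as proposed in the thread, would make that difference zero in this model.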