Re: WIP: Failover Slots - Mailing list pgsql-hackers
From | Craig Ringer
Subject | Re: WIP: Failover Slots
Date | 
Msg-id | CAMsr+YEVW-_wivY8yJQCTiJ20u7V4P_xNWNx55MaJ70mLS7O6g@mail.gmail.com
In response to | Re: WIP: Failover Slots (Craig Ringer <craig@2ndquadrant.com>)
Responses | Re: WIP: Failover Slots
List | pgsql-hackers
A few thoughts on failover slots vs the alternative of pushing catalog_xmin up to the master via a replica's slot and creating independent slots on replicas.


Failover slots:
---

+ Failover slots are very easy for applications. They "just work" and are transparent for failover. This is great especially for things that aren't complex replication schemes, that just want to use logical decoding.

+ Applications don't have to know what replicas exist or be able to reach them; transparent failover is easier.

- Failover slots can't be used from a cascading standby (where we can fail down to the standby's own replicas) because they have to write WAL to advance the slot position. They'd have to send the slot position update "up" to the master then wait to replay it. Not a disaster, though they'd do extra work on reconnect until a restart_lsn update replayed. Would require a whole new feedback-like message on the rep protocol, and couldn't work at all with archive replication. Ugly as hell.

+ Failover slots exist now, and could be added to 9.6.

- The UI for failover slots can't be re-used for the catalog_xmin push-up approach to allow replay from failover slots on cascading standbys in 9.7+. There'd be no way to propagate the creation of failover slots "down" the replication hierarchy that way, especially to archive standbys like failover slots will do. So it'd be semantically different and couldn't re-use the FS UI. We'd be stuck with failover slots even if we also did the other way later.

+ Will work for recovery of a master PITR-restored up to the latest recovery point.


Independent slots on replicas + catalog_xmin push-up
---

With this approach we allow creation of replication slots on a replica independently of the master. The replica is required to connect to the master via a slot. We send feedback to the master to advance the replica's slot on the master to the confirmed_lsn of the most-behind slot on the replica, thereby pinning the master's catalog_xmin where needed. Or we just send a new feedback message type that directly sets a catalog_xmin on the replica's physical slot in the master. Slots are _not_ cloned from master to replica automatically.


- More complicated for applications to use. They have to create a slot on each replica that might be failed over to, as well as the master, and have to advance all those slots to stop the master from suffering severe catalog bloat. (But see note below.)

- Applications must be able to connect to failover-candidate standbys and know where they are; it's not automagically handled via WAL. (But see note below.)

- Applications need reconfiguration whenever a standby is rebuilt, moved, etc. (But see note below.)

- Cannot work at all for archive-based replication; requires a slot from replica to master.

+ Works with replay from cascading standbys.

+ Actually solves one of the problems making logical slots on standbys unsupported at the moment, by giving us a way to pin the master's catalog_xmin to that needed by a replica.

- Won't work for a standby PITR-restored up to latest.

- Vapourware with zero hope for 9.6.


Note: I think the application complexity issues can be solved - to a degree - by having the replicas run a bgworker-based helper that connects to the master, clones the master's slots, then advances them automatically.


Do nothing
---

Drop the idea of being able to follow physical failover on logical slots.

I've already expressed why I think this is a terrible idea. It's hostile to application developers who'd like to use logical decoding. It makes integration of logical replication with existing HA systems much harder. It means we need really solid, performant, well-tested and mature logical-rep-based HA before we can take logical rep seriously, which is a long way out given that we can't do decoding of in-progress xacts, DDL, sequences, etc etc.

Some kind of physical HA for logical slots is needed and will be needed for some time. Logical rep will be great for selective replication, replication over WAN, filtered/transformed replication, etc. Physical rep is great for knowing you'll get exactly the same thing on the replica that you have on the master and it'll Just Work.

In any case, "Do nothing" is the same for 9.6 as pursuing the catalog_xmin push-up idea; in both cases we don't commit anything in 9.6.
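For the record, the feedback computation in the catalog_xmin push-up approach boils down to "report the confirmed_lsn of the most-behind slot on the replica upstream". A rough sketch in Python of that selection logic (all function names here are illustrative, not from any actual patch; pg_lsn values are modelled as their usual 'hi/lo' hex text form):

```python
# Sketch: pick the position a replica would report upstream under the
# catalog_xmin push-up scheme -- the confirmed_lsn of its most-behind
# logical slot, so the master can't advance past what any replica-side
# slot still needs. Hypothetical helper, for illustration only.

def parse_lsn(lsn: str) -> int:
    """Convert a pg_lsn string like '16/B374D848' to a 64-bit integer."""
    hi, lo = lsn.split('/')
    return (int(hi, 16) << 32) | int(lo, 16)

def format_lsn(n: int) -> str:
    """Convert a 64-bit integer back to pg_lsn text form."""
    return f'{n >> 32:X}/{n & 0xFFFFFFFF:X}'

def feedback_lsn(replica_slot_lsns: list[str]) -> str:
    """Position to report upstream: the most-behind slot's confirmed_lsn."""
    return format_lsn(min(parse_lsn(l) for l in replica_slot_lsns))
```

So with slots at '16/B374D848' and '16/B0000000', the replica would report '16/B0000000' and the master's physical slot stays pinned there until the laggard catches up.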