Re: WIP: Failover Slots - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: WIP: Failover Slots
Date
Msg-id CAMsr+YEVW-_wivY8yJQCTiJ20u7V4P_xNWNx55MaJ70mLS7O6g@mail.gmail.com
In response to Re: WIP: Failover Slots  (Craig Ringer <craig@2ndquadrant.com>)
Responses Re: WIP: Failover Slots  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">A few thoughts on failover slots vs the alternative of
pushing catalog_xmin up to the master via a replica's slot and creating independent slots on replicas.</div><div
class="gmail_quote"><br/></div><div class="gmail_quote"><br /></div><div class="gmail_quote">Failover slots:</div><div
class="gmail_quote">---</div><div class="gmail_quote"><br /></div><div class="gmail_quote">+ Failover slots are very
easy for applications. They "just work" and are transparent for failover. This is great especially for things that
aren't complex replication schemes, that just want to use logical decoding.</div><div class="gmail_quote"><br
/></div><div class="gmail_quote">+ Applications don't have to know what replicas exist or be able to reach them;
transparent failover is easier.</div><div class="gmail_quote"><br /></div><div class="gmail_quote">- Failover slots
can't be used from a cascading standby (where we can fail down to the standby's own replicas) because they have to write
WAL to advance the slot position. They'd have to send the slot position update "up" to the master then wait to replay
it. Not a disaster, though they'd do extra work on reconnect until a restart_lsn update replayed. Would require a whole
new feedback-like message on the rep protocol, and couldn't work at all with archive replication. Ugly as
hell.</div><div class="gmail_quote"><br /></div><div class="gmail_quote">+ Failover slots exist now, and could be added
to 9.6.</div><div class="gmail_quote"><br /></div><div class="gmail_quote">- The UI for failover slots can't be re-used
for the catalog_xmin push-up approach to allow replay from failover slots on cascading standbys in 9.7+. There'd be no
way to propagate the creation of failover slots "down" the replication hierarchy that way, especially to archive
standbys like failover slots will do. So it'd be semantically different and couldn't re-use the FS UI. We'd be stuck
with failover slots even if we also did the other way later.</div><div class="gmail_quote"><br /></div><div
class="gmail_quote">+ Will work for recovery of a master PITR-restored up to the latest recovery point</div><div
class="gmail_quote"><br/></div><div class="gmail_quote"><br /></div><div class="gmail_quote"><br /></div><div
class="gmail_quote">Independent slots on replicas + catalog_xmin push-up</div><div class="gmail_quote">---</div><div
class="gmail_quote"><br/></div><div class="gmail_quote">With this approach we allow creation of replication slots on a
replica independently of the master. The replica is required to connect to the master via a slot. We send feedback to
the master to advance the replica's slot on the master to the confirmed_lsn of the most-behind slot on the replica,
therefore pinning master's catalog_xmin where needed. Or we just send a new feedback message type that directly sets a
catalog_xmin on the replica's physical slot in the master. Slots are _not_ cloned from master to replica
automatically.</div><div class="gmail_quote"><br /></div><div class="gmail_quote"><br /></div><div class="gmail_quote">-
More complicated for applications to use. They have to create a slot on each replica that might be failed over to as
well as the master and have to advance all those slots to stop the master from suffering severe catalog bloat.  (But see
note below).</div><div class="gmail_quote"><br /></div><div class="gmail_quote">- Applications must be able to connect
to failover-candidate standbys and know where they are; it's not automagically handled via WAL.  (But see note
below).</div><div class="gmail_quote"><br /></div><div class="gmail_quote">- Applications need reconfiguration whenever
a standby is rebuilt, moved, etc. (But see note below).</div><div class="gmail_quote"><br /></div><div
class="gmail_quote">- Cannot work at all for archive-based replication, requires a slot from replica to
master.</div><div class="gmail_quote"><br /></div><div class="gmail_quote">+ Works with replay from cascading
standbys</div><div class="gmail_quote"><br /></div><div class="gmail_quote">+ Actually solves one of the problems making
logical slots on standbys unsupported at the moment by giving us a way to pin the master's catalog_xmin to that needed
by a replica.</div><div class="gmail_quote"><br /></div><div class="gmail_quote">- Won't work for a standby
PITR-restored up to latest.</div><div class="gmail_quote"><br /></div><div class="gmail_quote">- Vapourware with zero
hope for 9.6</div><div class="gmail_quote"><br /></div><div class="gmail_quote"><br /></div><div
class="gmail_quote">Note: I think the application complexity issues can be solved - to a degree - by having the replicas
run a bgworker based helper that connects to the master and clones the master's slots then advances them
automatically.</div><div class="gmail_quote"><br /></div><div class="gmail_quote"><br /></div><div
class="gmail_quote">Do nothing</div><div class="gmail_quote">---</div><div class="gmail_quote"><br /></div><div
class="gmail_quote">Drop the idea of being able to follow physical failover on logical slots.</div><div
class="gmail_quote"><br/></div><div class="gmail_quote">I've already expressed why I think this is a terrible idea.
It's hostile to application developers who'd like to use logical decoding. It makes integration of logical replication
with existing HA systems much harder. It means we need really solid, performant, well-tested and mature logical rep
based HA before we can take logical rep seriously, which is a long way out given that we can't do decoding of
in-progress xacts, ddl, sequences, .... etc etc.</div><div class="gmail_quote"><br /></div><div class="gmail_quote">Some
kind of physical HA for logical slots is needed and will be needed for some time. Logical rep will be great for
selective replication, replication over WAN, filtered/transformed replication etc. Physical rep is great for knowing
you'll get exactly the same thing on the replica that you have on the master and it'll Just Work.</div><div
class="gmail_quote"><br/></div><div class="gmail_quote">In any case, "Do nothing" is the same for 9.6 as pursuing the
catalog_xmin push-up idea; in both cases we don't commit anything in 9.6.</div><div class="gmail_quote"><br /></div><div
class="gmail_quote"><br/></div><div class="gmail_quote"><br /></div></div></div> 
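
As a rough illustration of the feedback step in the "independent slots on replicas + catalog_xmin push-up" approach above, the Python sketch below picks the position a replica would report upstream: the confirmed position of its most-behind slot. The `ReplicaSlot` type, the `feedback_lsn` helper and the slot names are hypothetical and are not PostgreSQL APIs; only the `X/Y` hexadecimal LSN text format is taken from PostgreSQL.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


def parse_lsn(text: str) -> int:
    """Convert an LSN in 'X/Y' hex text form (e.g. '0/3000060') to an integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def format_lsn(value: int) -> str:
    """Convert an integer LSN back to 'X/Y' hex text form."""
    return f"{value >> 32:X}/{value & 0xFFFFFFFF:X}"


@dataclass
class ReplicaSlot:
    # Hypothetical stand-in for a logical slot that exists only on the replica.
    name: str
    confirmed_lsn: str


def feedback_lsn(slots: Iterable[ReplicaSlot]) -> Optional[str]:
    """Return the confirmed position of the most-behind slot, or None if
    the replica has no slots and so has nothing to pin on the master."""
    positions = [parse_lsn(slot.confirmed_lsn) for slot in slots]
    if not positions:
        return None
    return format_lsn(min(positions))


# Two hypothetical slots; app_b is further behind than app_a.
slots = [
    ReplicaSlot("app_a", "0/10000000"),
    ReplicaSlot("app_b", "0/9000000"),
]
print(feedback_lsn(slots))
```

The LSNs are compared as integers rather than as strings because a plain string comparison would order `0/10000000` before `0/9000000`, which is the wrong way round.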
