PostgreSQL super HA (High Availability) conception for 9.5+ - Mailing list pgsql-hackers

From Maeldron T.
Subject PostgreSQL super HA (High Availability) conception for 9.5+
Date
Msg-id CAKatfSndnj9THRo6iaqXU2H1Ej3n_RzQ6G-o1OYexUEhUkm5HQ@mail.gmail.com
List pgsql-hackers
<div dir="ltr"><p class=""><span class=""><font face="arial, helvetica, sans-serif">Hello,</font></span><p
class=""><font face="arial, helvetica, sans-serif">Foreword:<br /></font><p class=""><font face="arial, helvetica,
sans-serif">Unfortunately, I have no time to read the mailing lists or attend events like PostgreSQL and NoSQL. Some of
the ideas came from MongoDB and Cassandra. The inspiration was pg_rewind.<br /><span class=""></span></font><p
class=""><span style="font-family:arial,helvetica,sans-serif">There is little new here; it’s a wish list put together
considering what could be possible in the foreseeable future. It’s likely that people have already worked on a similar or
better concept, but let me try.</span><br /><p class=""><font face="arial, helvetica, sans-serif"><span class=""></span><br
/></font><p class=""><span class=""><font face="arial, helvetica, sans-serif">Reasons:</font></span><p class=""><span
style="font-family:arial,helvetica,sans-serif">Downtime is bad. PostgreSQL failover requires manual intervention (client
configuration, or host or DNS editing). Third-party tools, in my experience, don’t offer the same stability and quality
as PostgreSQL itself. Also, this concept wouldn’t work without pg_rewind.</span><br /><p class=""><span
style="font-family:arial,helvetica,sans-serif">Less software means fewer bugs.</span><br /><p class=""><font face="arial,
helvetica, sans-serif"><span class=""></span><br /></font><p class=""><span class=""><font face="arial, helvetica,
sans-serif">Goals:</font></span><p class=""><span style="font-family:arial,helvetica,sans-serif">Providing close to 100%
HA with minimal manual intervention. Minimizing possible human errors during failover. Letting startup founders sleep
well at night. Automatic client configuration. Avoiding split-brain.</span><br /><p class=""><font face="arial,
helvetica, sans-serif"><span class=""></span><br /></font><p class=""><span class=""><font face="arial, helvetica,
sans-serif">Extras:</font></span><p class=""><span style="font-family:arial,helvetica,sans-serif">Automatic streaming
chain configuration.</span><br /><p class=""><font face="arial, helvetica, sans-serif"><span class=""></span><br
/></font><p class=""><span class=""><font face="arial, helvetica, sans-serif">No-goals:</font></span><p class=""><span
style="font-family:arial,helvetica,sans-serif">Multi-master replication. Sharding. Proxying. Load balancing.</span><br
/><p class=""><font face="arial, helvetica, sans-serif"><span class=""></span><br /></font><p class=""><span
class=""><font face="arial, helvetica, sans-serif">Why these:</font></span><p class=""><span
style="font-family:arial,helvetica,sans-serif">It’s better to have a working technology now than a futuristic solution
in the future. For many applications, stability and HA are more important than sharding or multi-master.</span><br /><p
class=""><font face="arial, helvetica, sans-serif"><span class=""></span><br /></font><p class=""><span class=""><font
face="arial, helvetica, sans-serif">The concept:</font></span><p class=""><span
style="font-family:arial,helvetica,sans-serif">You can set up a single-master PostgreSQL cluster with two or more nodes
that can fail over several times without manual reconfiguration. Restarting the client isn’t needed if it’s smart enough
to reconnect. Third-party software isn’t needed. Proxying isn’t needed.</span><br /><p class=""><font face="arial,
helvetica, sans-serif"><span class=""></span><br /></font><p class=""><span class=""><font face="arial, helvetica,
sans-serif">Cases:</font></span><p class=""><font face="arial, helvetica, sans-serif"><span class=""></span><br
/></font><p class=""><span class=""><font face="arial, helvetica, sans-serif">Running the cluster:</font></span><p
class=""><span style="font-family:arial,helvetica,sans-serif">The cluster is running. There is one master. All other
nodes are hot-standby slaves.</span><br /><p class=""><span style="font-family:arial,helvetica,sans-serif">The
client driver accepts several hostname(:port) values in the connection parameters. They must belong to the same cluster.
(The cluster’s name might be provided too.)</span><br /><p class=""><span
style="font-family:arial,helvetica,sans-serif">The rest of the options (username, database name) are the same for every
host and are needed only once. It’s not necessary to list every host. (Listing even one host is enough, but not recommended.)</span><br /><p
class=""><span style="font-family:arial,helvetica,sans-serif">The client connects to one of the given hosts. If the node
is running and it’s a slave, it tells the client which host is the master. The client connects to the master, even if
the master was not listed in the connection parameters.</span><br /><p class=""><span
style="font-family:arial,helvetica,sans-serif">It should be possible for the client to stay connected to the slave for
read-only queries if the application wants to do that.</span><br /><p class=""><span
style="font-family:arial,helvetica,sans-serif">If the node the client tried to connect to isn’t working, the client tries
another node, and so on.</span><br /><p class=""><font face="arial, helvetica, sans-serif"><span class=""></span><br
/></font><p class=""><span class=""><font face="arial, helvetica, sans-serif">Manually promoting a new
master:</font></span><p class=""><span style="font-family:arial,helvetica,sans-serif">The administrator promotes any of
the slaves. The slave tells the master to stop gracefully. The master stops executing queries and waits until the slave
(the new master) has received the entire replication log. The new master is promoted. The old master becomes a slave. (It might
use pg_rewind.)</span><br /><p class=""><span style="font-family:arial,helvetica,sans-serif">The old master asks the
connected clients to reconnect to the new master, then drops the existing connections. It still accepts new connections,
though, and tells them which node is the master.</span><br /><p class=""><font face="arial, helvetica, sans-serif"><span
class=""></span><br /></font><p class=""><span class=""><font face="arial, helvetica, sans-serif">Manual step-down of
the master:</font></span><p class=""><span style="font-family:arial,helvetica,sans-serif">The administrator asks
the master to stop being the master. The cluster elects a new master. From there, the process is the same as promoting a new
master.</span><br /><p class=""><span style="font-family:arial,helvetica,sans-serif"><br /></span><p class=""><span
style="font-family:arial,helvetica,sans-serif">Manual shutdown of the master:</span><br /><p class=""><span
style="font-family:arial,helvetica,sans-serif">It’s the same as a step-down, but the old master won’t run as a slave until it’s
started up again.</span><br /><p class=""><font face="arial, helvetica, sans-serif"><span class=""></span><br
/></font><p class=""><span class=""><font face="arial, helvetica, sans-serif">Automatic failover:</font></span><p
class=""><span style="font-family:arial,helvetica,sans-serif">The master stops responding for a given period. The
majority of the cluster elects a new master. Then the process is the same as manual promotion.</span><br /><p
class=""><span style="font-family:arial,helvetica,sans-serif">When the old master starts up, the cluster tells it that
it is no longer the master. It runs pg_rewind and acts as a slave.</span><br /><p class=""><span
style="font-family:arial,helvetica,sans-serif">Automatic failover can happen again without human intervention. The
clients are reconnected to the new master each time.</span><br /><p class=""><font face="arial, helvetica,
sans-serif"><span class=""></span><br /></font><p class=""><span class=""><font face="arial, helvetica,
sans-serif">Automatic failover without majority:</font></span><p class=""><span
style="font-family:arial,helvetica,sans-serif">The configuration can specify which server may act as master when
there is no majority to vote.</span><span style="font-family:arial,helvetica,sans-serif"> </span><br /><p class=""><font
face="arial, helvetica, sans-serif"><span class=""></span><br /></font><p class=""><span class=""><font face="arial,
helvetica, sans-serif">Replication chain:</font></span><p class=""><span
style="font-family:arial,helvetica,sans-serif">There are two cases. 1: All the slaves connect to the master. 2: One
slave connects to the master and the rest of the nodes replicate from this slave.</span><span
style="font-family:arial,helvetica,sans-serif"> </span><br /><p class=""><font face="arial, helvetica, sans-serif"><span
class=""></span><br /></font><p class=""><span class=""><font face="arial, helvetica,
sans-serif">Configuration:</font></span><p class=""><span style="font-family:arial,helvetica,sans-serif">Every node
should have a “recovery.conf” that is not renamed on promotion.</span><br /><p class=""><span
style="font-family:arial,helvetica,sans-serif">cluster_name: an identifier for the cluster. Why not.</span><br /><p
class=""><span style="font-family:arial,helvetica,sans-serif">hosts: list of the hosts. It is recommended, but not required,
to include every host in every file. A node could work like the driver and discover the rest of the cluster.</span><br /><p
class=""><span style="font-family:arial,helvetica,sans-serif">master_priority: integer. How likely this node is to become the
new master on failover (except on manual promotion). A working cluster should not elect a new master just because a node has
a higher priority than the current one. Election happens only for the reasons described above.</span><br /><p
class=""><span style="font-family:arial,helvetica,sans-serif">slave_priority: integer. If any running node has this
value larger than 0, a replication node is also elected, and the rest of the slaves replicate from the elected slave.
Otherwise, they replicate from the master.</span><br /><p class=""><span
style="font-family:arial,helvetica,sans-serif">primary_master: boolean. The node may run as master without being elected by
the majority. (This is not needed on manual promotion or shutdown. See bookkeeping.)</span><br /><p class=""><span
style="font-family:arial,helvetica,sans-serif">safe: boolean. If this is set to true and any kind of graceful failover
happens, the promotion has to wait until this node has also received the whole replication stream, even if it’s not the new
master, unless it’s not running. Every node can have this set to true for maximum safety.</span><br /><p class=""><font
face="arial, helvetica, sans-serif"><span class=""></span><br /></font><p class=""><span class=""><font face="arial,
helvetica, sans-serif">Bookkeeping:</font></span><p class=""><span style="font-family:arial,helvetica,sans-serif">It
would be good to know whether a node crashed or was shut down properly. This would make a difference in master election,
streaming_slave election and the “safe” option. A two-node cluster would depend heavily on the bookkeeping.</span><br
/><p class=""><span style="font-family:arial,helvetica,sans-serif">Bookkeeping would also help when a
crashed/disconnected master that has primary_master=true comes back but doesn’t see the rest of the cluster.</span><br
/><p class=""><font face="arial, helvetica, sans-serif"><span class=""></span><br /></font><p class=""><span
class=""><font face="arial, helvetica, sans-serif">Questions:</font></span><p class=""><span
style="font-family:arial,helvetica,sans-serif">Is there any chance that something like this gets implemented?</span><br
/><p class=""><span style="font-family:arial,helvetica,sans-serif"><br /></span><p class=""><span
style="font-family:arial,helvetica,sans-serif">Thank you for reading.</span><p class=""><span
style="font-family:arial,helvetica,sans-serif"><br /></span><p class=""><span
style="font-family:arial,helvetica,sans-serif">M.</span></div>
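
To make the client behaviour from "Running the cluster" more concrete, here is a minimal sketch in Python. It is only a simulation under stated assumptions: the Node class, the master_hint field and the find_master function are hypothetical names invented for illustration, and the cluster is an in-memory dictionary rather than real connections. No existing driver API is implied.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical view of one cluster node as seen by a client."""
    running: bool
    role: str                          # "master" or "slave"
    master_hint: Optional[str] = None  # host a slave reports as the master


def find_master(hosts: List[str], cluster: Dict[str, Node]) -> Optional[str]:
    """Try the listed hosts in order and follow a slave's hint to the master."""
    for host in hosts:
        node = cluster.get(host)
        if node is None or not node.running:
            continue                   # node is not working: try the next one
        if node.role == "master":
            return host
        # The node is a slave: ask it which host is the master.
        hinted = cluster.get(node.master_hint or "")
        if hinted is not None and hinted.running and hinted.role == "master":
            return node.master_hint    # the master need not be in `hosts`
    return None


# Simulated cluster: db1 is down, db2 is a slave that points at db3.
cluster = {
    "db1": Node(running=False, role="master"),
    "db2": Node(running=True, role="slave", master_hint="db3"),
    "db3": Node(running=True, role="master"),
}
print(find_master(["db1", "db2"], cluster))  # db3
```

The client lists only db1 and db2, yet ends up on db3, which matches the idea that the master does not have to appear in the connection parameters.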

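The election rules from "Automatic failover", "Automatic failover without majority" and the master_priority, slave_priority and primary_master settings could be sketched as follows. This is one possible reading, not a specification: the Member class and both functions are hypothetical, and treating the highest priority as the winner (with the name as tie-breaker) is an assumption, since the proposal only says the priority expresses how likely a node is to be chosen.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Member:
    """Hypothetical per-node settings, named after the proposed options."""
    name: str
    reachable: bool
    master_priority: int = 0
    slave_priority: int = 0
    primary_master: bool = False


def elect_master(members: List[Member]) -> Optional[str]:
    """Pick a new master among the reachable nodes."""
    alive = [m for m in members if m.reachable]
    has_majority = len(alive) * 2 > len(members)
    # Without a majority, only nodes flagged primary_master may take over.
    candidates = alive if has_majority else [m for m in alive if m.primary_master]
    if not candidates:
        return None
    # Assumption: highest master_priority wins, name breaks ties.
    return max(candidates, key=lambda m: (m.master_priority, m.name)).name


def elect_relay(members: List[Member], master: str) -> Optional[str]:
    """Pick the slave the other slaves replicate from, if any qualifies."""
    candidates = [m for m in members
                  if m.reachable and m.name != master and m.slave_priority > 0]
    if not candidates:
        return None  # every slave replicates directly from the master
    return max(candidates, key=lambda m: (m.slave_priority, m.name)).name


three = [
    Member("db1", reachable=False, master_priority=20),  # failed old master
    Member("db2", reachable=True, master_priority=10),
    Member("db3", reachable=True, master_priority=5, slave_priority=1),
]
new_master = elect_master(three)
print(new_master, elect_relay(three, new_master))  # db2 db3
```

In a two-node cluster the surviving node never has a majority on its own, so in this sketch it can only take over when primary_master is set, which is why the bookkeeping matters so much there.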