PostgreSQL super HA (High Availability) conception for 9.5+ - Mailing list pgsql-hackers
From | Maeldron T. |
---|---|
Subject | PostgreSQL super HA (High Availability) conception for 9.5+ |
Date | |
Msg-id | CAKatfSndnj9THRo6iaqXU2H1Ej3n_RzQ6G-o1OYexUEhUkm5HQ@mail.gmail.com |
List | pgsql-hackers |
Hello,

Foreword:

Unfortunately, I have no time to read the mailing lists and attend events like PostgreSQL and NoSQL conferences. Some of the ideas came from MongoDB and Cassandra. The inspiration was pg_rewind.

There is little new here; it's a wish-list put together with an eye on what could be possible in the foreseeable future. It's likely that people have worked on a similar or better concept. But let me try.

Reasons:

Downtime is bad. PostgreSQL failover requires manual intervention (client configuration, or host or DNS editing). Third-party tools (in my experience) don't offer the same stability and quality as PostgreSQL itself. Also, this concept wouldn't work without pg_rewind.

Less software means fewer bugs.

Goals:

Providing close to 100% HA with minimal manual intervention. Minimizing possible human errors during failover. Making startup founders sleep well at night. Automatic client configuration. Avoiding split-brain.

Extras:

Automatic streaming chain configuration.

Non-goals:

Multi-master replication. Sharding. Proxying. Load balancing.

Why these:

It's better to have a working technology now than a futuristic solution in the future. For many applications, stability and HA are more important than sharding or multi-master.

The concept:

You can set up a single-master PostgreSQL cluster with two or more nodes that can fail over several times without manual reconfiguration. Restarting the client isn't needed if it's smart enough to reconnect. Third-party software isn't needed. Proxying isn't needed.

Cases:

Running the cluster:

The cluster is running. There is one master. Every other node is a hot-standby slave.

The client driver accepts several hostname(:port) values in the connection parameters. They must belong to the same cluster. (The cluster's name might be provided too.)

The rest of the options (username, database name) are the same and needed only once. It's not necessary to list every host. (Even listing one host is enough, but not recommended.)

The client connects to one of the given hosts. If the node is running and is a slave, it tells the client which host the master is. The client connects to the master, even if the master was not listed in the connection parameters.

It should be possible for the client to stay connected to the slave for read-only queries if the application wants that.

If the node the client tried to connect to isn't working, the client tries another node, and so on.

Manually promoting a new master:

The administrator promotes any of the slaves. The slave tells the master to gracefully stop. The master stops executing queries. It waits until the slave (the new master) has received all the replication log. The new master is promoted. The old master becomes a slave. (It might use pg_rewind.)

The old master asks the connected clients to reconnect to the new master. Then it drops the existing connections. It still accepts new connections, though, and tells them who the master is.

Manual step-down of the master:

The administrator kindly asks the master to stop being the master. The cluster elects a new master. Then it's the same as promoting a new master.

Manual shutdown of the master:

It's the same as step-down, but the master won't run as a slave until it's started up again.

Automatic failover:

The master stops responding for a given period. The majority of the cluster elects a new master. Then the process is the same as manual promotion.

When the old master starts up, the cluster tells it that it is not the master anymore. It runs pg_rewind and acts as a slave.

Automatic failover can happen again without human intervention. The clients are reconnected to the new master each time.

Automatic failover without majority:

It's possible to tell in the config which server may act as a master when there is no majority to vote.

Replication chain:

There are two cases. 1: All the slaves connect to the master. 2: One slave connects to the master, and the rest of the nodes replicate from this slave.

Configuration:

Every node should have a "recovery.conf" that is not renamed on promotion.

cluster_name: an identifier for the cluster. Why not.

hosts: list of the hosts. It is recommended, but not required, to include every host in every file. It could work like the driver does, discovering the rest of the cluster.

master_priority: integer. How likely this node is to become the new master on failover (except manual promotion). A working cluster should not elect a new master just because another node has a higher priority than the current one. Election happens only for the reasons described above.

slave_priority: integer. If any running node has this value larger than 0, a replication node is also elected, and the rest of the slaves replicate from that elected slave. Otherwise, they replicate from the master.

primary_master: boolean. The node may run as master without being elected by the majority. (This is not needed on manual promotion or shutdown. See bookkeeping.)

safe: boolean. If this is set to true and any kind of graceful failover happens, the promotion has to wait until this node also receives the whole replication stream, even if it's not the new master (unless the node is down). Every node can have this set to true for maximum safety.

Bookkeeping:

It would be good to know whether a node crashed or was shut down properly. This would make a difference in master election, streaming-slave election, and the "safe" option. A two-node cluster would depend heavily on the bookkeeping.

Bookkeeping would also help when a crashed/disconnected master that has primary_master=true comes back but doesn't see the rest of the cluster.

Questions:

Is there any chance that something like this gets implemented?

Thank you for reading.

M.
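[Editor's note] A per-node "recovery.conf" along the lines proposed above might look like the fragment below. The parameter names are the ones from the proposal; the cluster name, host names, and values are made up for illustration, and none of this is an existing PostgreSQL syntax:

```
# Hypothetical recovery.conf for one node of the proposed cluster
cluster_name = 'billing'                 # identifier shared by all nodes
hosts = 'db1:5432, db2:5432, db3:5432'   # listing every host is recommended, not required
master_priority = 10                     # higher: more likely to win a failover election
slave_priority = 5                       # > 0: node may be elected as the streaming slave
primary_master = off                     # may not act as master without a majority vote
safe = on                                # graceful promotion waits for this node to catch up
```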
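The client-side behavior described under "Running the cluster" (try each listed host, follow a standby's redirect to the master, skip dead nodes) could be sketched roughly as below. This is only an illustration of the proposed driver logic, not an existing libpq or driver API; `probe` stands in for a real connection attempt, and all names are hypothetical:

```python
# Sketch of the proposed multi-host driver behavior: try each listed
# host in turn; a reachable standby reports the current master, and the
# driver connects there even if the master was not in the list.

def find_master(hosts, probe):
    """Return the master's address, following standby redirects.

    `probe(host)` returns ("master", host) or ("slave", master_host),
    and raises OSError if the node is unreachable.
    """
    for host in hosts:
        try:
            role, master = probe(host)
        except OSError:
            continue  # node down: try the next listed host
        if role == "master":
            return host
        return master  # a standby told us where the master is
    raise RuntimeError("no reachable node among %r" % (hosts,))


# Toy cluster state for illustration: db2 is the current master.
STATE = {"db1": ("slave", "db2"), "db2": ("master", "db2")}

def toy_probe(host):
    if host not in STATE:
        raise OSError("unreachable")
    return STATE[host]

print(find_master(["db0", "db1"], toy_probe))  # db0 is down, db1 redirects: prints db2
```

After a failover, the same loop naturally re-discovers the new master on reconnect, which is what makes repeated failovers work without client reconfiguration.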