[HACKERS] Determine state of cluster (HA) - Mailing list pgsql-hackers

From Joshua D. Drake
Subject [HACKERS] Determine state of cluster (HA)
Date
Msg-id eed30ddc-5969-f8b2-a767-b18fe48aad7e@commandprompt.com
Whole thread Raw
Responses Re: [HACKERS] Determine state of cluster (HA)
Re: [HACKERS] Determine state of cluster (HA)
List pgsql-hackers
-Hackers,

I had a long call with a firm developing front end proxy/cache/HA for 
Postgres today. Essentially the software is a replacement for PGPool in 
entirety but also supports analytics etc... When I was asking them about 
pain points they talked about the below and I was wondering if this is a 
problem we would like to solve:
 Per your request, here is our failover issue.

1.  In a modern devops environment, the database should be able to scale 
and morph over time based on need.
2.  Tools that are leveraging the database should be able to easily 
discover and potentially control (with permissions) the database. 
Currently, you can discover the master and what nodes are syncing off of 
it, but on a failure, a tool can't easily discover what orchestration 
has done on the back-end to make the cluster whole again, i.e. from the 
slave, you can't discover the master reliably and easily.

The logic that our code now uses is to:

1.  Find the master
2.  Add replication nodes per the master's configuration.

To find a master, we start with a list of candidate nodes that MAY be a 
master at any point, and:
1. issue "SELECT pg_is_in_recovery()" to find if it is a slave
a. If so, use "SELECT pg_read_file('recovery.conf')" to extract the host
b. Attempt to connect to the host directly, if not...
c. use the slave and use the hostname via dblink to connect to the 
master, as the hostname , i.e. select * from dblink('" + connInfo + " 
dbname=postgres', 'select inet_server_addr()') AS t(inet_server_addr 
inet).  This is necessary in the event the hostname used in the 
recovery.conf file is not resolvable from the outside.
d. Use the dblink connection to ID the master node via select 
inet_server_addr();
e. connect to the IP provided by the master.
f.  Repeat through nodes until we get a master.

Issues:
1.  The dblink call doesn't have a way to specify a timeout, so we have 
to use Java futures to control how long this may take to a reasonable 
amount of time;
2.  NAT mapping may result in us detecting IP ranges that are not 
accessible to the application nodes.
3.  there is no easy way to monitor for state changes as they happen, 
allowing faster failovers, everything has to be polled based on events;
4.  It doesn't support cascading replication very well, although we 
could augment the logic to allow us to map the relationship between nodes.
5.  There is no way to connect to a db node with something akin to 
SQL-Server's "application intent" flags, to allow a connection to be 
rejected if we wish it to be a read/write connection.  This helps detect 
the state of the node directly without having to ask any further 
questions of the node, and makes it easier to "stall" during connection 
until a proper connection can be made.
6.  The master, on shutdown, will not actually close and disable 
connections as it shuts down, instead, it will issue an error that it is 
shutting down as it does so.

Fundamentally, the biggest issue is that it is very hard to determine 
the state of the cluster by asking all the nodes, in particular in the 
case of a failure.  Some state information is lost that is necessary to 
talk to the cluster moving forward in a reliable manner.


-- 
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
*****     Unless otherwise stated, opinions are my own.   *****


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: [HACKERS] UPDATE of partition key
Next
From: Robert Haas
Date:
Subject: Re: [HACKERS] Pluggable storage