Re: [HACKERS] Determine state of cluster (HA) - Mailing list pgsql-hackers
From | Jehan-Guillaume de Rorthais
---|---
Subject | Re: [HACKERS] Determine state of cluster (HA)
Date |
Msg-id | 20171016141025.6bc022b8@firost
In response to | Re: [HACKERS] Determine state of cluster (HA) (Craig Ringer <craig@2ndquadrant.com>)
List | pgsql-hackers
On Mon, 16 Oct 2017 10:39:16 +0800 Craig Ringer <craig@2ndquadrant.com> wrote:

> On 13 October 2017 at 08:50, Joshua D. Drake <jd@commandprompt.com> wrote:
> > I had a long call with a firm developing front end proxy/cache/HA for
> > Postgres today. Essentially the software is a replacement for PGPool in
> > entirety but also supports analytics etc... When I was asking them about
> > pain points they talked about the below and I was wondering if this is a
> > problem we would like to solve.
>
> IMO: no one node knows the full state of the system, or can know it.

+1

> I'd love PostgreSQL to help users more with scaling, HA, etc. But I
> think it's a big job. We'd need:
>
> - a node topology of some kind, communicated between nodes
> - heartbeat and monitoring
> - failover coordination
> - pooling/proxying
> - STONITH/fencing
> - etc.

And some of the items on this list cannot be in core. However, there are
some things PostgreSQL can do to make HA easier to deal with.

> That said, I do think it'd be very desirable for us to introduce a
> greater link from a standby to master:
>
> - Get info about master. We should finish merging recovery.conf into
>   postgresql.conf.

Agree, +1.

To make things easier for the "cluster manager" piece outside of PostgreSQL,
I would add:

* being able to "demote" a master to a standby without a restart;
* being able to check the status of each node without eating a backend
  connection (to avoid hitting the "max_connections" limit);
* being able to monitor each step of a switchover (or "controlled failover":
  standby/master role swapping between two nodes).

> > b. Attempt to connect to the host directly, if not...
> > c. use the slave and use the hostname via dblink to connect to the master,
> > as the hostname, i.e. select * from dblink('" + connInfo + "
> > dbname=postgres', 'select inet_server_addr()') AS t(inet_server_addr inet).
> > This is necessary in the event the hostname used in the recovery.conf file
> > is not resolvable from the outside.
>
> OK, so "connect directly" here means from some 3rd party, the one
> doing the querying of the replica.

It seems to me the failover process issues all the commands required to move
the master role to another available standby. The knowledge of the
orchestration and of the final status (if everything went well) lives in that
piece of software. If you want to know where your master is in an exotic or
complex setup, ask whoever was responsible for promoting it.

HA should stay as simple as possible: the more complex the architecture, the
more failure scenarios you will have.

> > 1. The dblink call doesn't have a way to specify a timeout, so we have to
> > use Java futures to control how long this may take to a reasonable amount of
> > time;
>
> statement_timeout doesn't work?
>
> If not, that sounds like a sensible, separate feature to add. Patches welcome!

> > 2. NAT mapping may result in us detecting IP ranges that are not accessible
> > to the application nodes.
>
> PostgreSQL can't do anything about this one.

You could get the master IP address from the "pg_stat_wal_receiver" view, but
even that is not enough: you might have separate, dedicated networks for the
applications and for PostgreSQL replication. If you want a standby to tell the
application where to connect to the master, you will have to put that
information somewhere accessible from the application nodes yourself.
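For illustration, a minimal example of reading that view from a standby.
This assumes 9.6 or later, where pg_stat_wal_receiver exists; note that
conninfo reflects the replication network's view of the master, which is not
necessarily reachable from the application network:

    -- Run on a standby: where is the walreceiver currently connected?
    -- conninfo echoes primary_conninfo, e.g. "host=10.0.0.1 port=5432 ..."
    SELECT status, conninfo
      FROM pg_stat_wal_receiver;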
> > 3. there is no easy way to monitor for state changes as they happen,
> > allowing faster failovers, everything has to be polled based on events;

In the corosync world (the clustering piece of the Pacemaker ecosystem), node
failures are detected really fast, in about one second.

Considering application failure (PostgreSQL here), this will be polling, yes.
But I fail to see how a dying application could warn the cluster before dying.
Not only when crashing (systemd could help there), but e.g. before entering an
infinite dummy loop or an exhausting one.

> It'd be pretty simple to write a function that sleeps in the backend
> until it's promoted. I don't know off the top of my head if we set all
> proc latches when we promote, but if we don't it's probably harmless
> and somewhat useful to do so.

As soon as the cluster manager has promoted a new master, it can trigger an
event to notify whatever you need.

> Either way, you'd do long-polling. Call the function and let the
> connection block until something interesting happens. Use TCP
> keepalives to make sure you notice if it dies. Have the function
> return when the state changes.

This would still rely on the TCP keepalive frequency, so we are back to
polling :(

Regards,
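A rough, untested sketch of the long-polling function discussed above,
assuming a plain plpgsql loop over pg_is_in_recovery() with pg_sleep() (i.e.
server-side polling, not the latch-based wakeup Craig mentions); the function
name and timeout are made up for illustration:

    -- Hypothetical helper, not part of core: block until this server leaves
    -- recovery (is promoted) or the timeout expires.  Returns true on
    -- promotion.  Create it on the primary; it replicates to the standbys,
    -- where it is then called.
    CREATE OR REPLACE FUNCTION wait_for_promotion(max_seconds integer DEFAULT 300)
    RETURNS boolean
    LANGUAGE plpgsql AS $$
    DECLARE
        waited integer := 0;
    BEGIN
        WHILE pg_is_in_recovery() LOOP
            IF waited >= max_seconds THEN
                RETURN false;          -- still a standby after max_seconds
            END IF;
            PERFORM pg_sleep(1);       -- poll once per second
            waited := waited + 1;
        END LOOP;
        RETURN true;                   -- no longer in recovery: promoted
    END;
    $$;

The caller would long-poll with something like "SELECT wait_for_promotion(600);"
over a connection configured with TCP keepalives (e.g. keepalives=1
keepalives_idle=5 keepalives_interval=2 keepalives_count=3 in the libpq
connection string) so that a dead standby is noticed as well.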