Thread: [GENERAL] Repmgr + pgbouncer - Notification of master promotion to applicationlevel ...

Hello.

Currently we're running all our PostgreSQL databases on ESX Stretch
Clusters, but our company is looking at moving away from those to a
private cloud solution.

I'm informed that, going forward, new servers provisioned for us to
install PostgreSQL databases on will be entirely isolated, using no
network/clustered storage of any kind but only local storage - so
stretch clusters, VCS etc are out.

To this end we're looking at using standard PostgreSQL
replication/multiple nodes to provide continued availability in the
event of a primary server failure, and I've been directed to set-up some
three node testbeds.

It was originally suggested we use Pacemaker/Corosync for this, but then
our Linux sysadm team advised that Pacemaker/Corosync hadn't been
certified for our particular cloud solution, and a VIP couldn't be done
as the database servers may exist on different networks/availability zones.

(BTW, I apologize if somebody reading this is going 'What nonsense! Of
course it will work! Here's how ...' - I'm no sysadm, I'm just going on
what I was told)

As a backup/contingency plan I'm now using repmgr to handle things - I'm
very happy with it (after getting it compiled :) ) it seems nice and
simple, and repmgrd is working nicely. A number of test failovers have
worked perfectly.

BUT ...

repmgr doesn't provide a mechanism to notify upstream connections that a
failover has occurred and that the master is now on server y instead of
server x.

I'm currently looking at pgbouncer, as that's proposed as a STONITH
mechanism in the repmgr git documentation.

The new master's repmgr promote script will execute commands to pause
pgbouncer, reconfigure pgbouncer to point to the new database address,
and then resume.

This seems like a straightforward and viable option, except that if we
only have one pgbouncer we've just introduced a new single point of
failure, and if we have multiple pgbouncers how best to handle that? A
pgbouncer on every application server, two or three pgbouncers in the
middle with some kind of load balancer, a pgbouncer on every db server?
Each option appears to have advantages and drawbacks.

So I was just wondering if the community had any
recommendations/suggestions/gotchas for this? Is pgbouncer a decent
choice, or are there better solutions out there? Has anyone tried
pgbouncer and found it worked well, or tried it and found it was a
disaster and moved onto something else?

Basically, I'm wondering if I'm on the right path or whether there's
something out there that everyone is now screaming 'Why on earth is this
idiot not using X instead?' :)

Any input would be greatly appreciated.

Regards,

Martin.


--
Martin Goodson

"Have you thought up some clever plan, Doctor?"
"Yes, Jamie, I believe I have."
"What're you going to do?"
"Bung a rock at it."



On 14/06/17, Martin Goodson (kaemaril@googlemail.com) wrote:
> The new master's repmgr promote script will execute commands to pause
> pgbouncer, reconfigure pgbouncer to point to the new database address, and
> then resume.

You could just move the service ip address at the new postgresql master
to which the pgbouncer instances on each app server could reconnect.

I assume, with reference to https://github.com/2ndQuadrant/repmgr, that
you could use repmgrd to trigger a script to do just that at
"switchover", possibly something you would need to monitor the
"repl_events" table to achieve.

Rory


On 14/06/2017 19:54, Rory Campbell-Lange wrote:
> On 14/06/17, Martin Goodson (kaemaril@googlemail.com) wrote:
>> The new master's repmgr promote script will execute commands to pause
>> pgbouncer, reconfigure pgbouncer to point to the new database address, and
>> then resume.
>
> You could just move the service ip address at the new postgresql master
> to which the pgbouncer instances on each app server could reconnect.
>
> I assume, with reference to https://github.com/2ndQuadrant/repmgr, that
> you could use repmgrd to trigger a script to do just that at
> "switchover", possibly something you would need to monitor the
> "repl_events" table to achieve.
>
> Rory
>

Well, yes. That's pretty much the plan. See the repmgr documents:

https://github.com/2ndQuadrant/repmgr/blob/master/docs/repmgrd-node-fencing.md
-

In a failover situation, repmgrd promotes a standby to master by
executing the command defined in promote_command. Normally this would be
something like:

repmgr standby promote -f /etc/repmgr.conf

By wrapping this in a custom script which adjusts the pgbouncer
configuration on all nodes, it's possible to fence the failed master and
redirect write connections to the new master.


I'm just wondering how people may have implemented this. Do people setup
pgbouncer nodes on the database servers themselves, on application
servers, in the middle tier between the application and database, and so
forth, or some combination of the three? I can think of some advantages
or drawbacks for each. Or do people find that repmgr works better with
other tools to handle the promotion notification outside the database
cluster?

Basically I'm new to this, and I'm wondering how folks have handled this
issue. I'm basically looking for the community's wisdom :)

--
Martin Goodson

"Have you thought up some clever plan, Doctor?"
"Yes, Jamie, I believe I have."
"What're you going to do?"
"Bung a rock at it."



Am 15.06.2017 um 01:18 schrieb Martin Goodson:
>
>
> I'm just wondering how people may have implemented this. Do people
> setup pgbouncer nodes on the database servers themselves, on
> application servers, in the middle tier between the application and
> database, and so forth, or some combination of the three? I can think
> of some advantages or drawbacks for each. Or do people find that
> repmgr works better with other tools to handle the promotion
> notification outside the database cluster?
>
> Basically I'm new to this, and I'm wondering how folks have handled
> this issue. I'm basically looking for the community's wisdom :)
>

Usually we recommend to install pgbouncer on the app-servers.

If you have full control of the application you can try to integrate the
logic into the application (provide a list of servers, the new
pg10-version of libpg is working similar in this way:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=274bb2b3857cc987cfa21d14775cae9b0dababa5
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=721f7bd3cbccaf8c07cad2707826b83f84694832
)

Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com



On 15/06/17, Martin Goodson (kaemaril@googlemail.com) wrote:
> On 14/06/2017 19:54, Rory Campbell-Lange wrote:
> >On 14/06/17, Martin Goodson (kaemaril@googlemail.com) wrote:
> >>The new master's repmgr promote script will execute commands to pause
> >>pgbouncer, reconfigure pgbouncer to point to the new database address, and
> >>then resume.
> >
> >You could just move the service ip address at the new postgresql master
> >to which the pgbouncer instances on each app server could reconnect.
> >
> >I assume, with reference to https://github.com/2ndQuadrant/repmgr, that
> >you could use repmgrd to trigger a script to do just that at
> >"switchover", possibly something you would need to monitor the
> >"repl_events" table to achieve.

> I'm just wondering how people may have implemented this. Do people setup
> pgbouncer nodes on the database servers themselves, on application servers,

We have pgbouncer on the application servers and shift the postgres
master ip address to the promoted master.



On 15/06/17, Andreas Kretschmer (andreas@a-kretschmer.de) wrote:
>
> Am 15.06.2017 um 01:18 schrieb Martin Goodson:
> >
> >...Do people setup pgbouncer nodes on the database servers
> >themselves, on application servers, in the middle tier between the
> >application and database, and so forth, or some combination of the
> >three?
>
> Usually we recommend to install pgbouncer on the app-servers.
>
> If you have full control of the application you can try to integrate the
> logic into the application (provide a list of servers, the new pg10-version
> of libpg is working similar in this way:
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=274bb2b3857cc987cfa21d14775cae9b0dababa5
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=721f7bd3cbccaf8c07cad2707826b83f84694832
> )

Hi Andreas

The list of servers idea is a cool enhancement. However would pgbouncer
(or another client) be able to detect which of those servers were in slave
mode?

Otherwise, if there is a temporary glitch in communications with the
master, a client (such as pgbouncer) could move to try inserts on a
slave.

Rory



Am 15.06.2017 um 08:26 schrieb Rory Campbell-Lange:
> On 15/06/17, Andreas Kretschmer (andreas@a-kretschmer.de) wrote:
>> Am 15.06.2017 um 01:18 schrieb Martin Goodson:
>>> ...Do people setup pgbouncer nodes on the database servers
>>> themselves, on application servers, in the middle tier between the
>>> application and database, and so forth, or some combination of the
>>> three?
>> Usually we recommend to install pgbouncer on the app-servers.
>>
>> If you have full control of the application you can try to integrate the
>> logic into the application (provide a list of servers, the new pg10-version
>> of libpg is working similar in this way:
>> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=274bb2b3857cc987cfa21d14775cae9b0dababa5
>> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=721f7bd3cbccaf8c07cad2707826b83f84694832
>> )
> Hi Andreas
>
> The list of servers idea is a cool enhancement. However would pgbouncer
> (or another client) be able to detect which of those servers were in slave
> mode?

it is possible to detect which is the master. So, if you know the master
and you have a list of all, you knows also the standby's ;-)

>
> Otherwise, if there is a temporary glitch in communications with the
> master, a client (such as pgbouncer) could move to try inserts on a
> slave.

Right. You can play with keepalives* - settings to avoid problems.


Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com



On 15/06/2017 05:27, Andreas Kretschmer wrote:
>
>
> Am 15.06.2017 um 01:18 schrieb Martin Goodson:
>>
>>
>> I'm just wondering how people may have implemented this. Do people
>> setup pgbouncer nodes on the database servers themselves, on
>> application servers, in the middle tier between the application and
>> database, and so forth, or some combination of the three? I can think
>> of some advantages or drawbacks for each. Or do people find that
>> repmgr works better with other tools to handle the promotion
>> notification outside the database cluster?
>>
>> Basically I'm new to this, and I'm wondering how folks have handled
>> this issue. I'm basically looking for the community's wisdom :)
>>
>
> Usually we recommend to install pgbouncer on the app-servers.
>
> If you have full control of the application you can try to integrate
> the logic into the application (provide a list of servers, the new
> pg10-version of libpg is working similar in this way:
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=274bb2b3857cc987cfa21d14775cae9b0dababa5
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=721f7bd3cbccaf8c07cad2707826b83f84694832
>
> )
>
> Regards, Andreas
>

Hello.

Unfortunately, no control of the application layer.

Very interesting feedback so far. Thanks, everyone!

The issues I think I would have with pgbouncer at the application level
is ...

1) What if an application server is down when pgbouncer tries to update
where the database IP is pointing to? When it is brought back into
service could that create a split-brain scenario if it's still pointing
to the old master? Since the only STONITH here is 'I've told you to talk
to server X instead of server Y' what happens if somebody doesn't get
that message and the old master is still up and about (e.g. the failover
was caused by a transient error such as a power issue or network issue
that the 'old master' was able to quickly recover from and come back
online)? :)
2) We use a non-trivial number of application servers. Could that create
'failover lag' due to the amount of time it takes to notify all the
pgbouncers to stop, change IP, and restart?
3) Since we use a non-trivial number of application servers, the
administrative overhead of pushing all user password changes up to each
pgbouncer could be a bit of a pain (candidate for an automation tool
like ansible, perhaps?)

How do people deal with that?

Regards,

Martin.

--
Martin Goodson

"Have you thought up some clever plan, Doctor?"
"Yes, Jamie, I believe I have."
"What're you going to do?"
"Bung a rock at it."



Am 15.06.2017 um 11:57 schrieb Martin Goodson:
>
>
> The issues I think I would have with pgbouncer at the application
> level is ...
>
> 1) What if an application server is down when pgbouncer tries to
> update where the database IP is pointing to? When it is brought back
> into service could that create a split-brain scenario if it's still
> pointing to the old master? Since the only STONITH here is 'I've told
> you to talk to server X instead of server Y' what happens if somebody
> doesn't get that message and the old master is still up and about
> (e.g. the failover was caused by a transient error such as a power
> issue or network issue that the 'old master' was able to quickly
> recover from and come back online)? :)

yeah, that's problematic here.

> 2) We use a non-trivial number of application servers. Could that
> create 'failover lag' due to the amount of time it takes to notify all
> the pgbouncers to stop, change IP, and restart?

possible, yes.

> 3) Since we use a non-trivial number of application servers, the
> administrative overhead of pushing all user password changes up to
> each pgbouncer could be a bit of a pain (candidate for an automation
> tool like ansible, perhaps?)

use auth_query instead of auth_file.


Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com



Hi Andreas,

in my pgbouncer configured as:
;auth_file=/test/user.txt
auth_query = SELECT usename, passwd FROM pg_shadow WHERE usename=$1

Of course. When try connect with command " psql -p 6666  postgres -U dba"

Print erro in screen.
psql: ERROR:  No such user: dba

But, exists user and passwd in database. And try connect with "psql -p 5432
postgres -U dba"
success!



--
View this message in context:
http://www.postgresql-archive.org/Repmgr-pgbouncer-Notification-of-master-promotion-to-application-level-tp5966736p5979699.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.