Thread: Master-slave failover question

Master-slave failover question

From
"Athanasios | ZenGuard"
Date:
Dear list,
I hope that everyone had a nice holiday. Currently, I am researching automated master-slave promotion, and I have one question. Let's assume that clients connect to the master at 192.168.1.1 for read/write traffic and I have a slave at 192.168.1.2. Assume that the master goes down and that the slave gets promoted to master (this I have found out how to do automatically). How will the clients know to connect to the new master? Any ideas on this one?

Re: Master-slave failover question

From
Sameer Kumar
Date:

You can use pgpool to do this. Or you can use Failover Manager by EnterpriseDB.

Regards
Sameer

PS: Sent from my Mobile device. Pls ignore typo n abb

Re: Master-slave failover question

From
"Athanasios | ZenGuard"
Date:
Thanks for the quick answer, Sameer. However, using pgpool is not an option. Where can I find additional information about EnterpriseDB's Failover Manager?




Re: Master-slave failover question

From
Sameer Kumar
Date:

Check their site: www.enterprisedb.com

Can you share your failover command here? You may use ifconfig to switch a virtual IP while doing a failover.
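A minimal sketch of what "switch a virtual IP" could look like on the node being promoted (the interface name, VIP, and netmask are assumptions; `RUN=echo` keeps it a dry run):

```shell
#!/usr/bin/env bash
# Hypothetical virtual-IP move during failover; set RUN= to execute for real.
RUN=${RUN-echo}

VIP=192.168.1.100        # assumed floating IP the clients connect to
IFACE=eth0               # assumed network interface

take_vip() {
  # bring the VIP up as an alias on the new master
  $RUN ifconfig "$IFACE:0" "$VIP" netmask 255.255.255.0 up
  # gratuitous ARP so clients and switches learn the new location quickly
  $RUN arping -U -c 3 -I "$IFACE" "$VIP"
}

release_vip() {
  # drop the alias on the demoted node (if it is still reachable)
  $RUN ifconfig "$IFACE:0" down
}

take_vip
```

On modern Linux, `ip addr add $VIP/24 dev $IFACE` is the preferred equivalent of the `ifconfig` alias.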

BTW why are you reluctant to use pgpool?

Regards
Sameer


Re: Master-slave failover question

From
"Athanasios | ZenGuard"
Date:
Hi,
pgpool is a bit of a black box. I am confident in the use of VRRP, but I have not seen any documentation on how to combine it with pgpool to eliminate single points of failure.
Can you share a few more details about how I can use a virtual IP?




Re: Master-slave failover question

From
"Athanasios | ZenGuard"
Date:
Hi all again,
I rigged up a solution with pgpool (!) in master-slave mode, using the existing documentation, and it works like a charm with a downtime of only a few seconds, which is acceptable in our case. However, assume the following scenario:
1) Master goes down
2) Slave gets promoted to master
Thanks to pgpool, this works fine. Now suppose a certain amount of write traffic goes to the new master. When the old master (sorry for the terminology) comes back up, how can I "sync" it with all the changes that have been performed? This is one of the two pieces of the puzzle missing for me (the other one is pgpool-specific, and I directed that query to their mailing list).
Thanks for any replies.





Re: Master-slave failover question

From
Sameer Kumar
Date:

As of now (v9.3 of PostgreSQL), you need to rebuild the old (lost) master from the new master.
Glad you could discover the magic of pgpool :-)

You can either use pgpoolAdmin to do this rebuilding in a single click, or you can write your own shell script (like I had done) to restore a lost node (do remember to test the scenario where the slave is lost and you have to rebuild that).
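A sketch of that rebuild, run on the lost node once it is back (the host, replication user, and data directory are assumptions; `pg_basebackup -R` writes a recovery.conf pointing at the new master; `RUN=echo` keeps it a dry run):

```shell
#!/usr/bin/env bash
# Hypothetical rebuild of the old master as a slave of the new master.
RUN=${RUN-echo}

NEW_MASTER=192.168.1.2                     # assumed: the promoted node
PGDATA=/var/lib/postgresql/9.3/main        # assumed data directory

rebuild_from_new_master() {
  $RUN pg_ctl -D "$PGDATA" stop -m fast    # make sure the diverged instance is down
  $RUN rm -rf "$PGDATA"                    # discard its data directory entirely
  $RUN pg_basebackup -h "$NEW_MASTER" -U replication -D "$PGDATA" -X stream -R
  $RUN pg_ctl -D "$PGDATA" start           # comes back up as a streaming standby
}

rebuild_from_new_master
```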

Regards
Sameer


Re: Master-slave failover question

From
"Athanasios | ZenGuard"
Date:
Hi Sameer,
thank you for the quick reply (again!). I am not using pgpoolAdmin, so is it possible to share the shell script that does the rebuilding? Also, if a slave is lost, when it comes back up, will it not "catch up" using streaming replication?
Cheers!




Re: Master-slave failover question

From
Sameer Kumar
Date:

If you are only relying on streaming replication (and not using archive shipping/hybrid replication), the chance of 'catching up' depends on your wal_keep_segments setting.

Regards
Sameer


Re: Master-slave failover question

From
"Athanasios | ZenGuard"
Date:
Any sane values for it? This is what I am using right now (10 just to be on the safe side; the 5000 (!) is a repmgr requirement (!)):

max_wal_senders = 10            # max number of walsender processes
wal_keep_segments = 5000        # in logfile segments, 16MB each; 0 disables
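For scale, wal_keep_segments pins that many 16 MB segments in pg_xlog on the master, so 5000 is a substantial commitment:

```shell
# each WAL segment is 16 MB by default
echo "$(( 5000 * 16 )) MB retained"   # 80000 MB, i.e. ~80 GB kept for lagging standbys
```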







Re: Master-slave failover question

From
Sameer Kumar
Date:

What if the other node is not available for a few days? Especially if it is the slave node?
I would keep wal_keep_segments at a fair value and rebuild the lost node if I ever lose one.

Let's see what others have to suggest.

Regards
Sameer


Re: Master-slave failover question

From
Sergey Konoplev
Date:
On Fri, Jan 3, 2014 at 3:18 AM, Athanasios | ZenGuard
<athanasios@zenguard.org> wrote:
> Dear list,
> I hope that everyone has a nice holiday. Currently, I am researching
> master-slave automated promotion. I have one question. Let's assume that the
> clients connect to the master 192.168.1.1 for read/write traffic and I have
> a slave at 192.168.1.2. Assume that the master goes down and that slave gets
> promoted to a slave (this I have found out how to do automatically). How
> will the clients know to connect to the new master? Any ideas on this one?

One of the ideas, and BTW my favorite method, is to use PgBouncer to
redirect queries from the master to the slave with PAUSE. The full description
of the process is at the link below.

https://github.com/grayhemp/pgcookbook/blob/master/switching_to_another_server_with_pgbouncer.md
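The flow in that write-up boils down to pausing PgBouncer, repointing its database entry, and resuming. A rough sketch (the port, database name, and ini path are assumptions; `RUN=echo` keeps it a dry run):

```shell
#!/usr/bin/env bash
# Hypothetical PgBouncer-based switchover; set RUN= to execute for real.
RUN=${RUN-echo}

switchover() {
  local db=$1 old_host=$2 new_host=$3 ini=$4
  # hold client queries (clients block instead of erroring out)
  $RUN psql -p 6432 -U pgbouncer pgbouncer -c "PAUSE $db;"
  # repoint the [databases] entry at the promoted node
  $RUN sed -i "s/host=$old_host/host=$new_host/" "$ini"
  $RUN psql -p 6432 -U pgbouncer pgbouncer -c "RELOAD;"
  # release the held queries against the new master
  $RUN psql -p 6432 -U pgbouncer pgbouncer -c "RESUME $db;"
}

switchover mydb 192.168.1.1 192.168.1.2 /etc/pgbouncer/pgbouncer.ini
```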

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray.ru@gmail.com


Re: Master-slave failover question

From
Sameer Kumar
Date:

On Tue, Jan 7, 2014 at 4:53 AM, Sergey Konoplev <gray.ru@gmail.com> wrote:
> One of the ideas, and BTW my favorite method, is to use PgBouncer to
> redirect queries from the master to the slave with PAUSE. The full description
> of the process is at the link below.
>
> https://github.com/grayhemp/pgcookbook/blob/master/switching_to_another_server_with_pgbouncer.md


I checked it out. It looks like the process involves manual identification of the failure(?). But this is a nice method for manual switchovers.

As a suggestion, wouldn't it be a good idea to run pgbouncer on the application server [so that even if the master server has to be shut down, IP address changes or starting pgbouncer on the slave are not needed]?
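Running PgBouncer on each application server as suggested would mean a local pgbouncer.ini along these lines (a sketch; the names, paths, and addresses are assumptions):

```ini
[databases]
; the app connects to 127.0.0.1:6432; only this host= needs editing on failover
mydb = host=192.168.1.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
```

On failover every app server's copy has to be repointed and reloaded, which is the coordination cost of this layout.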

 

--
Sent via pgsql-novice mailing list (pgsql-novice@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-novice

Re: Master-slave failover question

From
Sergey Konoplev
Date:
On Tue, Jan 7, 2014 at 8:16 AM, Sameer Kumar <sameer.kumar@ashnik.com> wrote:
>> > promoted to a slave (this I have found out how to do automatically). How
>> > will the clients know to connect to the new master? Any ideas on this
>> > one?
>>
>> One of the ideas, and BTW my favorite method, is to use PgBouncer to
>> redirect queries from master to slave with pause. The full description
>> of the process is by the link below.
>>
>> https://github.com/grayhemp/pgcookbook/blob/master/switching_to_another_server_with_pgbouncer.md
>
> I checked it out. Looks like the process involves manual identification of
> failure(?). But this is a nice method for manual switchovers.

Correct, this is about manual detection of failure. There have been a lot
of discussions around automatic failover, but I have never seen anyone
define really working (for me) criteria to detect the failure.

> As a suggestion, won't it be a good idea to run the pgbouncer on Application
> Server [so that even if the Master Server has to be shutdown, IP address
> changes or starting pgbouncer on slave is not needed]?

It depends. In some cases it is worth doing if you have a single
application server (and are not planning to have more), or if the overhead
of a high number of app->db network connections per second is
significantly high in comparison to local ones. However, modern
architectures usually use multiple application servers and persistent
connections, and in that case installing pgbouncer on the database servers
is preferable. I also do not recommend installing it on a separate
machine, as that brings in another point of failure in any case. And again,
it depends on your situation.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray.ru@gmail.com


Re: Master-slave failover question

From
Sameer Kumar
Date:

On Wed, Jan 8, 2014 at 1:55 AM, Sergey Konoplev <gray.ru@gmail.com> wrote:
> > I checked it out. It looks like the process involves manual identification of
> > the failure(?). But this is a nice method for manual switchovers.
>
> Correct, this is about manual detection of failure. There have been a lot
> of discussions around automatic failover, but I have never seen anyone
> define really working (for me) criteria to detect the failure.


I guess there are a few aspects involved: auto-detection of failure, auto-promotion of the slave to master, and automatic switching of the application to point to the new master.

- Auto-detection of failure: This can be tricky and may lead to a split-brain scenario, where both nodes think they have lost their fellow node and the slave promotes itself to master while the old master keeps running. If you have proper log monitoring in place, this can be detected and you can bring things back into shape. I agree with your point: "I have never seen anyone define really working (for me) criteria to detect the failure."

- Auto-promotion of the slave to master: On failure detection, you can use a shell script/ssh to create a trigger file, which will promote the slave.

- Auto-switching of the application to the new master: You can use a virtual IP, or use a middle tier, e.g. pgbouncer, on a different node.

- So far I have used pgpool in HA mode to cater to all these points. I have run into a split-brain scenario, but once detected, you just need to rebuild the slave and attach it back. Despite the split brain, my application continued to use the original master (since the virtual IP had not shifted).

- Even if the virtual IP shifts, my application would connect to the slave (now promoted to a standalone master), which I can live with.

- You will run into an issue if you fail to notice the split-brain scenario: later, when you actually lose your master, you will face a problem (you won't have a slave to fail over to).

- If the scenario is a DR site, I would not recommend auto-failover. The setup I am talking about is an HA setup, and I am using synchronous streaming replication.

- In the same setup I have a DR site, which uses file-based log shipping (archive restore) and has bandwidth constraints. That replication is not synchronous.

- In DR there could be data/transaction loss while switching to the 2nd node, and hence IMHO the decision and process should be manual.

 
> As a suggestion, won't it be a good idea to run the pgbouncer on Application
> Server [so that even if the Master Server has to be shutdown, IP address
> changes or starting pgbouncer on slave is not needed]?

> It depends. In some cases it is worth doing if you have a single
> application server (and are not planning to have more), or if the overhead
> of a high number of app->db network connections per second is
> significantly high in comparison to local ones.

I can't recollect where, but I have read the recommended setup for pgbouncer is to install it away from the db server (to avoid process contention, and a few other points were mentioned which I cannot exactly recollect). I have seen/done setups which have pgbouncer running on the db server, and they seem to be holding up fine.
Recently I did a benchmark running pgbouncer on the client (a Windows machine running HammerDB as the benchmark tool) and then tried running it on the db server. There was not much of a difference.

> However, modern
> architectures usually use multiple application servers and persistent
> connections, and in that case installing pgbouncer on the database servers
> is preferable.

I guess one can always set up multiple pgbouncers (one on each application node)(?).

> I also do not recommend installing it on a separate
> machine, as that brings in another point of failure in any case.

Agree.

> And again,
> it depends on your situation.

That's fair.




Best Regards,
Sameer Kumar | Database Consultant
ASHNIK PTE. LTD.
101 Cecil Street, #11-11 Tong Eng Building, Singapore 069533
M : +65 8110 0350 T: +65 6438 3504 | www.ashnik.com
www.facebook.com/ashnikbiz | www.twitter.com/ashnikbiz


Re: Master-slave failover question

From
Jayadevan
Date:
Sameer Kumar wrote
> - SO far I have used pgpool in HA mode to cater to all these points. I
> have
> run into split brain scenario but once detected, you just need to rebuild
> slave and attach it back. Despite split brain my application continues to
> use original master (since Virtual IP has not shifted).

Just curious - once pgpool switches to the slave, it will mark it as primary
(I did not use virtual IPs). So a split-brain scenario will not happen, right?




--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Master-slave-failover-question-tp5785199p5785808.html
Sent from the PostgreSQL - novice mailing list archive at Nabble.com.


Re: Master-slave failover question

From
Sameer Kumar
Date:


On Sat, Jan 4, 2014 at 12:03 AM, Athanasios | ZenGuard <athanasios@zenguard.org> wrote:
> Hi Sameer,
> thank you for the quick reply (again!). I am not using pgpoolAdmin, so is it possible to share the shell script that does the rebuilding?

For failover detection and failover, the script can be as simple as:

        check the failing node
               if failing_node = primary_node
                    create trigger file on secondary;
               end if;

               if failing_node = secondary_node
                     set synchronous replication off;
               end if;



The logic for rebuilding was simple (if I remember it correctly):

              node_to_rebuild=$1;    // pass this as a parameter
              active_node=node_1;

              if node_to_rebuild = node_1
                   then active_node=node_2;
              end if;

              copy data directory from active_node to node_to_rebuild;
              start postgresql on node_to_rebuild;
              use pcp_attach_node to attach the newly rebuilt node;

You need to use ssh (remote command execution) for some of these commands, and make sure that ssh can log in without a password prompt on both nodes. If you are going to use pgpool in HA mode, then keep a copy of this rebuilding script on each node so that you can rebuild any node from any node.
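A bash sketch of that rebuild logic (node addresses, paths, the pcp port, and credentials are assumptions; `pcp_attach_node` is pgpool's tool for re-attaching a node; `RUN=echo` keeps it a dry run):

```shell
#!/usr/bin/env bash
# Hypothetical pgpool node rebuild; set RUN= to execute for real.
RUN=${RUN-echo}

NODE_1=192.168.1.1
NODE_2=192.168.1.2
PGDATA=/var/lib/postgresql/9.3/main

rebuild() {
  local node_to_rebuild=$1
  local active_node=$NODE_1 node_id=1
  if [ "$node_to_rebuild" = "$NODE_1" ]; then
    active_node=$NODE_2
    node_id=0
  fi
  # copy the data directory from the active node (passwordless ssh assumed)
  $RUN ssh "$node_to_rebuild" \
    "pg_basebackup -h $active_node -U replication -D $PGDATA -X stream -R"
  $RUN ssh "$node_to_rebuild" "pg_ctl -D $PGDATA start"
  # tell pgpool the node is usable again (timeout host port user password node-id)
  $RUN pcp_attach_node 10 localhost 9898 pgpool secret "$node_id"
}

rebuild "$NODE_2"
```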

With pgpoolAdmin the process must be much simpler, but I could never get it installed. :-(


Re: Master-slave failover question

From
Sameer Kumar
Date:

> > - So far I have used pgpool in HA mode to cater to all these points. I have
> > run into a split-brain scenario but once detected, you just need to rebuild
> > the slave and attach it back. Despite the split brain my application continues
> > to use the original master (since the virtual IP has not shifted).
>
> Just curious - once pgpool switches to the slave, it will mark it as primary
> (I did not use virtual IPs). So a split-brain scenario will not happen, right?




Are you using pgpool in HA mode, or are you using pgpool only on the primary/secondary server? Or pgpool on a third server?

If you have only one pgpool talking to both master and slave and it performs a failover, that should not suffer from split brain. But yes, it may at some point think (e.g. in case of a network failure) that both nodes are lost [especially if it is running on a third server].

I cannot think of all the scenarios and hence cannot rule out a split-brain situation (but I think it is not possible as long as you have only one pgpool).

Re: Master-slave failover question

From
Sameer Kumar
Date:
CCing the community mailing list.
On Wed, Jan 8, 2014 at 3:02 PM, Jayadevan M <maymala.jayadevan@gmail.com> wrote:
> I am using pgpool with streaming replication option for PostgreSQL, with a failover command (shell script). 2 pgpool instances (with watchdog) on the database nodes. You were referring to split brain for pgpool instances or PostgreSQL instances?



I was talking about split brain for pgpool, which cascades to the respective PostgreSQL instances: each pgpool would think that the other node is lost, and hence assume it is the only available pgpool+DB node, so the pgpool on the slave will switch over to its own DB while the pgpool on the master considers the slave lost. This can happen if there is a network fluctuation between slave and master (it can be avoided with a redundant network path/line between the two servers).

I do not understand: if you are using HA mode, how come you are not using a virtual IP (you mentioned earlier)?

> Just curious - once pgpool switches to the slave, it will mark it as primary
> (I did not use virtual IPs). So a split-brain scenario will not happen, right?

Am I missing something here? Or some mode of pgpool which I am not aware of?

Re: Master-slave failover question

From
Jayadevan
Date:
Sameer Kumar wrote
>  Am I missing something here? or some mode of pgpool which I am not aware
> of?

I thought you were mentioning split-brain for PostgreSQL. For pgpool, I am
using virtual IP. Somehow I got the message that the split-brain/virtual IPs
in your mail were with reference to PostgreSQL servers.






Re: Master-slave failover question

From
Sameer Kumar
Date:

On Wed, Jan 8, 2014 at 3:58 PM, Jayadevan <maymala.jayadevan@gmail.com> wrote:
> Sameer Kumar wrote
> > Am I missing something here? Or some mode of pgpool which I am not aware
> > of?
>
> I thought you were mentioning split-brain for PostgreSQL. For pgpool, I am
> using virtual IP. Somehow I got the message that the split-brain/virtual IPs
> in your mail were with reference to PostgreSQL servers.

It is. Once pgpool faces a split-brain scenario, it will pass the same knowledge on to the respective PostgreSQL nodes. Try it out: pull the network cable or bring down your router [don't do it in production :-)], then check the logs on both databases and check your pgpool logs. You will see what I am talking about.

 






Re: Master-slave failover question

From
Sameer Kumar
Date:
I am not sure if this is still relevant for the person who posted the question, but I guess it can benefit others who browse the archives looking for a similar solution. Here is a link to a nice tutorial with pgpool:



Though it looks like the pgpool in this tutorial is running on a different server (and hence pgpool becomes a single point of failure). 

Re: Master-slave failover question

From
"gurunadh.venkata@gmail.com"
Date:
Hi all,

I have set up load balancing and failover for high availability in pgpool. I have used master-slave replication (stream mode).

 DB version: PPAS 9.4 AS (EnterpriseDB)

 pgpool version: 3.3.4

 The following is my failover command in pgpool.conf

 failover_command = '/usr/local/etc/failover.sh %d %P %H %R'

 Following is the failover script.

 #!/bin/bash -x
 FALLING_NODE=$1         # %d
 OLDPRIMARY_NODE=$2      # %P
 NEW_PRIMARY=$3          # %H
 PGDATA=$4               # %R

 if [ $FALLING_NODE = $OLDPRIMARY_NODE ]; then
     if [ $UID -eq 0 ]
     then
         su enterprisedb -c "ssh -T enterprisedb@$NEW_PRIMARY touch
 $PGDATA/trigger"
     else
         ssh -T enterprisedb@$NEW_PRIMARY touch $PGDATA/trigger
     fi
     exit 0;
 fi;
 exit 0;

 To check the failover scenario, I stopped (kill -9) the master db server
 process. According to the script, the present slave should take over the
 role of master. But that was not happening in my case.

From the pgpool end:
 edb=# show pool_nodes;
  node_id |  hostname  | port | status | lb_weight |  role
 ---------+------------+------+--------+-----------+---------
  0       | 10.0.0.149 | 5444 | 3      | 0.500000  | standby
  1       | 10.0.0.158 | 5444 | 2      | 0.500000  | standby
 (2 rows)


 Please help me in fixing this issue. Could you provide a failover script which
 works correctly?

 Thanks for your time.


