Thread: Postgres HA

Postgres HA

From
Azimuddin Mohammed
Date:
Hello, 
I am a little confused about how HA works in Postgres. I am reading an article which states the following: "If the primary server fails and the standby server becomes the new primary, and then the old primary restarts, you must have a mechanism for informing the old primary that it is no longer the primary. This is sometimes known as STONITH (Shoot The Other Node In The Head), which is necessary to avoid situations where both systems think they are the primary, which will lead to confusion and ultimately data loss.

Many failover systems use just two systems, the primary and the standby, connected by some kind of heartbeat mechanism to continually verify the connectivity between the two and the viability of the primary. It is also possible to use a third system (called a witness server) to prevent some cases of inappropriate failover, but the additional complexity might not be worthwhile unless it is set up with sufficient care and rigorous testing.

PostgreSQL does not provide the system software required to identify a failure on the primary and notify the standby database server. Many such tools exist and are well integrated with the operating system facilities required for successful failover, such as IP address migration."

Can someone explain how HA failback takes place, and which open source tools we can use to make sure that the old primary, after failing over to the slave, marks itself as a slave?

Appreciate your response in advance. 

--

Regards,
Azim



Re: Postgres HA

From
Scott Marlowe
Date:
On Fri, Jan 5, 2018 at 12:07 PM, Azimuddin Mohammed <azimeiu@gmail.com> wrote:
>
> Hello,
> I am little confused with how HA works in postgres. Reading the article which state as below "If the primary server
> fails and the standby server becomes the new primary, and then the old primary restarts, you must have a mechanism for
> informing the old primary that it is no longer the primary. This is sometimes known as STONITH (Shoot The Other Node In
> The Head), which is necessary to avoid situations where both systems think they are the primary, which will lead to
> confusion and ultimately data loss.
>
> Many failover systems use just two systems, the primary and the standby, connected by some kind of heartbeat
> mechanism to continually verify the connectivity between the two and the viability of the primary. It is also possible
> to use a third system (called a witness server) to prevent some cases of inappropriate failover, but the additional
> complexity might not be worthwhile unless it is set up with sufficient care and rigorous testing.
>
> PostgreSQL does not provide the system software required to identify a failure on the primary and notify the standby
> database server. Many such tools exist and are well integrated with the operating system facilities required for
> successful failover, such as IP address migration."
>
> Can someone explain how the HA failback will take place and what open source tools we can use to make sure once the
> primary server which failed over to slave will mark itself as slave.
>

There are LOTS of ways to implement HA.

Here's a book on the subject that's 537 pages long, and is only $4.99 right now:
https://www.packtpub.com/big-data-and-business-intelligence/postgresql-high-availability-cookbook-second-edition
I've been reading it a bit, seems to be a good resource.


--
To understand recursion, one must first understand recursion.


Re: Postgres HA

From
Rui DeSousa
Date:
There are many different solutions, but I would recommend (and I use) at least a three-node cluster with synchronous
replication, where one of the nodes acts as the witness. That is the minimum; I actually have more replicas. The
witness node need not be a full Postgres instance; pg_receivexlog can serve as the witness instead. This setup avoids
the split-brain situation by making sure there is only one writable instance, namely the one that the witness is
following.
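
To illustrate the witness idea in general terms, here is a minimal Python sketch of a promotion decision. It is not how any particular tool implements it. The function name and the two boolean inputs are hypothetical; they stand in for whatever health checks the standby and the witness actually perform.

```python
def should_promote(standby_sees_primary: bool, witness_sees_primary: bool) -> bool:
    """Decide whether the standby may promote itself.

    Hypothetical helper: both flags are placeholders for real health checks.
    """
    # If either observer can still reach the primary, promoting the standby
    # could leave two writable instances (split brain), so refuse.
    return not standby_sees_primary and not witness_sees_primary


# The standby lost its link to the primary, but the witness still sees it:
# most likely a network problem on the standby side, so no promotion.
print(should_promote(standby_sees_primary=False, witness_sees_primary=True))
```

The point of the third node is only to break the tie: a standby that is merely cut off from the primary cannot promote itself on its own say-so.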

I use CARP (Common Address Redundancy Protocol) to manage the HA IP address (very platform specific). When the address
flips over to a replica, the replica is promoted automatically, and the witness follows the HA address, thus signing
off on the self-promotion.

The old primary then needs to be reconfigured as a replica, which can be done using pg_rewind or one of numerous other
approaches, e.g. snapshots, backups, etc.

I have not personally used it, but you could look at 2ndQuadrant's repmgr product if you’re looking for a packaged
solution.

https://www.2ndquadrant.com/en/resources/repmgr/installation-instructions/




Re: Postgres HA

From
"Jehan-Guillaume (ioguix) de Rorthais"
Date:
On Fri, 5 Jan 2018 13:07:10 -0600
Azimuddin Mohammed <azimeiu@gmail.com> wrote:

> Hello,
> I am little confused with how HA works in postgres. Reading the article
> which state as below "*If the primary server fails and the standby server
> becomes the new primary, and then the old primary restarts, you must have a
> mechanism for informing the old primary that it is no longer the primary.
> This is sometimes known as STONITH (Shoot The Other Node In The Head),
> which is necessary to avoid situations where both systems think they are
> the primary, which will lead to confusion and ultimately data loss.*
> 
> *Many failover systems use just two systems, the primary and the standby,
> connected by some kind of heartbeat mechanism to continually verify the
> connectivity between the two and the viability of the primary. It is also
> possible to use a third system (called a witness server) to prevent some
> cases of inappropriate failover, but the additional complexity might not be
> worthwhile unless it is set up with sufficient care and rigorous testing.*
> *PostgreSQL does not provide the system software required to identify a
> failure on the primary and notify the standby database server. Many such
> tools exist and are well integrated with the operating system facilities
> required for successful failover, such as IP address migration."*
> 
> Can someone explain how the HA failback will take place

Failback requires either rebuilding the old master as a standby (rsync,
pg_basebackup, PITR restore, ...) or using pg_rewind to rewind the old master
to a point where it can catch up with the new master.
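
As a rough sketch of the two failback routes just mentioned, the Python snippet below only builds the command lines; it does not run anything. The data directory, host name and user are hypothetical placeholders. Note that pg_rewind needs either wal_log_hints enabled or data checksums on the old master, and the old master must be stopped before it is rewound.

```python
import shlex
from typing import List


def pg_rewind_command(target_pgdata: str, source_conninfo: str) -> List[str]:
    """Build a pg_rewind dry run against the stopped old master's data directory."""
    return [
        "pg_rewind",
        f"--target-pgdata={target_pgdata}",
        f"--source-server={source_conninfo}",
        "--dry-run",  # do everything except modify the target directory
    ]


def pg_basebackup_command(new_pgdata: str, primary_host: str, user: str) -> List[str]:
    """Build a pg_basebackup call that rebuilds the node from scratch as a standby."""
    return [
        "pg_basebackup",
        f"--pgdata={new_pgdata}",
        f"--host={primary_host}",
        f"--username={user}",
        "-X", "stream",           # stream WAL while the backup is taken
        "--write-recovery-conf",  # write the standby's connection settings
    ]


# Hypothetical values, for illustration only.
rewind = pg_rewind_command("/var/lib/postgresql/data", "host=new-master dbname=postgres")
rebuild = pg_basebackup_command("/var/lib/postgresql/data", "new-master", "replicator")
print(shlex.join(rewind))
print(shlex.join(rebuild))
```

Rewinding is usually much faster than a full rebuild on a large cluster, but the full rebuild has no prerequisites on the old master's configuration.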

Some tools try to automate failback using pg_rewind (Patroni, repmgr), but I
have no experience with them.

> and what open source tools we can use to make sure once the primary server
> which failed over to slave will mark itself as slave.

There are a lot of open source tools for building HA around PostgreSQL: repmgr,
Patroni (based on etcd or ZooKeeper), PAF (based on Pacemaker), etc. You will
have to spend a lot of time understanding them, testing them extensively,
picking one, and documenting your cluster.
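
Whatever tool is picked, one thing worth testing is that two nodes never act as primary at once. Here is a small Python sketch of such a check. It assumes the result of SELECT pg_is_in_recovery() has already been collected from each node by some external means; the node names and the helper functions are hypothetical.

```python
from typing import Dict, List


def find_primaries(in_recovery_by_node: Dict[str, bool]) -> List[str]:
    """Return the nodes whose pg_is_in_recovery() result is false (acting as primary)."""
    return sorted(
        node for node, in_recovery in in_recovery_by_node.items() if not in_recovery
    )


def is_split_brain(in_recovery_by_node: Dict[str, bool]) -> bool:
    """True when more than one node reports that it is not in recovery."""
    return len(find_primaries(in_recovery_by_node)) > 1


# Hypothetical snapshot: the old primary came back up still writable.
status = {"node_a": False, "node_b": False}
print(find_primaries(status), is_split_brain(status))
```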

Regards,


Re: Postgres HA

From
John Scalia
Date:
What he said, and you also may want to look at pgpool-II. I’ve had fairly good luck with that, and Tatsuo (the author)
hangs out here occasionally too.
—
Jay

> On Jan 5, 2018, at 4:00 PM, Jehan-Guillaume (ioguix) de Rorthais <ioguix@free.fr> wrote:


Re: Postgres HA

From
Tatsuo Ishii
Date:
Hi,

Yes, definitely I am hanging out here.

If you have more specific questions about Pgpool-II, you are encouraged
to subscribe to the Pgpool-II mailing list.
https://www.pgpool.net/mailman/listinfo/pgpool-general

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: Postgres HA

From
Azimuddin Mohammed
Date:
Thank you. I have subscribed to the list.


On Jan 5, 2018 8:09 PM, "Tatsuo Ishii" <ishii@sraoss.co.jp> wrote:
