Re: [GENERAL] Postgres HA - Mailing list pgsql-general

From dinesh kumar
Subject Re: [GENERAL] Postgres HA
Date
Msg-id CALnrH7rx0Hriz98UyaZTeyGh64jPbEYK20Mxm2txnhCX+rHCUw@mail.gmail.com
Whole thread Raw
In response to [GENERAL] Postgres HA  (Dylan Luong <Dylan.Luong@unisa.edu.au>)
List pgsql-general
Hi Dylan,

On Thu, Feb 23, 2017 at 4:28 AM, Dylan Luong <Dylan.Luong@unisa.edu.au> wrote:

Hi

 

I am a DBA at the University of South Australia. For PostgreSQL High Availability, we currently have setup a Master/Slave across two datacenters using PostgreSQL (WAL) streaming replication. We use an LTM (load balancer) server that sits between the application servers and the PostgreSQL server that directs connections to the Master (and the Slave if failover occurs). We also have watchdog processes on the PostgreSQL servers that polls the LTM to determine who is Master and perform automatic failover if required. I am looking at options to improve our high availability.

I would like to know how other organizations in different industries (other than education) setup High Availability on their PostgreSQL environments.


Below is the approach we have followed, to achieve the maximum HA with async streaming replication.

1. Create an instance "I" with "N" number of nodes.

2. Set up the replication among the "N" nodes as "N-1" nodes points to 1 master.

3. Configured 2 physical replication slots to each "N-1" nodes.

4. One replication slot is to receive the archives, and one is for doing the replication. (Suspected recovery process is slower than the receiver process)

5. Configured a parallel WAL uploader (Customized program) on master to wal-backup server. (We needed this for the PITR)

6. Implemented a quorum on "N-1" slaves as one should become as master in worst cases. (Guaranteed data availability as per RTO settings)

7. Watchdog process which update the pgbouncer configuration from master to the latest master.

8. Used consul service discovery for identifying the master, slave heart beats.

9. Once Failover is completed, "N-1"(including old master) follows the new master by doing a fresh refresh. (Planning to use pg_rewind)

10. Covered the split brain problems by removing the service discovery keys from consul. (It's delivering the promising results, but need to spend more time on this).

The above mentioned approach what we have done is similar to your's, but we needed to handle with multiple slaves rather single one. In case, if you are looking for any open source tools to implement in your production servers, then prefer to add repmgr, pgHA, PAF tools into your list. These open source tools are great and deliver the results as demonstrated.
 

What  tools do you use. Are they commercial licensed products? How is the architecture setup and how do you do recovery of new slave.

Your information is greatly appreciated.

 

Regards

Dylan

 

Dylan Luong

Information Strategy & Technology Services

University of South Australia

A Building, Room E2-07, Mawson Lakes Campus

MAWSON LAKES

South Australia  5095

 

Email:        dylan.luong@unisa.edu.au

Phone:    +61 8 83023629

Fax:         +61 8 83023577

WWW:    http://www.unisa.edu.au

 




--

pgsql-general by date:

Previous
From: Jehan-Guillaume de Rorthais
Date:
Subject: Re: [GENERAL] Postgres HA
Next
From: Francisco Olarte
Date:
Subject: Re: [GENERAL] Move rows from one database to other