Re: Automatic failback - Mailing list pgsql-admin

From Asad Ali
Subject Re: Automatic failback
Date
Msg-id CAJ9xe=sr1j59XpXpwHpU4Ez12L=WHOwG=Vj1YH9WyBPEy8qWzQ@mail.gmail.com
In response to Automatic failback  (Wasim Devale <wasimd60@gmail.com>)
List pgsql-admin
Hi Wasim,

To achieve automatic failback with minimal or zero downtime during disaster recovery (DR) using Barman and PostgreSQL in your Azure setup, here’s a high-level architecture and strategy you can follow:

1. Set up Barman in the Azure West region to back up the PostgreSQL database in the Azure East region, and use streaming replication to keep the DR database up to date with the primary database.
  • Primary Database: Configure continuous WAL archiving to Barman (archive_mode = on, archive_command = 'barman-wal-archive <barman_host> <server_name> %p') and allow the standby in the West region to connect for streaming replication.
  • Standby Database: Configure this as a hot standby (read-only) that receives WAL data via streaming replication and is ready to be promoted in case of failover.
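As a rough sketch of the settings involved in step 1, the Python snippet below renders postgresql.conf lines for both servers. The host names, the Barman server name, the replication user, and the max_wal_senders value are placeholders I made up for illustration, and primary_conninfo as a postgresql.conf setting assumes PostgreSQL 12 or later.

```python
def render_conf(settings: dict) -> str:
    """Render settings as postgresql.conf lines, single-quoting every value."""
    lines = []
    for name, value in settings.items():
        # postgresql.conf escapes an embedded single quote by doubling it
        escaped = str(value).replace("'", "''")
        lines.append(f"{name} = '{escaped}'")
    return "\n".join(lines)


def primary_settings(barman_host: str, barman_server: str) -> dict:
    """Settings for the East primary: WAL archiving to Barman plus streaming."""
    return {
        "wal_level": "replica",
        "archive_mode": "on",
        # barman-wal-archive takes the Barman host, the server name, and %p
        "archive_command": f"barman-wal-archive {barman_host} {barman_server} %p",
        "max_wal_senders": "10",  # illustrative value, tune for your setup
    }


def standby_settings(primary_host: str, replication_user: str) -> dict:
    """Settings for the West hot standby (PostgreSQL 12+ assumed)."""
    return {
        "hot_standby": "on",
        "primary_conninfo": f"host={primary_host} port=5432 user={replication_user}",
    }


if __name__ == "__main__":
    # Hypothetical names, not taken from your environment.
    print(render_conf(primary_settings("barman-west", "pg-east")))
    print(render_conf(standby_settings("pg-east.example.internal", "replicator")))
```

On PostgreSQL 12 and later the standby also needs an empty standby.signal file in its data directory, otherwise it starts as a normal primary.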
2. Implement an automatic failover mechanism using a tool like Patroni or pg_auto_failover. These tools monitor the primary database and, in case of failure, automatically promote the standby database to the primary role.
  • Patroni: A cluster manager for PostgreSQL with high availability, automatically promoting a standby to primary when a failure is detected.
  • pg_auto_failover: Another option that provides automatic failover between primary and standby PostgreSQL databases, making sure the standby can seamlessly take over.
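If you go with Patroni, one way to see which node currently holds the primary role is to ask each node's Patroni REST API. The sketch below assumes a recent Patroni version where GET /primary returns HTTP 200 only on the leader, and the default REST port 8008. The node names and URLs are hypothetical.

```python
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 2.0):
    """Return the HTTP status code of a GET on url, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Patroni answers with a non-200 code (e.g. 503) on non-primary nodes
        return err.code
    except (urllib.error.URLError, OSError):
        return None


def find_primary(nodes: dict, probe=http_status):
    """Return the name of the first node whose /primary endpoint answers 200."""
    for name, base_url in nodes.items():
        if probe(f"{base_url}/primary") == 200:
            return name
    return None


# Example call with hypothetical node URLs:
# find_primary({"east": "http://pg-east:8008", "west": "http://pg-west:8008"})
```

The probe argument is there only so the lookup can be exercised without a live cluster. A load balancer health check against the same endpoint is the more common way to route clients to the current primary.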
3. After recovery, once the primary database in the east region becomes available again, you need to set up automatic failback. Here’s how you can handle failback:
  • Step 1: Re-establish Streaming Replication: After the DR database in the west region has been promoted, bring the original primary (east) back as a standby, so that streaming replication now runs from the promoted DR database (west) to the original primary (east).

    • Reconfigure the old primary to become a replica of the new primary (which is the DR site in the west).
    • Barman can assist with this by restoring the latest backup in the original region, after which the restored server catches up through WAL streaming.
  • Step 2: Reverse the Failover (Failback): Once the original region is stable, you can reverse the failover with a controlled switchover, which limits the interruption to a brief pause in writes:

    • Stop write operations on the current primary (west).
    • Perform a controlled failover back to the original primary in the east, making it the new primary.
    • Reconfigure the DR site in the west region to again become a standby replica.

    This can be automated using Patroni or pg_auto_failover, ensuring seamless transitions between primary and standby without user intervention.
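Before switching back, it helps to confirm that the east standby has replayed everything the west primary has written. The sketch below shows one way to make that check in Python by comparing two LSNs. The function names and the max_lag_bytes threshold are my own, and I am assuming you read the values from pg_current_wal_lsn() on the west primary and from replay_lsn in pg_stat_replication.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '0/3000060' to an absolute byte position."""
    high, low = lsn.split("/")
    # An LSN is two hexadecimal numbers: the high and low 32 bits
    return (int(high, 16) << 32) + int(low, 16)


def replication_lag_bytes(primary_lsn: str, replica_replay_lsn: str) -> int:
    """Bytes of WAL the replica still has to replay."""
    return lsn_to_int(primary_lsn) - lsn_to_int(replica_replay_lsn)


def ready_for_failback(primary_lsn: str, replica_replay_lsn: str,
                       max_lag_bytes: int = 0) -> bool:
    """True when the replica is within max_lag_bytes of the primary."""
    return replication_lag_bytes(primary_lsn, replica_replay_lsn) <= max_lag_bytes
```

With writes stopped on west, ready_for_failback("0/3000060", "0/3000060") returns True, while a replica still at "0/3000000" is 96 bytes behind and should be given more time before the switchover.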

4. To further minimize downtime during failback, you can use logical replication:

  • After failover, set up logical replication from the new primary (west) to the original primary (east). A logical replication subscriber has to be a writable instance rather than a physical standby, so keep application writes away from the east server while it catches up.
  • Once logical replication has caught up, you can switch writes over to the original primary (east) with virtually no downtime, ensuring seamless failback.
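For the logical replication variant, the statements involved look roughly like what the sketch below generates. The publication name, subscription name, and connection string are hypothetical. It also assumes wal_level = logical on the west server, that the east server is a writable instance with the schema already in place, and that the names are simple identifiers that need no quoting.

```python
def quote_literal(value: str) -> str:
    """Quote a string as an SQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def publisher_sql(publication: str) -> str:
    """Statement to run on the new primary (west)."""
    return f"CREATE PUBLICATION {publication} FOR ALL TABLES;"


def subscriber_sql(subscription: str, publication: str, conninfo: str) -> str:
    """Statement to run on the original primary (east)."""
    return (
        f"CREATE SUBSCRIPTION {subscription} "
        f"CONNECTION {quote_literal(conninfo)} "
        f"PUBLICATION {publication};"
    )


if __name__ == "__main__":
    # Hypothetical names and connection string.
    print(publisher_sql("failback_pub"))
    print(subscriber_sql("failback_sub", "failback_pub",
                         "host=pg-west dbname=app user=replicator"))
```

Keep in mind that logical replication does not carry DDL or sequence values, so those need to be handled separately before writes move back to east.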
Taken together, these steps keep your database available and keep downtime during failover and failback to a minimum.

Let me know if you have any other questions.

Best regards,
Asad Ali

On Wed, Sep 18, 2024 at 5:17 PM Wasim Devale <wasimd60@gmail.com> wrote:
Hi All

I have the Barman tool in place. Can anyone suggest how to achieve automatic failback with zero downtime?

My PG database is hosted on Red Hat Linux 9. All our Azure resources are in the East region. We are planning to set up disaster recovery (DR) in the West region.

Thanks,
Wasim
