Re: [EXTERNAL] Re: PostgreSQL-12 replication failover, pg_rewind fails - Mailing list pgsql-general

From Mariya Rampurawala
Subject Re: [EXTERNAL] Re: PostgreSQL-12 replication failover, pg_rewind fails
Msg-id 8BD51BB9-8695-4F10-8E9A-144D3F97059C@veritas.com
In response to Re: PostgreSQL-12 replication failover, pg_rewind fails  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: [EXTERNAL] Re: PostgreSQL-12 replication failover, pg_rewind fails
List pgsql-general
Hi,

Thank you for the response.

    > but if the target cluster ran for a long time after the divergence,
    > the old WAL files might no longer be present. In that case, they can
    > be manually copied from the WAL archive to the pg_wal directory, or
    > fetched on startup by configuring primary_conninfo or restore_command.

I hit this issue every time I follow the aforementioned steps, manually as well as with scripts.
How long is "long time after divergence"? Is there a way I can make some configuration changes so that I don’t hit this issue?
Is there anything I must change in my restore command?

===================================
primary_conninfo = 'user=replicator host=10.209.57.16 port=5432 sslmode=prefer sslcompression=0 gssencmode=prefer krbsrvname=postgres target_session_attrs=any'
 
restore_command = 'scp  root@10.209.56.88:/pg_backup/%f %p'
===================================

Regards,
Mariya
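
A note on the restore_command above: PostgreSQL replaces %f with the name of the WAL file it wants and %p with the destination path, and %% stands for a literal percent sign. The sketch below only builds and prints the command a manual copy would use; it does not run scp. The function name expand_restore_command is hypothetical, the segment name is the one from the pg_rewind error quoted below, and the absolute destination path is an assumption (the server itself passes %p relative to the data directory). The %r placeholder is not handled here.

```python
def expand_restore_command(template: str, wal_file: str, dest_path: str) -> str:
    """Expand %f, %p and %% in a restore_command template (hypothetical helper)."""
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "%" and i + 1 < len(template):
            nxt = template[i + 1]
            if nxt == "f":      # %f -> name of the WAL file to fetch
                out.append(wal_file)
                i += 2
                continue
            if nxt == "p":      # %p -> path to copy the file to
                out.append(dest_path)
                i += 2
                continue
            if nxt == "%":      # %% -> literal percent sign
                out.append("%")
                i += 2
                continue
        out.append(ch)          # anything else is copied unchanged
        i += 1
    return "".join(out)


# Values taken from the configuration and the error message in this thread.
template = "scp  root@10.209.56.88:/pg_backup/%f %p"
segment = "0000003500000006000000B9"
target = "/pg_mnt/pg-12/data/pg_wal/" + segment

print(expand_restore_command(template, segment, target))
```

Running the printed command by hand as the postgres user, before calling pg_rewind again, is one way to do the manual copy the documentation describes. It assumes the segment is still present in /pg_backup on the archive host.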

On 12/05/20, 2:15 PM, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote:

    Hello.
    
    At Tue, 12 May 2020 06:32:30 +0000, Mariya Rampurawala <Mariya.Rampurawala@veritas.com> wrote in 
    > I am working on providing HA for replication, using automation scripts.
    > My setup consists of two nodes, Master and Slave. When the master fails, the slave is promoted to master. But when I
    > try to re-register the old master as a slave, the pg_rewind command fails. Details below.
 
    ...
    >   1.  Rewind again:
    >   2.  -bash-4.2$ /usr/pgsql-12/bin/pg_rewind -D /pg_mnt/pg-12/data --source-server="host=10.209.57.17  port=5432 user=postgres dbname=postgres"
 
    > 
    > pg_rewind: servers diverged at WAL location 6/B9FFFFD8 on timeline 53
    > 
    > pg_rewind: error: could not open file "/pg_mnt/pg-12/data/pg_wal/0000003500000006000000B9": No such file or directory
    > 
    > pg_rewind: fatal: could not find previous WAL record at 6/B9FFFFD8
    > 
    > 
    > I have tried this multiple times but always face the same error. Can someone help me resolve this?
    
    As the error message says, the required WAL file has been removed from
    the old master.  This is normal behavior and is described in the
    documentation.
    
    https://www.postgresql.org/docs/12/app-pgrewind.html
    
    > but if the target cluster ran for a long time after the divergence,
    > the old WAL files might no longer be present. In that case, they can
    > be manually copied from the WAL archive to the pg_wal directory, or
    > fetched on startup by configuring primary_conninfo or restore_command.
    
    So it seems you need to restore the required WAL files from the archive
    or the current master.
    
    regards.
    
    -- 
    Kyotaro Horiguchi
    NTT Open Source Software Center
    

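To find out which file has to be restored, the divergence point reported by pg_rewind can be mapped to a WAL segment file name. The following is a minimal sketch, assuming the default 16 MB wal_segment_size; the function name wal_segment_name is hypothetical. A segment name is three 8-digit hexadecimal fields: the timeline ID, the high part of the segment number, and the low part.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default wal_segment_size (16 MB)


def wal_segment_name(timeline: int, lsn: str,
                     segment_size: int = WAL_SEGMENT_SIZE) -> str:
    """Return the name of the WAL segment file that contains the given LSN."""
    high, low = lsn.split("/")                    # an LSN is printed as HIGH/LOW in hex
    position = (int(high, 16) << 32) | int(low, 16)
    segno = position // segment_size              # absolute segment number
    per_xlogid = 0x100000000 // segment_size      # segments per 4 GB "log id"
    return "%08X%08X%08X" % (timeline, segno // per_xlogid, segno % per_xlogid)


# Divergence point from the pg_rewind output in this thread.
print(wal_segment_name(53, "6/B9FFFFD8"))  # prints 0000003500000006000000B9
```

The result matches the file named in the "could not open file" error, so that segment is the first one to look for in the archive. pg_rewind may need earlier segments as well, back to the last checkpoint before the divergence point.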
