Re: [EXTERNAL] Re: PostgreSQL-12 replication failover, pg_rewind fails - Mailing list pgsql-general

From	Mariya Rampurawala
Subject	Re: [EXTERNAL] Re: PostgreSQL-12 replication failover, pg_rewind fails
Date	May 12, 2020 09:40:18
Msg-id	8BD51BB9-8695-4F10-8E9A-144D3F97059C@veritas.com
In response to	Re: PostgreSQL-12 replication failover, pg_rewind fails (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses	Re: [EXTERNAL] Re: PostgreSQL-12 replication failover, pg_rewind fails
List	pgsql-general

Hi,

Thank you for the response.

    > but if the target cluster ran for a long time after the divergence,
    > the old WAL files might no longer be present. In that case, they can
    > be manually copied from the WAL archive to the pg_wal directory, or
    > fetched on startup by configuring primary_conninfo or restore_command.

I hit this issue every time I follow the aforementioned steps, manually as well as with scripts.
How long is "long time after divergence"? Is there a way I can make some configuration changes so that I don’t hit this issue?
Is there anything I must change in my restore command?

===================================
primary_conninfo = 'user=replicator host=10.209.57.16 port=5432 sslmode=prefer sslcompression=0 gssencmode=prefer krbsrvname=postgres target_session_attrs=any'
 
restore_command = 'scp  root@10.209.56.88:/pg_backup/%f %p'
===================================

Regards,
Mariya

On 12/05/20, 2:15 PM, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote:

    Hello.
    
    At Tue, 12 May 2020 06:32:30 +0000, Mariya Rampurawala <Mariya.Rampurawala@veritas.com> wrote in 
    > I am working on providing HA for replication, using automation scripts.
    > My setup consists of two nodes, Master and Slave. When the master fails, the slave is promoted to master. But when I try to re-register the old master as slave, the pg_rewind command fails. Details below.
 
    ...
    >   1.  Rewind again:
    >   2.  -bash-4.2$ /usr/pgsql-12/bin/pg_rewind -D /pg_mnt/pg-12/data --source-server="host=10.209.57.17  port=5432 user=postgres dbname=postgres"
 
    > 
    > pg_rewind: servers diverged at WAL location 6/B9FFFFD8 on timeline 53
    > 
    > pg_rewind: error: could not open file "/pg_mnt/pg-12/data/pg_wal/0000003500000006000000B9": No such file or directory
    > 
    > pg_rewind: fatal: could not find previous WAL record at 6/B9FFFFD8
    > 
    > 
    > I have tried this multiple times but always face the same error. Can someone help me resolve this?
    
    As the error message says, the required WAL file has been removed on
    the old master.  This is normal behavior and is described in the
    documentation.
    
    https://www.postgresql.org/docs/12/app-pgrewind.html
    
    > but if the target cluster ran for a long time after the divergence,
    > the old WAL files might no longer be present. In that case, they can
    > be manually copied from the WAL archive to the pg_wal directory, or
    > fetched on startup by configuring primary_conninfo or restore_command.
    
    So it seems you need to restore the required WAL files from the
    archive or the current master.
    
    regards.
    
    -- 
    Kyotaro Horiguchi
    NTT Open Source Software Center
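
The file name in the error quoted above can be derived from the timeline and divergence location that pg_rewind prints, which helps when looking for the segment in the WAL archive. The following is a minimal sketch, assuming the default 16 MB WAL segment size (check wal_segment_size on the cluster); the function name wal_file_name is illustrative and not part of PostgreSQL.

```python
# Assumption: default WAL segment size of 16 MB; clusters initialized with
# a different --wal-segsize need the matching value here.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def wal_file_name(timeline: int, lsn: str, segment_size: int = WAL_SEGMENT_SIZE) -> str:
    """Return the WAL segment file name that contains the given location.

    `lsn` uses the textual form printed by PostgreSQL tools, e.g. "6/B9FFFFD8".
    """
    high, low = (int(part, 16) for part in lsn.split("/"))
    position = (high << 32) | low                   # 64-bit byte position in the WAL
    segment_no = position // segment_size           # global segment number
    segments_per_id = 0x100000000 // segment_size   # segments per 4 GB "log id"
    return "%08X%08X%08X" % (
        timeline,
        segment_no // segments_per_id,
        segment_no % segments_per_id,
    )


# Values taken from the pg_rewind output quoted in this thread.
print(wal_file_name(53, "6/B9FFFFD8"))
```

For timeline 53 and location 6/B9FFFFD8 this yields 0000003500000006000000B9, the file named in the error message. Because pg_rewind reads the target's WAL backward from the divergence point to the last checkpoint before it, earlier segments may need to be restored as well.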
