RE: Missing WAL file after running pg_rewind - Mailing list pgsql-general

From Dylan Luong
Subject RE: Missing WAL file after running pg_rewind
Date
Msg-id a7ad1502b60f4e4fae8ae9e8575b8e83@ITUPW-EXMBOX3B.UniNet.unisa.edu.au
In response to Re: Missing WAL file after running pg_rewind  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: Missing WAL file after running pg_rewind  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-general
The file exists in the archive directory of the old master, but it is for the previous timeline (5, not 6), i.e.
0000000500000383000000BE.
Can I just rename the file to timeline 6, i.e. 0000000600000383000000BE?
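
For readers unfamiliar with the naming scheme: a WAL segment file name is 24 hexadecimal digits, where the first 8 are the timeline ID and the remaining 16 identify the position of the segment. The minimal Python sketch below (the helper `parse_wal_name` and the `WalSegment` type are hypothetical names, not part of PostgreSQL) only illustrates that the two names above differ in the timeline field alone. It says nothing about whether renaming the file is safe.

```python
from typing import NamedTuple


class WalSegment(NamedTuple):
    timeline: int  # first 8 hex digits: timeline ID
    log: int       # middle 8 hex digits: high part of the segment position
    seg: int       # last 8 hex digits: low part of the segment position


def parse_wal_name(name: str) -> WalSegment:
    """Split a 24-hex-digit WAL segment file name into its three fields."""
    if len(name) != 24:
        raise ValueError(f"not a WAL segment name: {name!r}")
    # int(..., 16) raises ValueError if a field is not valid hexadecimal.
    return WalSegment(int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16))


old = parse_wal_name("0000000500000383000000BE")
new = parse_wal_name("0000000600000383000000BE")
print(old)
print(new)
# True when both names refer to the same segment position on different timelines.
print(old[1:] == new[1:])
```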

-----Original Message-----
From: Michael Paquier [mailto:michael.paquier@gmail.com]
Sent: Friday, 12 January 2018 12:08 PM
To: Dylan Luong <Dylan.Luong@unisa.edu.au>
Cc: pgsql-general@lists.postgresql.org
Subject: Re: Missing WAL file after running pg_rewind

On Thu, Jan 11, 2018 at 04:58:02PM +0000, Dylan Luong wrote:
> The steps I took were:
>
> 1.       Stop all watchdogs
>
> 2.       Start/stop the old master
>
> 3.       Run 'checkpoint' on new master
>
> 4.       Run the pg_rewind on old master to resync with new master
>
> 5.       Start the old master (as new slave)

That's a sane flow to me.
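
As a rough sketch of steps 2 to 5 above, the Python snippet below only builds the command lines and prints them; it does not run anything. The data directory path, the connection string, the helper name `rewind_commands`, and the choice of a fast shutdown are all assumptions made for illustration. Step 1 (stopping the watchdogs) is site-specific and omitted, and so is the standby configuration the old master needs before it can start as a slave.

```python
import shlex
from typing import List


def rewind_commands(old_pgdata: str, new_master_conninfo: str) -> List[List[str]]:
    """Build (but do not run) the commands for steps 2-5 of the flow above."""
    return [
        # Step 2: start and cleanly stop the old master.
        ["pg_ctl", "-D", old_pgdata, "-w", "start"],
        ["pg_ctl", "-D", old_pgdata, "-w", "-m", "fast", "stop"],  # shutdown mode is an assumption
        # Step 3: run a checkpoint on the new master.
        ["psql", "-d", new_master_conninfo, "-c", "CHECKPOINT"],
        # Step 4: rewind the old master against the new master.
        ["pg_rewind",
         "--target-pgdata=" + old_pgdata,
         "--source-server=" + new_master_conninfo],
        # Step 5: start the old master (standby configuration not shown).
        ["pg_ctl", "-D", old_pgdata, "start"],
    ]


# Hypothetical path and connection string, for illustration only.
for cmd in rewind_commands("/path/to/old/master/data", "host=new-master dbname=postgres"):
    print(" ".join(shlex.quote(arg) for arg in cmd))
```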

> 2018-01-11 23:21:59 ACDT [112235]: [2-1] db=,user= app=,host= FATAL:
> could not receive data from WAL stream: ERROR:  requested WAL segment
> 0000000600000383000000BE has already been removed
>
> Has anyone experienced this before with pg_rewind?

When restarting a standby after a rewind has been done to it, note that, in order to recover to a consistent point, it
needs to replay WAL from the last checkpoint before the point where WAL forked during the promotion, up to the point
where the rewind finished. Per your logs, I gather that the last checkpoint before the timeline jump is
located in segment 0000000X00000383000000BE, but this did not get archived.

> The earliest WAL files in the archive directory were from just after the failover occurred.
>
> Eg, in the archive directory on the new Master:
> $ ls -l
> total 15745032
> -rw-------. 1 postgres postgres 16777216 Jan 11 17:52 0000000500000383000000C0.partial
> -rw-------. 1 postgres postgres 16777216 Jan 11 17:52 0000000600000383000000C0
> -rw-------. 1 postgres postgres 16777216 Jan 11 17:52 0000000600000383000000C1
> -rw-------. 1 postgres postgres 16777216 Jan 11 17:52 0000000600000383000000C2

Yeah, you are looking for the WAL segment just before the last, partial WAL segment of the previous timeline. Depending
on your archiving strategy, I guess that you should have set archive_mode = 'always' so that the server which was the
standby before the promotion is also able to store them.
--
Michael
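
To make the gap concrete, here is a small Python sketch that compares the segment requested in the error message with the file names quoted in the archive listing. The helper name `position` is hypothetical, and the list contains only the four names shown in the listing, so this is an illustration rather than a check against the real archive.

```python
# File names copied from the quoted archive listing on the new master.
archive = [
    "0000000500000383000000C0.partial",
    "0000000600000383000000C0",
    "0000000600000383000000C1",
    "0000000600000383000000C2",
]

# Segment named in the "has already been removed" error.
required = "0000000600000383000000BE"


def position(name: str):
    """Return the (log, seg) part of a WAL file name, ignoring the timeline
    and any suffix such as .partial."""
    base = name.split(".")[0]
    return int(base[8:16], 16), int(base[16:24], 16)


earliest = min(position(name) for name in archive)
missing = required not in archive

print("required segment present in archive:", not missing)
print("required segment precedes earliest archived one:", position(required) < earliest)
```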

