RE: Missing WAL file after running pg_rewind - Mailing list pgsql-general

From	Dylan Luong
Subject	RE: Missing WAL file after running pg_rewind
Date	January 13, 2018 03:44:25
Msg-id	a7ad1502b60f4e4fae8ae9e8575b8e83@ITUPW-EXMBOX3B.UniNet.unisa.edu.au
In response to	Re: Missing WAL file after running pg_rewind (Michael Paquier <michael.paquier@gmail.com>)
Responses	Re: Missing WAL file after running pg_rewind (Michael Paquier <michael.paquier@gmail.com>)
List	pgsql-general

The file exists in the archive directory of the old master, but it is for the previous timeline, i.e. 5 and not 6:
0000000500000383000000BE
Can I just rename the file to timeline 6, i.e. 0000000600000383000000BE?
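To make the question concrete: a WAL segment file name is 24 hex digits, made of an 8-digit timeline ID, an 8-digit log ID and an 8-digit segment number, optionally followed by ".partial". The minimal Python sketch below (the helper names are hypothetical) splits such a name, showing that the two files above differ only in the timeline field. It illustrates the naming scheme only; it says nothing about whether renaming a segment to another timeline is safe for recovery.

```python
import re
from typing import NamedTuple

# 8 hex digits each: timeline ID, log ID, segment number; ".partial" marks
# an incomplete segment (as seen in the archive listing later in the thread).
WAL_NAME = re.compile(r"^([0-9A-F]{8})([0-9A-F]{8})([0-9A-F]{8})(\.partial)?$")


class WalSegment(NamedTuple):
    timeline: int
    log_id: int
    segment: int
    partial: bool


def parse_wal_name(name: str) -> WalSegment:
    """Split a WAL segment file name into its numeric components."""
    m = WAL_NAME.match(name)
    if m is None:
        raise ValueError(f"not a WAL segment name: {name!r}")
    tli, log_id, seg, partial = m.groups()
    return WalSegment(int(tli, 16), int(log_id, 16), int(seg, 16), partial is not None)


def format_wal_name(seg: WalSegment) -> str:
    """Inverse of parse_wal_name."""
    name = f"{seg.timeline:08X}{seg.log_id:08X}{seg.segment:08X}"
    return name + ".partial" if seg.partial else name


old = parse_wal_name("0000000500000383000000BE")
print(old)  # WalSegment(timeline=5, log_id=899, segment=190, partial=False)
```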

-----Original Message-----
From: Michael Paquier [mailto:michael.paquier@gmail.com]
Sent: Friday, 12 January 2018 12:08 PM
To: Dylan Luong <Dylan.Luong@unisa.edu.au>
Cc: pgsql-general@lists.postgresql.org
Subject: Re: Missing WAL file after running pg_rewind

On Thu, Jan 11, 2018 at 04:58:02PM +0000, Dylan Luong wrote:
> The steps I took were:
>
> 1.       Stop all watchdogs
>
> 2.       Start/stop the old master
>
> 3.       Run 'checkpoint' on new master
>
> 4.       Run the pg_rewind on old master to resync with new master
>
> 5.       Start the old master (as new slave)

That's a sane flow to me.
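For reference, the five steps can be sketched as a dry-run list of commands. This is a minimal Python sketch, assuming pg_ctl, psql and pg_rewind are on PATH; the data directory and connection string are hypothetical, step 1 (the watchdogs) is site-specific and left out, and the standby configuration needed before the final start (recovery.conf on releases of that era) is not shown.

```python
import subprocess
from typing import List


def rewind_steps(target_pgdata: str, source_conninfo: str) -> List[List[str]]:
    """Build the commands for steps 2-5; step 1 (watchdogs) is site-specific."""
    return [
        # Step 2: start, then cleanly stop the old master; pg_rewind requires
        # the target cluster to have been shut down cleanly ("fast" is clean).
        ["pg_ctl", "start", "-D", target_pgdata, "-w"],
        ["pg_ctl", "stop", "-D", target_pgdata, "-m", "fast", "-w"],
        # Step 3: checkpoint the new master so its control file reflects the
        # new timeline, which pg_rewind reads from the source.
        ["psql", "-d", source_conninfo, "-c", "CHECKPOINT"],
        # Step 4: rewind the old master's data directory against the new master.
        ["pg_rewind", f"--target-pgdata={target_pgdata}",
         f"--source-server={source_conninfo}"],
        # Step 5: start the old master as a standby (standby config not shown).
        ["pg_ctl", "start", "-D", target_pgdata, "-w"],
    ]


def run_steps(steps: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in steps:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failing step


# Hypothetical paths and connection string; prints only.
run_steps(rewind_steps("/var/lib/pgsql/data", "host=new-master user=postgres"))
```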

> 2018-01-11 23:21:59 ACDT [112235]: [2-1] db=,user= app=,host= FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 0000000600000383000000BE has already been removed
>
> Has anyone experienced this before with pg_rewind?

When restarting a standby after a rewind has been done to it, note that, in order to recover to a consistent point, it needs to replay WAL from the last checkpoint before the point where WAL forked during the promotion, up to the point where the rewind finished. Per your logs, I gather that this checkpoint, just before the timeline jump, is located in segment 0000000X00000383000000BE, but that segment did not get archived.
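To see which segment holds a given checkpoint, its LSN can be mapped to a segment file name. A minimal Python sketch, assuming the default 16 MB segment size (matching the 16777216-byte files in the listing below); the LSN used here is hypothetical, since the actual checkpoint location is not given in the thread. pg_rewind's output reports the last common checkpoint it rewound from, and on a running server pg_walfile_name() performs a similar mapping.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # default wal_segment_size (16 MB)


def parse_lsn(text: str) -> int:
    """Convert an LSN in 'XXX/XXXXXXXX' notation to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_file_name(timeline: int, lsn: int, seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Name of the segment file containing `lsn` on the given timeline."""
    segno = lsn // seg_size
    segs_per_log_id = 0x100000000 // seg_size  # 256 segments per log ID at 16 MB
    return f"{timeline:08X}{segno // segs_per_log_id:08X}{segno % segs_per_log_id:08X}"


checkpoint_lsn = parse_lsn("383/BE000028")  # hypothetical checkpoint location
print(wal_file_name(5, checkpoint_lsn))  # 0000000500000383000000BE
```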

> The earliest WAL files in the archive directory were from just after the failover occurred.
>
> For example, in the archive directory on the new master:
> $ ls -l
> total 15745032
> -rw-------. 1 postgres postgres 16777216 Jan 11 17:52 0000000500000383000000C0.partial
> -rw-------. 1 postgres postgres 16777216 Jan 11 17:52 0000000600000383000000C0
> -rw-------. 1 postgres postgres 16777216 Jan 11 17:52 0000000600000383000000C1
> -rw-------. 1 postgres postgres 16777216 Jan 11 17:52 0000000600000383000000C2
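A quick way to confirm the gap from such a listing is to find the earliest archived segment on each timeline and compare it with the segment that recovery asks for. A minimal Python sketch (the helper name is hypothetical), using the file names from the listing above:

```python
import re
from typing import Dict, Iterable

# 8-digit timeline followed by 16 digits of log ID + segment number.
SEGMENT_RE = re.compile(r"^([0-9A-F]{8})([0-9A-F]{16})(\.partial)?$")


def earliest_by_timeline(names: Iterable[str]) -> Dict[int, str]:
    """Map each timeline to its earliest archived segment file name."""
    earliest: Dict[int, str] = {}
    for name in names:
        m = SEGMENT_RE.match(name)
        if m is None:
            continue  # skip .history, .backup and other non-segment files
        tli = int(m.group(1), 16)
        pos = m.group(2)  # fixed-width uppercase hex, so string order = WAL order
        if tli not in earliest or pos < earliest[tli][8:24]:
            earliest[tli] = name
    return earliest


listing = [
    "0000000500000383000000C0.partial",
    "0000000600000383000000C0",
    "0000000600000383000000C1",
    "0000000600000383000000C2",
]
needed = "0000000500000383000000BE"
print(earliest_by_timeline(listing))
print(needed in listing)  # False
```

Because the needed segment sorts before the earliest file archived on timeline 5, it was never archived there.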

Yeah, you are looking for the WAL segment just before the last, partial WAL segment of the previous timeline. Depending on your archiving strategy, I guess that you should have set archive_mode = 'always' so that the server which was the standby before the promotion is also able to store them.
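With archive_mode = 'always', the standby runs archive_command too, so when primary and standby share one archive, the same file may be archived twice. The PostgreSQL documentation advises that the command then check whether the file already exists and whether the existing copy has identical contents. A minimal Python sketch of such a script, assuming a hypothetical archive directory and wiring such as archive_command = 'python3 archive_wal.py %p %f'; it does not fsync, which a production script should.

```python
import filecmp
import os
import shutil

ARCHIVE_DIR = "/mnt/wal_archive"  # hypothetical shared archive location


def archive_wal(src_path: str, file_name: str, archive_dir: str) -> int:
    """Copy one WAL file into archive_dir; return 0 on success, 1 on refusal."""
    dest = os.path.join(archive_dir, file_name)
    if os.path.exists(dest):
        # Already archived (e.g. by the other server): accept an identical
        # copy, but never overwrite a file whose contents differ.
        return 0 if filecmp.cmp(src_path, dest, shallow=False) else 1
    tmp = dest + ".tmp"
    shutil.copyfile(src_path, tmp)
    os.replace(tmp, dest)  # rename so a half-copied file never has the final name
    return 0


def main(argv: list) -> int:
    """Entry point for archive_command: argv[1] is %p, argv[2] is %f."""
    return archive_wal(argv[1], argv[2], ARCHIVE_DIR)
```

A nonzero exit status makes PostgreSQL keep the file and retry archiving later, so a refusal here surfaces as repeated archiver failures in the server log rather than a silently overwritten segment.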
--
Michael
