Missing WAL file after running pg_rewind - Mailing list pgsql-general

From Dylan Luong
Subject Missing WAL file after running pg_rewind
Date
Msg-id ab82d7fd35ef4394bc5dfc6a6e2f1266@ITUPW-EXMBOX3B.UniNet.unisa.edu.au
Responses Re: Missing WAL file after running pg_rewind  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-general

Hi

 

We had a failover situation where our monitoring watchdog processes promoted the slave to become the new master.

I restarted the old master database to ensure a clean stop/start, then ran pg_rewind on the old master to resync it with the new master. The rewind succeeded, but the new slave then failed to start with an error.

The steps I took were:

1.       Stop all watchdogs

2.       Start/stop the old master

3.       Run ‘checkpoint’ on new master

4.       Run the pg_rewind on old master to resync with new master

5.       Start the old master (as new slave)

 

Step 4 (pg_rewind) succeeded and rewound the new slave onto the same new timeline as the new master. However, when I restarted the new slave it failed to start, with the following errors:

 

80) FATAL:  the database system is starting up

cp: cannot stat ‘/pg_backup/backup/archive_sync/0000000400000383000000BF’: No such file or directory

cp: cannot stat ‘/pg_backup/backup/archive_sync/0000000300000383000000BF’: No such file or directory

cp: cannot stat ‘/pg_backup/backup/archive_sync/0000000200000383000000BF’: No such file or directory

cp: cannot stat ‘/pg_backup/backup/archive_sync/0000000100000383000000BF’: No such file or directory

2018-01-11 23:21:59 ACDT [112235]: [1-1] db=,user= app=,host= LOG:  started streaming WAL from primary at 383/BE000000 on timeline 6

2018-01-11 23:21:59 ACDT [112235]: [2-1] db=,user= app=,host= FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 0000000600000383000000BE has already been removed
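
For reference, the segment name in the FATAL message and the LSN in the LOG message refer to the same place in the WAL. The following is a minimal sketch of that mapping, assuming the default 16 MB WAL segment size; the helper name parse_wal_name is hypothetical and is not part of PostgreSQL.

```python
def parse_wal_name(name, seg_size=16 * 1024 * 1024):
    """Split a 24-hex-digit WAL segment file name into (timeline, start LSN).

    Assumes the default 16 MB segment size; a cluster built with a
    different size would need a different seg_size.
    """
    if len(name) != 24:
        raise ValueError("expected a 24-character WAL segment name")
    timeline = int(name[0:8], 16)    # first 8 hex digits: timeline ID
    log_hi = int(name[8:16], 16)     # middle 8 hex digits: high half of the LSN
    seg_no = int(name[16:24], 16)    # last 8 hex digits: segment number within log_hi
    offset = seg_no * seg_size       # byte offset where the segment starts
    return timeline, "%X/%X" % (log_hi, offset)


print(parse_wal_name("0000000600000383000000BE"))
```

Under that assumption, 0000000600000383000000BE decodes to timeline 6 starting at 383/BE000000, which is the position the new slave tried to stream from.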

 

I checked both the archive and pg_xlog directories on the new master and cannot locate the missing file.

 

Has anyone experienced this before with pg_rewind?

 

The earliest WAL files in the archive directory date from just after the failover occurred.

 

E.g., in the archive directory on the new master:

$ ls -l

total 15745032

-rw-------. 1 postgres postgres 16777216 Jan 11 17:52 0000000500000383000000C0.partial

-rw-------. 1 postgres postgres 16777216 Jan 11 17:52 0000000600000383000000C0

-rw-------. 1 postgres postgres 16777216 Jan 11 17:52 0000000600000383000000C1

-rw-------. 1 postgres postgres 16777216 Jan 11 17:52 0000000600000383000000C2

-rw-------. 1 postgres postgres 16777216 Jan 11 17:52 0000000600000383000000C

 

And in the pg_xlog directory on the new master:

-rw-------. 1 postgres postgres 16777216 Jan 11 18:57 000000060000038500000080

-rw-------. 1 postgres postgres 16777216 Jan 11 18:57 000000060000038500000081

-rw-------. 1 postgres postgres 16777216 Jan 11 18:57 000000060000038500000082

-rw-------. 1 postgres postgres 16777216 Jan 11 18:57 000000060000038500000083

-rw-------. 1 postgres postgres 16777216 Jan 11 18:57 000000060000038500000084

-rw-------. 1 postgres postgres 16777216 Jan 11 18:57 000000060000038500000085

-rw-------. 1 postgres postgres 16777216 Jan 11 18:57 000000060000038500000086

-rw-------. 1 postgres postgres 16777216 Jan 11 18:57 000000060000038500000087
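
To make the gap explicit, here is a rough sketch that compares the requested segment against the complete timeline-6 file names shown in the two listings above (the .partial file and the truncated name are left out). The helper wal_position is hypothetical, and the comparison ignores the timeline ID, which is only reasonable here because every name in the list is on timeline 6.

```python
def wal_position(name):
    """Return (log, segment) from a 24-hex-digit WAL file name, ignoring the timeline."""
    return int(name[8:16], 16), int(name[16:24], 16)


requested = "0000000600000383000000BE"

# Complete timeline-6 names copied from the archive and pg_xlog listings.
available = [
    "0000000600000383000000C0",
    "0000000600000383000000C1",
    "0000000600000383000000C2",
    "000000060000038500000080",
    "000000060000038500000081",
    "000000060000038500000082",
    "000000060000038500000083",
    "000000060000038500000084",
    "000000060000038500000085",
    "000000060000038500000086",
    "000000060000038500000087",
]

earliest = min(available, key=wal_position)
requested_is_older = wal_position(requested) < wal_position(earliest)

print("earliest available:", earliest)
print("requested segment precedes it:", requested_is_older)
```

Among the files listed, the earliest is 0000000600000383000000C0, so the requested segment 0000000600000383000000BE falls before anything present in either directory.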

 

Thanks

Dylan

 
