Re: Replication failure, slave requesting old segments - Mailing list pgsql-general

From Phil Endecott
Subject Re: Replication failure, slave requesting old segments
Msg-id 1534101938762@dmwebmail.dmwebmail.chezphil.org
In response to Re: Replication failure, slave requesting old segments  (Adrian Klaver <adrian.klaver@aklaver.com>)
Responses Re: Replication failure, slave requesting old segments
List pgsql-general
Hi Adrian,

Adrian Klaver wrote:
> On 08/11/2018 12:42 PM, Phil Endecott wrote:
>> Hi Adrian,
>> 
>> Adrian Klaver wrote:
>>> Looks like the master recycled the WAL's while the slave could not 
>>> connect.
>> 
>> Yes but... why is that a problem?  The master is copying the WALs to
>> the backup server using scp, where they remain forever.  The slave gets
>
> To me it looks like that did not happen:
>
> 2018-08-11 00:05:50.364 UTC [615] LOG:  restored log file 
> "0000000100000007000000D0" from archive
> scp: backup/postgresql/archivedir/0000000100000007000000D1: No such file 
> or directory
> 2018-08-11 00:05:51.325 UTC [7208] LOG:  started streaming WAL from 
> primary at 7/D0000000 on timeline 1
> 2018-08-11 00:05:51.325 UTC [7208] FATAL:  could not receive data from 
> WAL stream: ERROR:  requested WAL segment 0000000100000007000000D0 has 
> already been removed
>
> Above 0000000100000007000000D0 is gone/recycled on the master and the 
> archived version does not seem to be complete as the streaming 
> replication is trying to find it.

The files on the backup server were all 16 MB.
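
For reference, a size check of this kind could be scripted roughly as follows. This is only a sketch: it assumes the default 16 MiB segment size, and the function name and directory argument are made up for illustration, not taken from the actual setup. A correct size shows only that a copy was not truncated; it says nothing about the file's contents.

```python
import os
import re

# Assumption: default WAL segment size of 16 MiB (wal_segment_size unchanged).
WAL_SEGMENT_BYTES = 16 * 1024 * 1024

# WAL segment file names are 24 upper-case hex digits.
SEGMENT_NAME = re.compile(r"^[0-9A-F]{24}$")


def wrong_sized_segments(archive_dir, expected=WAL_SEGMENT_BYTES):
    """Return the sorted names of WAL segment files whose size is not `expected`."""
    bad = []
    for name in sorted(os.listdir(archive_dir)):
        path = os.path.join(archive_dir, name)
        # Skip anything that is not a regular file with a segment-style name.
        if SEGMENT_NAME.match(name) and os.path.isfile(path):
            if os.path.getsize(path) != expected:
                bad.append(name)
    return bad
```

An empty list from wrong_sized_segments() on the archive directory would correspond to "all files are 16 MB".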


> Below you kick the master and it coughs up the files to the archive 
> including *D0 and *D1 on up to *D4 and then the streaming picks using *D5.

When I kicked it, the master wrote D1 to D4 to the backup.  It did not
change D0 (its modification time on the backup is from before the "kick").
The slave re-read D0, again, as it had been doing throughout this period,
and then read D1 to D4.


> Best guess is the archiving did not work as expected during:
>
> "(During this time the master was also down for a shorter period.)"

Around the time the master was down, the WAL segment names were CB and CC.
Files CD to CF were written between the master coming up and the slave
coming up.  The slave had no trouble restoring those segments when it started.
The problematic segments D0 and D1 were the ones that were "current" when
the slave restarted, at which time the master was up consistently.
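
The short names used above (CB, CC, D0, ...) are the last two hex digits of the full 24-character segment file names seen in the log. As a sketch of how such a file name relates to the position in the "started streaming WAL from primary at 7/D0000000" message, assuming the default 16 MiB segment size (the function name is illustrative only):

```python
# Assumption: default WAL segment size of 16 MiB.
WAL_SEGMENT_BYTES = 16 * 1024 * 1024


def segment_start(name, segment_bytes=WAL_SEGMENT_BYTES):
    """Map a 24-hex-digit WAL file name to (timeline, start position as 'X/X')."""
    if len(name) != 24:
        raise ValueError("expected a 24-character WAL segment file name")
    timeline = int(name[0:8], 16)   # first 8 digits: timeline ID
    log = int(name[8:16], 16)       # middle 8 digits: high 32 bits of the position
    seg = int(name[16:24], 16)      # last 8 digits: segment number within that "log"
    segments_per_log = 0x100000000 // segment_bytes
    start = (log * segments_per_log + seg) * segment_bytes
    return timeline, "%X/%X" % (start >> 32, start & 0xFFFFFFFF)
```

With this, segment_start("0000000100000007000000D0") gives (1, "7/D0000000"), which is the timeline and start position shown in the log for the segment the slave kept requesting.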


Regards, Phil.





