Re: Replication failure, slave requesting old segments - Mailing list pgsql-general

From Adrian Klaver
Subject Re: Replication failure, slave requesting old segments
Msg-id e89b76f9-f60a-a645-587f-00aeb3c68770@aklaver.com
In response to Re: Replication failure, slave requesting old segments  ("Phil Endecott" <spam_from_pgsql_lists@chezphil.org>)
Responses Re: Replication failure, slave requesting old segments
List pgsql-general
On 08/11/2018 12:42 PM, Phil Endecott wrote:
> Hi Adrian,
> 
> Adrian Klaver wrote:
>> Looks like the master recycled the WAL's while the slave could not 
>> connect.
> 
> Yes but... why is that a problem?  The master is copying the WALs to
> the backup server using scp, where they remain forever.  The slave gets

To me it looks like that did not happen:

2018-08-11 00:05:50.364 UTC [615] LOG:  restored log file 
"0000000100000007000000D0" from archive
scp: backup/postgresql/archivedir/0000000100000007000000D1: No such file 
or directory
2018-08-11 00:05:51.325 UTC [7208] LOG:  started streaming WAL from 
primary at 7/D0000000 on timeline 1
2018-08-11 00:05:51.325 UTC [7208] FATAL:  could not receive data from 
WAL stream: ERROR:  requested WAL segment 0000000100000007000000D0 has 
already been removed

Above, 0000000100000007000000D0 is gone/recycled on the master, and the 
archived copy does not seem to be complete, since streaming replication 
is still trying to fetch that segment.
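
One way to check that would be to look in the archive for segments that 
are not a full segment in size, which is what an interrupted scp could 
leave behind. A rough sketch in Python, assuming uncompressed segments 
of the default 16 MB and that it is run on the backup server; the 
function name is made up for illustration and is not from the thread:

```python
import os
import re

# WAL segment file names are 24 upper-case hex digits.
WAL_NAME = re.compile(r"^[0-9A-F]{24}$")

# Assumption: default wal_segment_size of 16 MB, no compression.
DEFAULT_SEGMENT_SIZE = 16 * 1024 * 1024


def incomplete_segments(archive_dir, segment_size=DEFAULT_SEGMENT_SIZE):
    """Return (name, size) for archived segments whose size is not segment_size."""
    suspect = []
    for name in sorted(os.listdir(archive_dir)):
        if not WAL_NAME.match(name):
            continue  # skip .history, .backup and unrelated files
        size = os.path.getsize(os.path.join(archive_dir, name))
        if size != segment_size:
            suspect.append((name, size))
    return suspect
```

Calling incomplete_segments() on the archive directory (the relative 
path in the scp errors is backup/postgresql/archivedir) would list any 
short files; an empty list would only rule out truncation, not other 
archiving problems.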


Below, you kick the master and it coughs up the files to the archive, 
including *D0 and *D1 on up to *D4, and then streaming picks up at *D5.

2018-08-11 00:55:49.741 UTC [7954] LOG:  restored log file 
"0000000100000007000000D0" from archive
2018-08-11 00:56:12.304 UTC [7954] LOG:  restored log file 
"0000000100000007000000D1" from archive
2018-08-11 00:56:35.481 UTC [7954] LOG:  restored log file 
"0000000100000007000000D2" from archive
2018-08-11 00:56:57.443 UTC [7954] LOG:  restored log file 
"0000000100000007000000D3" from archive
2018-08-11 00:57:21.723 UTC [7954] LOG:  restored log file 
"0000000100000007000000D4" from archive
scp: backup/postgresql/archivedir/0000000100000007000000D5: No such file 
or directory
2018-08-11 00:57:22.915 UTC [7954] LOG:  unexpected pageaddr 7/C7000000 
in log segment 0000000100000007000000D5, offset 0
2018-08-11 00:57:23.114 UTC [12348] LOG:  started streaming WAL from 
primary at 7/D5000000 on timeline 1
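
The segment names and the streaming start positions in these logs line 
up: a position such as 7/D5000000 falls at the start of segment 
0000000100000007000000D5. A minimal sketch of that mapping in Python, 
assuming the default 16 MB wal_segment_size (the function name is 
illustrative only):

```python
# Assumption: default wal_segment_size of 16 MB.
DEFAULT_SEGMENT_SIZE = 16 * 1024 * 1024


def wal_segment_name(timeline, lsn, segment_size=DEFAULT_SEGMENT_SIZE):
    """Map a timeline and an LSN like '7/D0000000' to its WAL segment file name."""
    hi, lo = (int(part, 16) for part in lsn.split("/"))
    position = (hi << 32) | lo              # 64-bit byte position in the WAL
    segno = position // segment_size        # segment number since the start
    per_xlogid = 0x100000000 // segment_size  # segments per 4 GB "log id"
    return "%08X%08X%08X" % (timeline, segno // per_xlogid, segno % per_xlogid)


print(wal_segment_name(1, "7/D0000000"))  # 0000000100000007000000D0
print(wal_segment_name(1, "7/D5000000"))  # 0000000100000007000000D5
```

So "started streaming WAL from primary at 7/D0000000" in the first log 
is a request for the beginning of the *D0 segment, the one the master 
reported as already removed.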


Best guess is the archiving did not work as expected during:

"(During this time the master was also down for a shorter period.)"

> them from there before it starts streaming.  So it shouldn't matter
> if the master recycles them, as the slave should be able to get everything
> using the combination of scp and then streaming.
> 
> Am I missing something about how this sort of replication is supposed to
> work?
> 
> 
> Thanks, Phil.
> 
> 
> 
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com

