Re: Strange replication problem - segment restored from archive but still requested from master - Mailing list pgsql-general

From Venkata Balaji N
Subject Re: Strange replication problem - segment restored from archive but still requested from master
Date
Msg-id CAEyp7J-Q1p464XE=u4400TGG62N+Gfh6p-gPwA_a_rCuUZe0+Q@mail.gmail.com
Whole thread Raw
In response to Strange replication problem - segment restored from archive but still requested from master  (Piotr Gasidło <quaker@barbara.eu.org>)
Responses Re: Strange replication problem - segment restored from archive but still requested from master
List pgsql-general

On Fri, May 22, 2015 at 6:52 AM, Piotr Gasidło <quaker@barbara.eu.org> wrote:
Got strange problem. Unable to repeat, but got logs.

Simple master-slave using streaming replication.
Master is running. Slave is down.
Segment 0000000400004C4D00000090 was successfully archived and send
from master to slave.

Now I've started slave, and:

ay 21 21:23:37 d8 postgres[50645]: [3-1] 2015-05-21 21:23:37.033 CEST
@ 50645   LOG:  database system was shut down in recovery at
2015-05-21 21:22:03 CEST
May 21 21:23:37 d8 postgres[50645]: [4-1] 2015-05-21 21:23:37.034 CEST
@ 50645   LOG:  entering standby mode
May 21 21:23:37 d8 postgres[50645]: [5-1] 2015-05-21 21:23:37.058 CEST
@ 50645   LOG:  restored log file "0000000400004C4D00000088" from
archive
May 21 21:23:37 d8 postgres[50645]: [6-1] 2015-05-21 21:23:37.120 CEST
@ 50645   LOG:  redo starts at 4C4D/88493B50
May 21 21:23:37 d8 postgres[50645]: [7-1] 2015-05-21 21:23:37.226 CEST
@ 50645   LOG:  restored log file "0000000400004C4D00000089" from
archive
May 21 21:23:37 d8 postgres[50645]: [8-1] 2015-05-21 21:23:37.426 CEST
@ 50645   LOG:  restored log file "0000000400004C4D0000008A" from
archive
May 21 21:23:37 d8 postgres[50645]: [9-1] 2015-05-21 21:23:37.750 CEST
@ 50645   LOG:  restored log file "0000000400004C4D0000008B" from
archive
May 21 21:23:38 d8 postgres[50645]: [10-1] 2015-05-21 21:23:38.376
CEST @ 50645   LOG:  restored log file "0000000400004C4D0000008C" from
archive
May 21 21:23:38 d8 postgres[50645]: [11-1] 2015-05-21 21:23:38.690
CEST @ 50645   LOG:  restored log file "0000000400004C4D0000008D" from
archive
May 21 21:23:38 d8 postgres[50645]: [12-1] 2015-05-21 21:23:38.855
CEST @ 50645   LOG:  restored log file "0000000400004C4D0000008E" from
archive
May 21 21:23:39 d8 postgres[50645]: [13-1] 2015-05-21 21:23:39.275
CEST @ 50645   LOG:  restored log file "0000000400004C4D0000008F" from
archive
May 21 21:23:39 d8 postgres[50645]: [14-1] 2015-05-21 21:23:39.654
CEST @ 50645   LOG:  restored log file "0000000400004C4D00000090" from
archive
May 21 21:23:40 d8 postgres[50645]: [15-1] 2015-05-21 21:23:40.222
CEST @ 50645   LOG:  consistent recovery state reached at
4C4D/90FFF9C8
May 21 21:23:40 d8 postgres[50644]: [3-1] 2015-05-21 21:23:40.222 CEST
@ 50644   LOG:  database system is ready to accept read only
connections
May 21 21:23:40 d8 postgres[50645]: [16-1] 2015-05-21 21:23:40.223
CEST @ 50645   LOG:  unexpected pageaddr 4C46/E000000 in log segment
0000000400004C4D00000091, offset 0
May 21 21:23:40 d8 postgres[50699]: [4-1] 2015-05-21 21:23:40.232 CEST
@ 50699   LOG:  started streaming WAL from primary at 4C4D/90000000 on
timeline 4
May 21 21:23:40 d8 postgres[50699]: [5-1] 2015-05-21 21:23:40.232 CEST
@ 50699   FATAL:  could not receive data from WAL stream: ERROR:
requested WAL segment 0000000400004C4D00000090 has already been
removed
May 21 21:23:40 d8 postgres[50645]: [17-1] 2015-05-21 21:23:40.255
CEST @ 50645   LOG:  restored log file "0000000400004C4D00000090" from
archive
May 21 21:23:40 d8 postgres[50703]: [4-1] 2015-05-21 21:23:40.268 CEST
@ 50703   LOG:  started streaming WAL from primary at 4C4D/90000000 on
timeline 4
May 21 21:23:40 d8 postgres[50703]: [5-1] 2015-05-21 21:23:40.268 CEST
@ 50703   FATAL:  could not receive data from WAL stream: ERROR:
requested WAL segment 0000000400004C4D00000090 has already been
removed
May 21 21:23:40 d8 postgres[50703]: [5-2]
...
(and so on)

So, as I understand:
- slave was started and entered restore,
- slave restored 0000000400004C4D00000090 from archive, reached
consistent recovery state
- now, it connected to master and noticed, that, it has
0000000400004C4D00000091 segment uncomplete
- and then, started yelling about missing segment (master deleted it
already after archiving) on master

Why?
 
It might be yelling about the WAL segment due to the delay in shipping it from master to slave.
Do you have the restore_command set up in the recovery.conf file ? do you have any automated job which is shipping WAL archives from master to slave.

Slave - if it does not find the WAL segment in master pg_xlog location, it looks for the same in the location mentioned in the restore_command. It yells if it is unable to find it.

Cannot give any concrete comments/suggestions, until recovery.conf entries are known.
 
I fixed it by stopping slave, doing some operations on master, copying
0000000400004C4D00000091 and 0000000400004C4D00000092 from master, and
staring the slave:


May 21 21:24:56 d8 postgres[50644]: [4-1] 2015-05-21 21:24:56.160 CEST
@ 50644   LOG:  received fast shutdown request
May 21 21:24:56 d8 postgres[50644]: [5-1] 2015-05-21 21:24:56.160 CEST
@ 50644   LOG:  aborting any active transactions
May 21 21:24:56 d8 postgres[50657]: [3-1] 2015-05-21 21:24:56.162 CEST
@ 50657   LOG:  shutting down
May 21 21:24:56 d8 postgres[50657]: [4-1] 2015-05-21 21:24:56.174 CEST
@ 50657   LOG:  database system is shut down
May 21 21:28:54 d8 postgres[53574]: [1-1] 2015-05-21 21:28:54.781 CEST
@ 53574   LOG:  loaded library "pg_stat_statements"
May 21 21:28:55 d8 postgres[53574]: [2-1] 2015-05-21 21:28:55.123 CEST
@ 53574   LOG:  ending log output to stderr
May 21 21:28:55 d8 postgres[53574]: [2-2] 2015-05-21 21:28:55.123 CEST
@ 53574   HINT:  Future log output will go to log destination
"syslog".
May 21 21:28:55 d8 postgres[53575]: [3-1] 2015-05-21 21:28:55.124 CEST
@ 53575   LOG:  database system was shut down in recovery at
2015-05-21 21:24:56 CEST
May 21 21:28:55 d8 postgres[53575]: [4-1] 2015-05-21 21:28:55.126 CEST
@ 53575   LOG:  entering standby mode
May 21 21:28:55 d8 postgres[53575]: [5-1] 2015-05-21 21:28:55.150 CEST
@ 53575   LOG:  restored log file "0000000400004C4D00000088" from
archive
May 21 21:28:55 d8 postgres[53575]: [6-1] 2015-05-21 21:28:55.209 CEST
@ 53575   LOG:  redo starts at 4C4D/88493B50
May 21 21:28:55 d8 postgres[53575]: [7-1] 2015-05-21 21:28:55.314 CEST
@ 53575   LOG:  restored log file "0000000400004C4D00000089" from
archive
May 21 21:28:55 d8 postgres[53575]: [8-1] 2015-05-21 21:28:55.512 CEST
@ 53575   LOG:  restored log file "0000000400004C4D0000008A" from
archive
May 21 21:28:55 d8 postgres[53575]: [9-1] 2015-05-21 21:28:55.831 CEST
@ 53575   LOG:  restored log file "0000000400004C4D0000008B" from
archive
May 21 21:28:56 d8 postgres[53575]: [10-1] 2015-05-21 21:28:56.508
CEST @ 53575   LOG:  restored log file "0000000400004C4D0000008C" from
archive
May 21 21:28:56 d8 postgres[53575]: [11-1] 2015-05-21 21:28:56.818
CEST @ 53575   LOG:  restored log file "0000000400004C4D0000008D" from
archive
May 21 21:28:56 d8 postgres[53575]: [12-1] 2015-05-21 21:28:56.982
CEST @ 53575   LOG:  restored log file "0000000400004C4D0000008E" from
archive
May 21 21:28:57 d8 postgres[53575]: [13-1] 2015-05-21 21:28:57.415
CEST @ 53575   LOG:  restored log file "0000000400004C4D0000008F" from
archive
May 21 21:28:57 d8 postgres[53575]: [14-1] 2015-05-21 21:28:57.790
CEST @ 53575   LOG:  restored log file "0000000400004C4D00000090" from
archive
May 21 21:28:58 d8 postgres[53575]: [15-1] 2015-05-21 21:28:58.328
CEST @ 53575   LOG:  consistent recovery state reached at
4C4D/90FFF9C8
May 21 21:28:58 d8 postgres[53574]: [3-1] 2015-05-21 21:28:58.328 CEST
@ 53574   LOG:  database system is ready to accept read only
connections
May 21 21:28:58 d8 postgres[53575]: [16-1] 2015-05-21 21:28:58.349
CEST @ 53575   LOG:  restored log file "0000000400004C4D00000091" from
archive
May 21 21:28:59 d8 postgres[53575]: [17-1] 2015-05-21 21:28:59.962
CEST @ 53575   LOG:  restored log file "0000000400004C4D00000092" from
archive
May 21 21:29:00 d8 postgres[53575]: [18-1] 2015-05-21 21:29:00.037
CEST @ 53575   LOG:  unexpected pageaddr 4C45/DC000000 in log segment
0000000400004C4D00000093, offset 0
May 21 21:29:00 d8 postgres[53640]: [4-1] 2015-05-21 21:29:00.047 CEST
@ 53640   LOG:  started streaming WAL from primary at 4C4D/92000000 on
timeline 4

What was wrong?

This error message can be ignored, did you try shipping 0000000400004C4D00000093 to slave from master's archive location and see if slave picked it up and applied ?

Regards,
Venkata B N

Fujitsu Australia

pgsql-general by date:

Previous
From: Ian Barwick
Date:
Subject: Re: Enum in foreign table: error and correct way to handle.
Next
From: Adrian Klaver
Date:
Subject: Re: date with month and year