Strange replication problem - segment restored from archive but still requested from master - Mailing list pgsql-general

From Piotr Gasidło
Subject Strange replication problem - segment restored from archive but still requested from master
Date
Msg-id CAF8akQtSgY+K4Z24qmBbRjZBEN9q+C8rGenq36KqmUGRTBqLhQ@mail.gmail.com
Whole thread Raw
Responses Re: Strange replication problem - segment restored from archive but still requested from master
Re: Strange replication problem - segment restored from archive but still requested from master
List pgsql-general
Got strange problem. Unable to repeat, but got logs.

Simple master-slave using streaming replication.
Master is running. Slave is down.
Segment 0000000400004C4D00000090 was successfully archived and send
from master to slave.

Now I've started slave, and:

ay 21 21:23:37 d8 postgres[50645]: [3-1] 2015-05-21 21:23:37.033 CEST
@ 50645   LOG:  database system was shut down in recovery at
2015-05-21 21:22:03 CEST
May 21 21:23:37 d8 postgres[50645]: [4-1] 2015-05-21 21:23:37.034 CEST
@ 50645   LOG:  entering standby mode
May 21 21:23:37 d8 postgres[50645]: [5-1] 2015-05-21 21:23:37.058 CEST
@ 50645   LOG:  restored log file "0000000400004C4D00000088" from
archive
May 21 21:23:37 d8 postgres[50645]: [6-1] 2015-05-21 21:23:37.120 CEST
@ 50645   LOG:  redo starts at 4C4D/88493B50
May 21 21:23:37 d8 postgres[50645]: [7-1] 2015-05-21 21:23:37.226 CEST
@ 50645   LOG:  restored log file "0000000400004C4D00000089" from
archive
May 21 21:23:37 d8 postgres[50645]: [8-1] 2015-05-21 21:23:37.426 CEST
@ 50645   LOG:  restored log file "0000000400004C4D0000008A" from
archive
May 21 21:23:37 d8 postgres[50645]: [9-1] 2015-05-21 21:23:37.750 CEST
@ 50645   LOG:  restored log file "0000000400004C4D0000008B" from
archive
May 21 21:23:38 d8 postgres[50645]: [10-1] 2015-05-21 21:23:38.376
CEST @ 50645   LOG:  restored log file "0000000400004C4D0000008C" from
archive
May 21 21:23:38 d8 postgres[50645]: [11-1] 2015-05-21 21:23:38.690
CEST @ 50645   LOG:  restored log file "0000000400004C4D0000008D" from
archive
May 21 21:23:38 d8 postgres[50645]: [12-1] 2015-05-21 21:23:38.855
CEST @ 50645   LOG:  restored log file "0000000400004C4D0000008E" from
archive
May 21 21:23:39 d8 postgres[50645]: [13-1] 2015-05-21 21:23:39.275
CEST @ 50645   LOG:  restored log file "0000000400004C4D0000008F" from
archive
May 21 21:23:39 d8 postgres[50645]: [14-1] 2015-05-21 21:23:39.654
CEST @ 50645   LOG:  restored log file "0000000400004C4D00000090" from
archive
May 21 21:23:40 d8 postgres[50645]: [15-1] 2015-05-21 21:23:40.222
CEST @ 50645   LOG:  consistent recovery state reached at
4C4D/90FFF9C8
May 21 21:23:40 d8 postgres[50644]: [3-1] 2015-05-21 21:23:40.222 CEST
@ 50644   LOG:  database system is ready to accept read only
connections
May 21 21:23:40 d8 postgres[50645]: [16-1] 2015-05-21 21:23:40.223
CEST @ 50645   LOG:  unexpected pageaddr 4C46/E000000 in log segment
0000000400004C4D00000091, offset 0
May 21 21:23:40 d8 postgres[50699]: [4-1] 2015-05-21 21:23:40.232 CEST
@ 50699   LOG:  started streaming WAL from primary at 4C4D/90000000 on
timeline 4
May 21 21:23:40 d8 postgres[50699]: [5-1] 2015-05-21 21:23:40.232 CEST
@ 50699   FATAL:  could not receive data from WAL stream: ERROR:
requested WAL segment 0000000400004C4D00000090 has already been
removed
May 21 21:23:40 d8 postgres[50645]: [17-1] 2015-05-21 21:23:40.255
CEST @ 50645   LOG:  restored log file "0000000400004C4D00000090" from
archive
May 21 21:23:40 d8 postgres[50703]: [4-1] 2015-05-21 21:23:40.268 CEST
@ 50703   LOG:  started streaming WAL from primary at 4C4D/90000000 on
timeline 4
May 21 21:23:40 d8 postgres[50703]: [5-1] 2015-05-21 21:23:40.268 CEST
@ 50703   FATAL:  could not receive data from WAL stream: ERROR:
requested WAL segment 0000000400004C4D00000090 has already been
removed
May 21 21:23:40 d8 postgres[50703]: [5-2]
...
(and so on)

So, as I understand:
- slave was started and entered restore,
- slave restored 0000000400004C4D00000090 from archive, reached
consistent recovery state
- now, it connected to master and noticed, that, it has
0000000400004C4D00000091 segment uncomplete
- and then, started yelling about missing segment (master deleted it
already after archiving) on master

Why?

I fixed it by stopping slave, doing some operations on master, copying
0000000400004C4D00000091 and 0000000400004C4D00000092 from master, and
staring the slave:


May 21 21:24:56 d8 postgres[50644]: [4-1] 2015-05-21 21:24:56.160 CEST
@ 50644   LOG:  received fast shutdown request
May 21 21:24:56 d8 postgres[50644]: [5-1] 2015-05-21 21:24:56.160 CEST
@ 50644   LOG:  aborting any active transactions
May 21 21:24:56 d8 postgres[50657]: [3-1] 2015-05-21 21:24:56.162 CEST
@ 50657   LOG:  shutting down
May 21 21:24:56 d8 postgres[50657]: [4-1] 2015-05-21 21:24:56.174 CEST
@ 50657   LOG:  database system is shut down
May 21 21:28:54 d8 postgres[53574]: [1-1] 2015-05-21 21:28:54.781 CEST
@ 53574   LOG:  loaded library "pg_stat_statements"
May 21 21:28:55 d8 postgres[53574]: [2-1] 2015-05-21 21:28:55.123 CEST
@ 53574   LOG:  ending log output to stderr
May 21 21:28:55 d8 postgres[53574]: [2-2] 2015-05-21 21:28:55.123 CEST
@ 53574   HINT:  Future log output will go to log destination
"syslog".
May 21 21:28:55 d8 postgres[53575]: [3-1] 2015-05-21 21:28:55.124 CEST
@ 53575   LOG:  database system was shut down in recovery at
2015-05-21 21:24:56 CEST
May 21 21:28:55 d8 postgres[53575]: [4-1] 2015-05-21 21:28:55.126 CEST
@ 53575   LOG:  entering standby mode
May 21 21:28:55 d8 postgres[53575]: [5-1] 2015-05-21 21:28:55.150 CEST
@ 53575   LOG:  restored log file "0000000400004C4D00000088" from
archive
May 21 21:28:55 d8 postgres[53575]: [6-1] 2015-05-21 21:28:55.209 CEST
@ 53575   LOG:  redo starts at 4C4D/88493B50
May 21 21:28:55 d8 postgres[53575]: [7-1] 2015-05-21 21:28:55.314 CEST
@ 53575   LOG:  restored log file "0000000400004C4D00000089" from
archive
May 21 21:28:55 d8 postgres[53575]: [8-1] 2015-05-21 21:28:55.512 CEST
@ 53575   LOG:  restored log file "0000000400004C4D0000008A" from
archive
May 21 21:28:55 d8 postgres[53575]: [9-1] 2015-05-21 21:28:55.831 CEST
@ 53575   LOG:  restored log file "0000000400004C4D0000008B" from
archive
May 21 21:28:56 d8 postgres[53575]: [10-1] 2015-05-21 21:28:56.508
CEST @ 53575   LOG:  restored log file "0000000400004C4D0000008C" from
archive
May 21 21:28:56 d8 postgres[53575]: [11-1] 2015-05-21 21:28:56.818
CEST @ 53575   LOG:  restored log file "0000000400004C4D0000008D" from
archive
May 21 21:28:56 d8 postgres[53575]: [12-1] 2015-05-21 21:28:56.982
CEST @ 53575   LOG:  restored log file "0000000400004C4D0000008E" from
archive
May 21 21:28:57 d8 postgres[53575]: [13-1] 2015-05-21 21:28:57.415
CEST @ 53575   LOG:  restored log file "0000000400004C4D0000008F" from
archive
May 21 21:28:57 d8 postgres[53575]: [14-1] 2015-05-21 21:28:57.790
CEST @ 53575   LOG:  restored log file "0000000400004C4D00000090" from
archive
May 21 21:28:58 d8 postgres[53575]: [15-1] 2015-05-21 21:28:58.328
CEST @ 53575   LOG:  consistent recovery state reached at
4C4D/90FFF9C8
May 21 21:28:58 d8 postgres[53574]: [3-1] 2015-05-21 21:28:58.328 CEST
@ 53574   LOG:  database system is ready to accept read only
connections
May 21 21:28:58 d8 postgres[53575]: [16-1] 2015-05-21 21:28:58.349
CEST @ 53575   LOG:  restored log file "0000000400004C4D00000091" from
archive
May 21 21:28:59 d8 postgres[53575]: [17-1] 2015-05-21 21:28:59.962
CEST @ 53575   LOG:  restored log file "0000000400004C4D00000092" from
archive
May 21 21:29:00 d8 postgres[53575]: [18-1] 2015-05-21 21:29:00.037
CEST @ 53575   LOG:  unexpected pageaddr 4C45/DC000000 in log segment
0000000400004C4D00000093, offset 0
May 21 21:29:00 d8 postgres[53640]: [4-1] 2015-05-21 21:29:00.047 CEST
@ 53640   LOG:  started streaming WAL from primary at 4C4D/92000000 on
timeline 4

What was wrong?

Postgresql 9.3.6 on slave and 9.3.4 on master. Unable to retest if
master was also 9.3.6.



--
Piotr Gasidło


pgsql-general by date:

Previous
From: Melvin Davidson
Date:
Subject: Re: About COPY command (and probably file fdw too)
Next
From: Paul Jungwirth
Date:
Subject: Re: date with month and year