Thread: Re: [GENERAL] Strange replication problem - segment restored from archive but still requested from master
From:
Fujii Masao
Date:
On Fri, May 22, 2015 at 5:52 AM, Piotr Gasidło <quaker@barbara.eu.org> wrote:
> Got a strange problem. Unable to reproduce, but I have logs.
>
> Simple master-slave setup using streaming replication.
> Master is running. Slave is down.
> Segment 0000000400004C4D00000090 was successfully archived and sent
> from master to slave.
>
> Now I've started the slave, and:
>
> May 21 21:23:37 d8 postgres[50645]: [3-1] 2015-05-21 21:23:37.033 CEST @ 50645 LOG: database system was shut down in recovery at 2015-05-21 21:22:03 CEST
> May 21 21:23:37 d8 postgres[50645]: [4-1] 2015-05-21 21:23:37.034 CEST @ 50645 LOG: entering standby mode
> May 21 21:23:37 d8 postgres[50645]: [5-1] 2015-05-21 21:23:37.058 CEST @ 50645 LOG: restored log file "0000000400004C4D00000088" from archive
> May 21 21:23:37 d8 postgres[50645]: [6-1] 2015-05-21 21:23:37.120 CEST @ 50645 LOG: redo starts at 4C4D/88493B50
> May 21 21:23:37 d8 postgres[50645]: [7-1] 2015-05-21 21:23:37.226 CEST @ 50645 LOG: restored log file "0000000400004C4D00000089" from archive
> May 21 21:23:37 d8 postgres[50645]: [8-1] 2015-05-21 21:23:37.426 CEST @ 50645 LOG: restored log file "0000000400004C4D0000008A" from archive
> May 21 21:23:37 d8 postgres[50645]: [9-1] 2015-05-21 21:23:37.750 CEST @ 50645 LOG: restored log file "0000000400004C4D0000008B" from archive
> May 21 21:23:38 d8 postgres[50645]: [10-1] 2015-05-21 21:23:38.376 CEST @ 50645 LOG: restored log file "0000000400004C4D0000008C" from archive
> May 21 21:23:38 d8 postgres[50645]: [11-1] 2015-05-21 21:23:38.690 CEST @ 50645 LOG: restored log file "0000000400004C4D0000008D" from archive
> May 21 21:23:38 d8 postgres[50645]: [12-1] 2015-05-21 21:23:38.855 CEST @ 50645 LOG: restored log file "0000000400004C4D0000008E" from archive
> May 21 21:23:39 d8 postgres[50645]: [13-1] 2015-05-21 21:23:39.275 CEST @ 50645 LOG: restored log file "0000000400004C4D0000008F" from archive
> May 21 21:23:39 d8 postgres[50645]: [14-1] 2015-05-21 21:23:39.654 CEST @ 50645 LOG: restored log file "0000000400004C4D00000090" from archive
> May 21 21:23:40 d8 postgres[50645]: [15-1] 2015-05-21 21:23:40.222 CEST @ 50645 LOG: consistent recovery state reached at 4C4D/90FFF9C8
> May 21 21:23:40 d8 postgres[50644]: [3-1] 2015-05-21 21:23:40.222 CEST @ 50644 LOG: database system is ready to accept read only connections
> May 21 21:23:40 d8 postgres[50645]: [16-1] 2015-05-21 21:23:40.223 CEST @ 50645 LOG: unexpected pageaddr 4C46/E000000 in log segment 0000000400004C4D00000091, offset 0
> May 21 21:23:40 d8 postgres[50699]: [4-1] 2015-05-21 21:23:40.232 CEST @ 50699 LOG: started streaming WAL from primary at 4C4D/90000000 on timeline 4
> May 21 21:23:40 d8 postgres[50699]: [5-1] 2015-05-21 21:23:40.232 CEST @ 50699 FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 0000000400004C4D00000090 has already been removed
> May 21 21:23:40 d8 postgres[50645]: [17-1] 2015-05-21 21:23:40.255 CEST @ 50645 LOG: restored log file "0000000400004C4D00000090" from archive
> May 21 21:23:40 d8 postgres[50703]: [4-1] 2015-05-21 21:23:40.268 CEST @ 50703 LOG: started streaming WAL from primary at 4C4D/90000000 on timeline 4
> May 21 21:23:40 d8 postgres[50703]: [5-1] 2015-05-21 21:23:40.268 CEST @ 50703 FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 0000000400004C4D00000090 has already been removed
> May 21 21:23:40 d8 postgres[50703]: [5-2] ...
> (and so on)
>
> So, as I understand it:
> - the slave was started and entered recovery,
> - the slave restored 0000000400004C4D00000090 from the archive and reached a consistent recovery state,
> - it then connected to the master and noticed that its copy of segment 0000000400004C4D00000091 was incomplete,
> - and then it started complaining about the segment missing on the master (the master had already deleted it after archiving).
>
> Why?

Thanks for the report! This seems to be a bug.
This problem happens when a WAL record is split across two WAL files and the standby has no valid copy of the latter file. In your case, the former file is 0000000400004C4D00000090 and the latter is 0000000400004C4D00000091. The first half of the WAL record can be read from the former file, but the remaining half cannot, because no valid latter file exists on the standby. The standby then tries to retrieve the latter WAL file via replication. The problem is that the standby starts the replication from the starting point of the WAL record, i.e., a location inside the former file, so the already-read WAL file is requested via replication again. To address this, maybe we can just use the location where the remaining half of the WAL record starts as the replication starting location. We would need to look carefully at the side effects of that change, though.

Regards,

--
Fujii Masao
Re: [GENERAL] Strange replication problem - segment restored from archive but still requested from master
From:
Piotr Gasidło
Date:
2015-05-22 6:55 GMT+02:00 Fujii Masao <masao.fujii@gmail.com>:
> Thanks for the report! This seems to be a bug.
>
> This problem happens when WAL record is stored in separate two WAL files and
> there is no valid latter WAL file in the standby. In your case, the former file
> is 0000000400004C4D00000090 and the latter is 0000000400004C4D00000091.

I've got both files; here's what I see when I've pg_xlogdump-ed them:

$ cat dump1

0000000400004C4D00000090.dump
...
rmgr: Heap2 len (rec/tot): 24/ 7772, tx: 0, lsn: 4C4D/90FF6DB8, prev 4C4D/90FF51E8, bkp: 1000, desc: clean: rel 1663/131438/10079072; blk 13843 remxid 472083193
rmgr: Heap2 len (rec/tot): 504/ 536, tx: 0, lsn: 4C4D/90FF8C30, prev 4C4D/90FF6DB8, bkp: 0000, desc: freeze_page: rel 1663/131438/10079072; blk 13843; cutoff xid 422325322 ntuples 40
rmgr: Heap2 len (rec/tot): 24/ 5632, tx: 0, lsn: 4C4D/90FF8E48, prev 4C4D/90FF8C30, bkp: 1000, desc: freeze_page: rel 1663/131438/10079072; blk 13844; cutoff xid 422325322 ntuples 30
rmgr: Heap2 len (rec/tot): 24/ 7528, tx: 0, lsn: 4C4D/90FFA460, prev 4C4D/90FF8E48, bkp: 1000, desc: clean: rel 1663/131438/10079072; blk 13845 remxid 472083193
rmgr: Heap2 len (rec/tot): 564/ 596, tx: 0, lsn: 4C4D/90FFC1E0, prev 4C4D/90FFA460, bkp: 0000, desc: freeze_page: rel 1663/131438/10079072; blk 13845; cutoff xid 422325322 ntuples 45
rmgr: Heap2 len (rec/tot): 24/ 5316, tx: 0, lsn: 4C4D/90FFC438, prev 4C4D/90FFC1E0, bkp: 1000, desc: clean: rel 1663/131438/10079072; blk 13846 remxid 472083193
rmgr: Heap2 len (rec/tot): 420/ 452, tx: 0, lsn: 4C4D/90FFD900, prev 4C4D/90FFC438, bkp: 0000, desc: freeze_page: rel 1663/131438/10079072; blk 13846; cutoff xid 422325322 ntuples 33
rmgr: Heap2 len (rec/tot): 24/ 7856, tx: 0, lsn: 4C4D/90FFDAC8, prev 4C4D/90FFD900, bkp: 1000, desc: freeze_page: rel 1663/131438/10079072; blk 13847; cutoff xid 422325322 ntuples 53
rmgr: Heap2 len (rec/tot): 20/ 52, tx: 0, lsn: 4C4D/90FFF990, prev 4C4D/90FFDAC8, bkp: 0000, desc: visible: rel 1663/131438/10079072; blk 13847

0000000400004C4D00000091.dump
rmgr: Heap2 len (rec/tot): 600/ 632, tx: 0, lsn: 4C4D/910014A0, prev 4C4D/90FFF9C8, bkp: 0000, desc: freeze_page: rel 1663/131438/10079072; blk 13848; cutoff xid 422325322 ntuples 48
rmgr: Heap2 len (rec/tot): 24/ 6640, tx: 0, lsn: 4C4D/91001718, prev 4C4D/910014A0, bkp: 1000, desc: freeze_page: rel 1663/131438/10079072; blk 13849; cutoff xid 422325322 ntuples 42
rmgr: Heap2 len (rec/tot): 24/ 7848, tx: 0, lsn: 4C4D/91003120, prev 4C4D/91001718, bkp: 1000, desc: freeze_page: rel 1663/131438/10079072; blk 13850; cutoff xid 422325322 ntuples 36
rmgr: Heap2 len (rec/tot): 20/ 52, tx: 0, lsn: 4C4D/91004FE0, prev 4C4D/91003120, bkp: 0000, desc: visible: rel 1663/131438/10079072; blk 13850
rmgr: Heap2 len (rec/tot): 24/ 6208, tx: 0, lsn: 4C4D/91005018, prev 4C4D/91004FE0, bkp: 1000, desc: clean: rel 1663/131438/10079072; blk 13851 remxid 472083193
rmgr: Heap2 len (rec/tot): 564/ 596, tx: 0, lsn: 4C4D/91006870, prev 4C4D/91005018, bkp: 0000, desc: freeze_page: rel 1663/131438/10079072; blk 13851; cutoff xid 422325322 ntuples 45
rmgr: Heap2 len (rec/tot): 24/ 7340, tx: 0, lsn: 4C4D/91006AC8, prev 4C4D/91006870, bkp: 1000, desc: freeze_page: rel 1663/131438/10079072; blk 13852; cutoff xid 422325322 ntuples 39
rmgr: Heap2 len (rec/tot): 24/ 7744, tx: 0, lsn: 4C4D/91008790, prev 4C4D/91006AC8, bkp: 1000, desc: clean: rel 1663/131438/10079072; blk 13853 remxid 472083193
rmgr: Heap2 len (rec/tot): 492/ 524, tx: 0, lsn: 4C4D/9100A5E8, prev 4C4D/91008790, bkp: 0000, desc: freeze_page: rel 1663/131438/10079072; blk 13853; cutoff xid 422325322 ntuples 39
rmgr: Heap2 len (rec/tot): 24/ 7152, tx: 0, lsn: 4C4D/9100A7F8, prev 4C4D/9100A5E8, bkp: 1000, desc: freeze_page: rel 1663/131438/10079072; blk 13854; cutoff xid 422325322 ntuples 32
rmgr: Heap2 len (rec/tot): 24/ 7380, tx: 0, lsn: 4C4D/9100C400, prev 4C4D/9100A7F8, bkp: 1000, desc: freeze_page: rel 1663/131438/10079072; blk 13855; cutoff xid 422325322 ntuples 42
rmgr: Heap2 len (rec/tot): 24/ 6784, tx: 0, lsn: 4C4D/9100E0F0, prev 4C4D/9100C400, bkp: 1000, desc: clean: rel 1663/131438/10079072; blk 13856 remxid 472083193
rmgr: Heap2 len (rec/tot): 408/ 440, tx: 0, lsn: 4C4D/9100FB70, prev 4C4D/9100E0F0, bkp: 0000, desc: freeze_page: rel 1663/131438/10079072; blk 13856; cutoff xid 422325322 ntuples 32
rmgr: Heap2 len (rec/tot): 24/ 7312, tx: 0, lsn: 4C4D/9100FD28, prev 4C4D/9100FB70, bkp: 1000, desc: clean: rel 1663/131438/10079072; blk 13857 remxid 472083193
rmgr: Heap2 len (rec/tot): 432/ 464, tx: 0, lsn: 4C4D/910119D0, prev 4C4D/9100FD28, bkp: 0000, desc: freeze_page: rel 1663/131438/10079072; blk 13857; cutoff xid 422325322 ntuples 34
rmgr: Heap2 len (rec/tot): 24/ 7920, tx: 0, lsn: 4C4D/91011BA0, prev 4C4D/910119D0, bkp: 1000, desc: freeze_page: rel 1663/131438/10079072; blk 13858; cutoff xid 422325322 ntuples 60
...

If this is not enough, I can attach both files.

--
Piotr Gasidło