Re: Strange replication problem - segment restored from archive but still requested from master - Mailing list pgsql-general
From | Piotr Gasidło |
---|---|
Subject | Re: Strange replication problem - segment restored from archive but still requested from master |
Date | |
Msg-id | CAF8akQuesk5GZ37QNE=Up_WF8SG=ybard0H_yL98VGPwGKRExA@mail.gmail.com Whole thread Raw |
In response to | Re: Strange replication problem - segment restored from archive but still requested from master (Venkata Balaji N <nag1010@gmail.com>) |
List | pgsql-general |
2015-05-22 2:16 GMT+02:00 Venkata Balaji N <nag1010@gmail.com>: > > It might be yelling about the WAL segment due to the delay in shipping it > from master to slave. > Do you have the restore_command set up in the recovery.conf file ? do you > have any automated job which is shipping WAL archives from master to slave. Both yes. I've used omnipitr. Heavy logged. So this is on master: postgresql.conf: archive_mode = on archive_command = '/usr/local/pgsql/scripts/omnipitr/bin/omnipitr-archive --config-file /usr/local/pgsql/core/scripts/omnipitr/config/archive "%p"' /usr/local/pgsql/core/scripts/omnipitr/config/archive: --verbose --data-dir=/usr/local/pgsql/core/data/ --dst-remote=pgsql@d0.XXX:/usr/local/pgsql/core/wal/ --dst-remote=pgsql@d8.XXX:/usr/local/pgsql/core/wal/ --state-dir=/usr/local/pgsql/core/scripts/omnipitr/state --log=/usr/local/pgsql/core/scripts/omnipitr/log/omnipitr.log /usr/local/pgsql/core/scripts/omnipitr/log/omnipitr.log, part: 2015-05-21 20:35:32.404395 +0200 : 98206 : omnipitr-archive : LOG : Segment /usr/local/pgsql/core/data/pg_xlog/0000000400004C4D0000008E successfully sent to all destinations. 2015-05-21 20:35:59.993614 +0200 : 98214 : omnipitr-archive : LOG : Called with parameters: --config-file /usr/local/pgsql/core/scripts/omnipitr/config/archive pg_xlog/0000000400004C4D0000008F 2015-05-21 20:36:00.495839 +0200 : 98214 : omnipitr-archive : LOG : Sending /usr/local/pgsql/core/data/pg_xlog/0000000400004C4D0000008F to pgsql@d0.XXX:/usr/local/pgsql/core/wal/0000000400004C4D0000008F ended in 0.500619s 2015-05-21 20:36:00.896906 +0200 : 98214 : omnipitr-archive : LOG : Sending /usr/local/pgsql/core/data/pg_xlog/0000000400004C4D0000008F to pgsql@d8.XXX:/usr/local/pgsql/core/wal/0000000400004C4D0000008F ended in 0.396684s 2015-05-21 20:36:00.897467 +0200 : 98214 : omnipitr-archive : LOG : Timer [Segment delivery] took: 0.903s 2015-05-21 20:36:00.901777 +0200 : 98214 : omnipitr-archive : LOG : Segment /usr/local/pgsql/core/data/pg_xlog/0000000400004C4D0000008F successfully sent to all destinations. 2015-05-21 20:36:33.916879 +0200 : 98222 : omnipitr-archive : LOG : Called with parameters: --config-file /usr/local/pgsql/core/scripts/omnipitr/config/archive pg_xlog/0000000400004C4D00000090 2015-05-21 20:36:34.379965 +0200 : 98222 : omnipitr-archive : LOG : Sending /usr/local/pgsql/core/data/pg_xlog/0000000400004C4D00000090 to pgsql@d0.XXX:/usr/local/pgsql/core/wal/0000000400004C4D00000090 ended in 0.458563s 2015-05-21 20:36:34.931918 +0200 : 98222 : omnipitr-archive : LOG : Sending /usr/local/pgsql/core/data/pg_xlog/0000000400004C4D00000090 to pgsql@d8.XXX:/usr/local/pgsql/core/wal/0000000400004C4D00000090 ended in 0.550890s 2015-05-21 20:36:34.935241 +0200 : 98222 : omnipitr-archive : LOG : Timer [Segment delivery] took: 1.015s 2015-05-21 20:36:34.935776 +0200 : 98222 : omnipitr-archive : LOG : Segment /usr/local/pgsql/core/data/pg_xlog/0000000400004C4D00000090 successfully sent to all destinations. 2015-05-21 21:12:38.242687 +0200 : 561 : omnipitr-archive : LOG : Called with parameters: --config-file /usr/local/pgsql/core/scripts/omnipitr/config/archive pg_xlog/0000000400004C4D00000090 2015-05-21 21:12:38.425904 +0200 : 561 : omnipitr-archive : LOG : Sending /usr/local/pgsql/core/data/pg_xlog/0000000400004C4D00000090 to pgsql@d0.XXX:/usr/local/pgsql/core/wal/0000000400004C4D00000090 ended in 0.181528s 2015-05-21 21:12:38.969018 +0200 : 561 : omnipitr-archive : LOG : Sending /usr/local/pgsql/core/data/pg_xlog/0000000400004C4D00000090 to pgsql@d8.XXX:/usr/local/pgsql/core/wal/0000000400004C4D00000090 ended in 0.539258s 2015-05-21 21:12:38.971513 +0200 : 561 : omnipitr-archive : LOG : Timer [Segment delivery] took: 0.728s 2015-05-21 21:12:38.973229 +0200 : 561 : omnipitr-archive : LOG : Segment /usr/local/pgsql/core/data/pg_xlog/0000000400004C4D00000090 successfully sent to all destinations. 2015-05-21 21:28:51.389622 +0200 : 1634 : omnipitr-archive : LOG : Called with parameters: --config-file /usr/local/pgsql/core/scripts/omnipitr/config/archive pg_xlog/0000000400004C4D00000091 2015-05-21 21:28:52.457545 +0200 : 1634 : omnipitr-archive : LOG : Sending /usr/local/pgsql/core/data/pg_xlog/0000000400004C4D00000091 to pgsql@d0.XXX:/usr/local/pgsql/core/wal/0000000400004C4D00000091 ended in 1.065172s 2015-05-21 21:28:52.908541 +0200 : 1634 : omnipitr-archive : LOG : Sending /usr/local/pgsql/core/data/pg_xlog/0000000400004C4D00000091 to pgsql@d8.XXX:/usr/local/pgsql/core/wal/0000000400004C4D00000091 ended in 0.449934s 2015-05-21 21:28:52.910337 +0200 : 1634 : omnipitr-archive : LOG : Timer [Segment delivery] took: 1.519s 2015-05-21 21:28:52.910902 +0200 : 1634 : omnipitr-archive : LOG : Segment /usr/local/pgsql/core/data/pg_xlog/0000000400004C4D00000091 successfully sent to all destinations. 2015-05-21 21:28:53.028408 +0200 : 1639 : omnipitr-archive : LOG : Called with parameters: --config-file /usr/local/pgsql/core/scripts/omnipitr/config/archive pg_xlog/0000000400004C4D00000092 2015-05-21 21:28:53.783568 +0200 : 1639 : omnipitr-archive : LOG : Sending /usr/local/pgsql/core/data/pg_xlog/0000000400004C4D00000092 to pgsql@d0.XXX:/usr/local/pgsql/core/wal/0000000400004C4D00000092 ended in 0.753511s 2015-05-21 21:28:54.260492 +0200 : 1639 : omnipitr-archive : LOG : Sending /usr/local/pgsql/core/data/pg_xlog/0000000400004C4D00000092 to pgsql@d8.XXX:/usr/local/pgsql/core/wal/0000000400004C4D00000092 ended in 0.473037s 2015-05-21 21:28:54.261054 +0200 : 1639 : omnipitr-archive : LOG : Timer [Segment delivery] took: 1.232s 2015-05-21 21:28:54.262764 +0200 : 1639 : omnipitr-archive : LOG : Segment /usr/local/pgsql/core/data/pg_xlog/0000000400004C4D00000092 successfully sent to all destinations. 2015-05-21 21:30:06.034584 +0200 : 1870 : omnipitr-archive : LOG : Called with parameters: --config-file /usr/local/pgsql/core/scripts/omnipitr/config/archive pg_xlog/0000000400004C4D00000093 2015-05-21 21:30:06.920507 +0200 : 1870 : omnipitr-archive : LOG : Sending /usr/local/pgsql/core/data/pg_xlog/0000000400004C4D00000093 to pgsql@d0.XXX:/usr/local/pgsql/core/wal/0000000400004C4D00000093 ended in 0.883097s 2015-05-21 21:30:07.391485 +0200 : 1870 : omnipitr-archive : LOG : Sending /usr/local/pgsql/core/data/pg_xlog/0000000400004C4D00000093 to pgsql@d8.XXX:/usr/local/pgsql/core/wal/0000000400004C4D00000093 ended in 0.469941s 2015-05-21 21:30:07.394826 +0200 : 1870 : omnipitr-archive : LOG : Timer [Segment delivery] took: 1.358s But now see some unexpected problem. Shipping for 0000000400004C4D00000090 was runned twice. But probably it is caused by me - I've tried to put file again in master pg_xlog from d0.XXX and run CHECKPOINT on master. But this didn't worked out. I have WAL files: d0:/usr/local/pgsql/core/wal # ls -laT 0000000400004C4D00000090 0000000400004C4D00000091 0000000400004C4D00000092 0000000400004C4D00000093 -rw------- 1 pgsql pgsql 16777216 May 21 20:36:33 2015 0000000400004C4D00000090 -rw------- 1 pgsql pgsql 16777216 May 21 21:28:51 2015 0000000400004C4D00000091 -rw------- 1 pgsql pgsql 16777216 May 21 21:28:52 2015 0000000400004C4D00000092 -rw------- 1 pgsql pgsql 16777216 May 21 21:30:05 2015 0000000400004C4D00000093 > Slave - if it does not find the WAL segment in master pg_xlog location, it > looks for the same in the location mentioned in the restore_command. It > yells if it is unable to find it. > Cannot give any concrete comments/suggestions, until recovery.conf entries > are known. On slave, recovery.conf: In the time of writing it I switched from OmniPITR: restore_command = '/usr/local/pgsql/scripts/omnipitr/bin/omnipitr-restore --config-file /usr/local/pgsql/core/scripts/omnipitr/config/restore -sr "%f" "%p"' To: restore_command = 'cp /usr/local/pgsql/core/wal/%f %p' And manualy rsynced files from d0.XXX to /usr/local/pgsql/core/wal/ directory. >> What was wrong? > > > This error message can be ignored, did you try shipping > 0000000400004C4D00000093 to slave from master's archive location and see if > slave picked it up and applied ? As I wrote, I've only shipped up to 0000000400004C4D00000092. File 0000000400004C4D00000093 was archived after slave managed to start and successfuly start streaming WAL. But I've done this. When I've noticed that slave is OK - I stopped it, changed back to OmniPITR restore_command, and started again. So full log: -- here where slave sucessfuly start for the first time: May 21 21:28:55 d8 postgres[53574]: [2-1] 2015-05-21 21:28:55.123 CEST @ 53574 LOG: ending log output to stderr May 21 21:28:55 d8 postgres[53574]: [2-2] 2015-05-21 21:28:55.123 CEST @ 53574 HINT: Future log output will go to log destination "syslog". May 21 21:28:55 d8 postgres[53575]: [3-1] 2015-05-21 21:28:55.124 CEST @ 53575 LOG: database system was shut down in recovery at 2015-05-21 21:24:56 CEST May 21 21:28:55 d8 postgres[53575]: [4-1] 2015-05-21 21:28:55.126 CEST @ 53575 LOG: entering standby mode May 21 21:28:55 d8 postgres[53575]: [5-1] 2015-05-21 21:28:55.150 CEST @ 53575 LOG: restored log file "0000000400004C4D00000088" from archive May 21 21:28:55 d8 postgres[53575]: [6-1] 2015-05-21 21:28:55.209 CEST @ 53575 LOG: redo starts at 4C4D/88493B50 May 21 21:28:55 d8 postgres[53575]: [7-1] 2015-05-21 21:28:55.314 CEST @ 53575 LOG: restored log file "0000000400004C4D00000089" from archive May 21 21:28:55 d8 postgres[53575]: [8-1] 2015-05-21 21:28:55.512 CEST @ 53575 LOG: restored log file "0000000400004C4D0000008A" from archive May 21 21:28:55 d8 postgres[53585]: [3-1] 2015-05-21 21:28:55.781 CEST pgsql@postgres 53585 [local] [unknown] FATAL: the database system is starting up May 21 21:28:55 d8 postgres[53575]: [9-1] 2015-05-21 21:28:55.831 CEST @ 53575 LOG: restored log file "0000000400004C4D0000008B" from archive May 21 21:28:56 d8 postgres[53575]: [10-1] 2015-05-21 21:28:56.508 CEST @ 53575 LOG: restored log file "0000000400004C4D0000008C" from archive May 21 21:28:56 d8 postgres[53594]: [3-1] 2015-05-21 21:28:56.782 CEST pgsql@postgres 53594 [local] [unknown] FATAL: the database system is starting up May 21 21:28:56 d8 postgres[53575]: [11-1] 2015-05-21 21:28:56.818 CEST @ 53575 LOG: restored log file "0000000400004C4D0000008D" from archive May 21 21:28:56 d8 postgres[53575]: [12-1] 2015-05-21 21:28:56.982 CEST @ 53575 LOG: restored log file "0000000400004C4D0000008E" from archive May 21 21:28:57 d8 postgres[53575]: [13-1] 2015-05-21 21:28:57.415 CEST @ 53575 LOG: restored log file "0000000400004C4D0000008F" from archive May 21 21:28:57 d8 postgres[53607]: [3-1] 2015-05-21 21:28:57.784 CEST pgsql@postgres 53607 [local] [unknown] FATAL: the database system is starting up May 21 21:28:57 d8 postgres[53575]: [14-1] 2015-05-21 21:28:57.790 CEST @ 53575 LOG: restored log file "0000000400004C4D00000090" from archive May 21 21:28:58 d8 postgres[53575]: [15-1] 2015-05-21 21:28:58.328 CEST @ 53575 LOG: consistent recovery state reached at 4C4D/90FFF9C8 May 21 21:28:58 d8 postgres[53574]: [3-1] 2015-05-21 21:28:58.328 CEST @ 53574 LOG: database system is ready to accept read only connections May 21 21:28:58 d8 postgres[53575]: [16-1] 2015-05-21 21:28:58.349 CEST @ 53575 LOG: restored log file "0000000400004C4D00000091" from archive May 21 21:28:58 d8 postgres[53618]: [4-1] 2015-05-21 21:28:58.786 CEST pgsql@postgres 53618 [local] [unknown] FATAL: role "pgsql" does not exist May 21 21:28:59 d8 postgres[53575]: [17-1] 2015-05-21 21:28:59.962 CEST @ 53575 LOG: restored log file "0000000400004C4D00000092" from archive May 21 21:29:00 d8 postgres[53575]: [18-1] 2015-05-21 21:29:00.037 CEST @ 53575 LOG: unexpected pageaddr 4C45/DC000000 in log segment 0000000400004C4D00000093, offset 0 May 21 21:29:00 d8 postgres[53640]: [4-1] 2015-05-21 21:29:00.047 CEST @ 53640 LOG: started streaming WAL from primary at 4C4D/92000000 on timeline 4 -- I stopped him. May 21 21:30:13 d8 postgres[53574]: [4-1] 2015-05-21 21:30:13.417 CEST @ 53574 LOG: received fast shutdown request May 21 21:30:13 d8 postgres[53574]: [5-1] 2015-05-21 21:30:13.418 CEST @ 53574 LOG: aborting any active transactions May 21 21:30:13 d8 postgres[53640]: [5-1] 2015-05-21 21:30:13.441 CEST @ 53640 FATAL: terminating walreceiver process due to administrator command May 21 21:30:13 d8 postgres[53579]: [3-1] 2015-05-21 21:30:13.443 CEST @ 53579 LOG: shutting down May 21 21:30:13 d8 postgres[53579]: [4-1] 2015-05-21 21:30:13.443 CEST @ 53579 LOG: restartpoint starting: shutdown immediate May 21 21:30:15 d8 postgres[53579]: [5-1] 2015-05-21 21:30:15.094 CEST @ 53579 LOG: restartpoint complete: wrote 25212 buffers (0.8%); 0 transaction log file(s) added, 0 removed, 4 recycled; write=0.946 s, sync=0.629 s, total=1.651 s; sync files=66, longest=0.245 s, aver May 21 21:30:15 d8 postgres[53579]: [6-1] 2015-05-21 21:30:15.094 CEST @ 53579 LOG: recovery restart point at 4C4D/91A59348 May 21 21:30:15 d8 postgres[53579]: [6-2] 2015-05-21 21:30:15.094 CEST @ 53579 DETAIL: last completed transaction was at log time 2015-05-21 21:30:13.377052+02 May 21 21:30:15 d8 postgres[53579]: [7-1] 2015-05-21 21:30:15.390 CEST @ 53579 LOG: database system is shut down -- I've changed restore_command, master managed to ship 0000000400004C4D00000093 in background. -- Starting up. May 21 21:30:25 d8 postgres[54546]: [1-1] 2015-05-21 21:30:25.983 CEST @ 54546 LOG: loaded library "pg_stat_statements" May 21 21:30:26 d8 postgres[54546]: [2-1] 2015-05-21 21:30:26.329 CEST @ 54546 LOG: ending log output to stderr May 21 21:30:26 d8 postgres[54546]: [2-2] 2015-05-21 21:30:26.329 CEST @ 54546 HINT: Future log output will go to log destination "syslog". May 21 21:30:26 d8 postgres[54547]: [3-1] 2015-05-21 21:30:26.331 CEST @ 54547 LOG: database system was shut down in recovery at 2015-05-21 21:30:15 CEST May 21 21:30:26 d8 postgres[54547]: [4-1] 2015-05-21 21:30:26.626 CEST @ 54547 LOG: entering standby mode May 21 21:30:27 d8 postgres[54563]: [3-1] 2015-05-21 21:30:27.026 CEST pgsql@postgres 54563 [local] [unknown] FATAL: the database system is starting up May 21 21:30:27 d8 postgres[54547]: [5-1] 2015-05-21 21:30:27.068 CEST @ 54547 LOG: restored log file "0000000400004C4D00000091" from archive May 21 21:30:27 d8 postgres[54547]: [6-1] 2015-05-21 21:30:27.129 CEST @ 54547 LOG: redo starts at 4C4D/91A59348 May 21 21:30:27 d8 postgres[54547]: [7-1] 2015-05-21 21:30:27.301 CEST @ 54547 LOG: restored log file "0000000400004C4D00000092" from archive May 21 21:30:27 d8 postgres[54547]: [8-1] 2015-05-21 21:30:27.532 CEST @ 54547 LOG: restored log file "0000000400004C4D00000093" from archive May 21 21:30:27 d8 postgres[54547]: [9-1] 2015-05-21 21:30:27.957 CEST @ 54547 LOG: consistent recovery state reached at 4C4D/946CE620 May 21 21:30:27 d8 postgres[54547]: [10-1] 2015-05-21 21:30:27.957 CEST @ 54547 LOG: unexpected pageaddr 4C45/CD6D4000 in log segment 0000000400004C4D00000094, offset 7159808 May 21 21:30:27 d8 postgres[54546]: [3-1] 2015-05-21 21:30:27.957 CEST @ 54546 LOG: database system is ready to accept read only connections May 21 21:30:27 d8 postgres[54580]: [4-1] 2015-05-21 21:30:27.969 CEST @ 54580 LOG: started streaming WAL from primary at 4C4D/94000000 on timeline 4 May 21 21:30:28 d8 postgres[54581]: [4-1] 2015-05-21 21:30:28.039 CEST pgsql@postgres 54581 [local] [unknown] FATAL: role "pgsql" does not exist May 21 21:36:58 d8 postgres[54564]: [3-1] 2015-05-21 21:36:58.080 CEST @ 54564 LOG: restartpoint starting: time -- And everything goes OK. -- Piotr Gasidło
pgsql-general by date: