PITRTools: Bring up warm standby -> unexpected pageaddr - Mailing list pgsql-general
From | Hannes Dorbath |
---|---|
Subject | PITRTools: Bring up warm standby -> unexpected pageaddr |
Date | |
Msg-id | gqi7mi$1cs2$1@news.hub.org Whole thread Raw |
Responses |
Re: PITRTools: Bring up warm standby -> unexpected
pageaddr
|
List | pgsql-general |
Hi, I'm trying to setup a warm standby using PITRTools. Environment is CentOS 5.2 x86_64 and PG 8.3.6 from yum.pgsqlrpms.org. As far as I can tell everything seems to work as expected. After a successful base backup and issuing cmd_standby -S the standby goes into constant recovery mode and happily consumes WAL segments shipped form the master. The corresponding log looks like this: 2009-03-26 17:55:04 CET::@:[2901]: LOG: database system was interrupted; last known up at 2009-03-26 17:09:49 CET 2009-03-26 17:55:04 CET::@:[2901]: LOG: starting archive recovery 2009-03-26 17:55:04 CET::@:[2901]: LOG: restore_command = '/usr/bin/pg_standby /data/pgsql_archive -s5 -w0 -c %f %p %r' cp: cannot stat `/data/pgsql_archive/00000001.history': No such file or directory cp: cannot stat `/data/pgsql_archive/00000001.history': No such file or directory cp: cannot stat `/data/pgsql_archive/00000001.history': No such file or directory 2009-03-26 17:55:20 CET::@:[2901]: LOG: restored log file "000000010000004E000000DF" from archive 2009-03-26 17:55:20 CET::@:[2901]: LOG: automatic recovery in progress 2009-03-26 17:55:20 CET::@:[2901]: LOG: redo starts at 4E/DF0001B8 2009-03-26 17:55:20 CET::@:[2901]: LOG: restored log file "000000010000004E000000E0" from archive 2009-03-26 17:55:20 CET::@:[2901]: LOG: restored log file "000000010000004E000000E1" from archive 2009-03-26 17:55:21 CET::@:[2901]: LOG: restored log file "000000010000004E000000E2" from archive 2009-03-26 17:55:21 CET::@:[2901]: LOG: restored log file "000000010000004E000000E3" from archive 2009-03-26 17:55:21 CET::@:[2901]: LOG: restored log file "000000010000004E000000E4" from archive 2009-03-26 17:55:22 CET::@:[2901]: LOG: restored log file "000000010000004E000000E5" from archive 2009-03-26 17:55:22 CET::@:[2901]: LOG: restored log file "000000010000004E000000E6" from archive 2009-03-26 17:55:22 CET::@:[2901]: LOG: restored log file "000000010000004E000000E7" from archive 2009-03-26 17:55:22 CET::@:[2901]: LOG: restored log file "000000010000004E000000E8" from archive 2009-03-26 17:55:22 CET::@:[2901]: LOG: restored log file "000000010000004E000000E9" from archive 2009-03-26 17:55:22 CET::@:[2901]: LOG: restored log file "000000010000004E000000EA" from archive 2009-03-26 17:56:37 CET::@:[2901]: LOG: restored log file "000000010000004E000000EB" from archive As soon as I issue cmd_standby -F999 I get the following log entry: 2009-03-26 17:59:37 CET::@:[2874]: LOG: received fast shutdown request 2009-03-26 17:59:37 CET::@:[2901]: FATAL: could not restore file "000000010000004E000000EC" from archive: return code 15 2009-03-26 17:59:37 CET::@:[2874]: LOG: startup process (PID 2901) exited with exit code 1 2009-03-26 17:59:37 CET::@:[2874]: LOG: aborting startup due to startup process failure What I'm worried about now is that it reapplies all WAL segments it has already consumed in warm standby mode again and always ends with "unexpected pageaddr". 2009-03-26 17:59:39 CET::@:[5721]: LOG: database system was interrupted while in recovery at log time 2009-03-26 17:09:49 CET 2009-03-26 17:59:39 CET::@:[5721]: HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target. 2009-03-26 17:59:39 CET::@:[5721]: LOG: starting archive recovery 2009-03-26 17:59:39 CET::@:[5721]: LOG: restore_command = 'cp /data/pgsql_archive/%f "%p"' cp: cannot stat `/data/pgsql_archive/00000001.history': No such file or directory 2009-03-26 17:59:39 CET::@:[5721]: LOG: restored log file "000000010000004E000000DF" from archive 2009-03-26 17:59:39 CET::@:[5721]: LOG: automatic recovery in progress 2009-03-26 17:59:39 CET::@:[5721]: LOG: redo starts at 4E/DF0001B8 2009-03-26 17:59:39 CET::@:[5721]: LOG: restored log file "000000010000004E000000E0" from archive 2009-03-26 17:59:39 CET::@:[5721]: LOG: restored log file "000000010000004E000000E1" from archive 2009-03-26 17:59:39 CET::@:[5721]: LOG: restored log file "000000010000004E000000E2" from archive 2009-03-26 17:59:39 CET::@:[5721]: LOG: restored log file "000000010000004E000000E3" from archive 2009-03-26 17:59:40 CET::@:[5721]: LOG: restored log file "000000010000004E000000E4" from archive 2009-03-26 17:59:40 CET::@:[5721]: LOG: restored log file "000000010000004E000000E5" from archive 2009-03-26 17:59:40 CET::@:[5721]: LOG: restored log file "000000010000004E000000E6" from archive 2009-03-26 17:59:40 CET::@:[5721]: LOG: restored log file "000000010000004E000000E7" from archive 2009-03-26 17:59:40 CET::@:[5721]: LOG: restored log file "000000010000004E000000E8" from archive 2009-03-26 17:59:40 CET::@:[5721]: LOG: restored log file "000000010000004E000000E9" from archive 2009-03-26 17:59:40 CET::@:[5721]: LOG: restored log file "000000010000004E000000EA" from archive 2009-03-26 17:59:40 CET::@:[5721]: LOG: restored log file "000000010000004E000000EB" from archive cp: cannot stat `/data/pgsql_archive/000000010000004E000000EC': No such file or directory 2009-03-26 17:59:40 CET::@:[5721]: LOG: unexpected pageaddr 4C/61000000 in log file 78, segment 236, offset 0 2009-03-26 17:59:40 CET::@:[5721]: LOG: redo done at 4E/EB000080 2009-03-26 17:59:41 CET::@:[5721]: LOG: last completed transaction was at log time 2009-03-26 17:54:31.209657+01 2009-03-26 17:59:41 CET::@:[5721]: LOG: restored log file "000000010000004E000000EB" from archive cp: cannot stat `/data/pgsql_archive/00000002.history': No such file or directory 2009-03-26 17:59:41 CET::@:[5721]: LOG: selected new timeline ID: 2 cp: cannot stat `/data/pgsql_archive/00000001.history': No such file or directory 2009-03-26 17:59:41 CET::@:[5721]: LOG: archive recovery complete 2009-03-26 17:59:41 CET::@:[5721]: LOG: checkpoint starting: shutdown immediate 2009-03-26 17:59:41 CET::@:[5721]: LOG: checkpoint complete: wrote 5515 buffers (1.6%); 0 transaction log file(s) added, 0 removed, 14 recycled; write=0.272 s, sync=0.123 s, total=0.489 s 2009-03-26 17:59:41 CET::@:[5773]: LOG: autovacuum launcher started 2009-03-26 17:59:41 CET::@:[5712]: LOG: database system is ready to accept connections I tried all that multiple times from scratch and always got exactly the same behaviour. Especially I have no idea why it is reapplying all WAL segments again. One time that was over 400 segments. I feel I missed some simple detail. Any hint is really appreciated.
pgsql-general by date: