PITRTools: Bring up warm standby -> unexpected pageaddr - Mailing list pgsql-general

From Hannes Dorbath
Subject PITRTools: Bring up warm standby -> unexpected pageaddr
Date
Msg-id gqi7mi$1cs2$1@news.hub.org
Whole thread Raw
Responses Re: PITRTools: Bring up warm standby -> unexpected pageaddr
List pgsql-general
Hi,

I'm trying to setup a warm standby using PITRTools. Environment is
CentOS 5.2 x86_64 and PG 8.3.6 from yum.pgsqlrpms.org.

As far as I can tell everything seems to work as expected. After a
successful base backup and issuing cmd_standby -S the standby goes into
constant recovery mode and happily consumes WAL segments shipped form
the master. The corresponding log looks like this:

2009-03-26 17:55:04 CET::@:[2901]: LOG:  database system was
interrupted; last known up at 2009-03-26 17:09:49 CET
2009-03-26 17:55:04 CET::@:[2901]: LOG:  starting archive recovery
2009-03-26 17:55:04 CET::@:[2901]: LOG:  restore_command =
'/usr/bin/pg_standby  /data/pgsql_archive -s5 -w0 -c %f %p %r'
cp: cannot stat `/data/pgsql_archive/00000001.history': No such file or
directory
cp: cannot stat `/data/pgsql_archive/00000001.history': No such file or
directory
cp: cannot stat `/data/pgsql_archive/00000001.history': No such file or
directory
2009-03-26 17:55:20 CET::@:[2901]: LOG:  restored log file
"000000010000004E000000DF" from archive
2009-03-26 17:55:20 CET::@:[2901]: LOG:  automatic recovery in progress
2009-03-26 17:55:20 CET::@:[2901]: LOG:  redo starts at 4E/DF0001B8
2009-03-26 17:55:20 CET::@:[2901]: LOG:  restored log file
"000000010000004E000000E0" from archive
2009-03-26 17:55:20 CET::@:[2901]: LOG:  restored log file
"000000010000004E000000E1" from archive
2009-03-26 17:55:21 CET::@:[2901]: LOG:  restored log file
"000000010000004E000000E2" from archive
2009-03-26 17:55:21 CET::@:[2901]: LOG:  restored log file
"000000010000004E000000E3" from archive
2009-03-26 17:55:21 CET::@:[2901]: LOG:  restored log file
"000000010000004E000000E4" from archive
2009-03-26 17:55:22 CET::@:[2901]: LOG:  restored log file
"000000010000004E000000E5" from archive
2009-03-26 17:55:22 CET::@:[2901]: LOG:  restored log file
"000000010000004E000000E6" from archive
2009-03-26 17:55:22 CET::@:[2901]: LOG:  restored log file
"000000010000004E000000E7" from archive
2009-03-26 17:55:22 CET::@:[2901]: LOG:  restored log file
"000000010000004E000000E8" from archive
2009-03-26 17:55:22 CET::@:[2901]: LOG:  restored log file
"000000010000004E000000E9" from archive
2009-03-26 17:55:22 CET::@:[2901]: LOG:  restored log file
"000000010000004E000000EA" from archive
2009-03-26 17:56:37 CET::@:[2901]: LOG:  restored log file
"000000010000004E000000EB" from archive

As soon as I issue cmd_standby -F999 I get the following log entry:

2009-03-26 17:59:37 CET::@:[2874]: LOG:  received fast shutdown request
2009-03-26 17:59:37 CET::@:[2901]: FATAL:  could not restore file
"000000010000004E000000EC" from archive: return code 15
2009-03-26 17:59:37 CET::@:[2874]: LOG:  startup process (PID 2901)
exited with exit code 1
2009-03-26 17:59:37 CET::@:[2874]: LOG:  aborting startup due to startup
process failure

What I'm worried about now is that it reapplies all WAL segments it has
already consumed in warm standby mode again and always ends with
"unexpected pageaddr".

2009-03-26 17:59:39 CET::@:[5721]: LOG:  database system was interrupted
while in recovery at log time 2009-03-26 17:09:49 CET
2009-03-26 17:59:39 CET::@:[5721]: HINT:  If this has occurred more than
once some data might be corrupted and you might need to choose an
earlier recovery target.
2009-03-26 17:59:39 CET::@:[5721]: LOG:  starting archive recovery
2009-03-26 17:59:39 CET::@:[5721]: LOG:  restore_command = 'cp
/data/pgsql_archive/%f "%p"'
cp: cannot stat `/data/pgsql_archive/00000001.history': No such file or
directory
2009-03-26 17:59:39 CET::@:[5721]: LOG:  restored log file
"000000010000004E000000DF" from archive
2009-03-26 17:59:39 CET::@:[5721]: LOG:  automatic recovery in progress
2009-03-26 17:59:39 CET::@:[5721]: LOG:  redo starts at 4E/DF0001B8
2009-03-26 17:59:39 CET::@:[5721]: LOG:  restored log file
"000000010000004E000000E0" from archive
2009-03-26 17:59:39 CET::@:[5721]: LOG:  restored log file
"000000010000004E000000E1" from archive
2009-03-26 17:59:39 CET::@:[5721]: LOG:  restored log file
"000000010000004E000000E2" from archive
2009-03-26 17:59:39 CET::@:[5721]: LOG:  restored log file
"000000010000004E000000E3" from archive
2009-03-26 17:59:40 CET::@:[5721]: LOG:  restored log file
"000000010000004E000000E4" from archive
2009-03-26 17:59:40 CET::@:[5721]: LOG:  restored log file
"000000010000004E000000E5" from archive
2009-03-26 17:59:40 CET::@:[5721]: LOG:  restored log file
"000000010000004E000000E6" from archive
2009-03-26 17:59:40 CET::@:[5721]: LOG:  restored log file
"000000010000004E000000E7" from archive
2009-03-26 17:59:40 CET::@:[5721]: LOG:  restored log file
"000000010000004E000000E8" from archive
2009-03-26 17:59:40 CET::@:[5721]: LOG:  restored log file
"000000010000004E000000E9" from archive
2009-03-26 17:59:40 CET::@:[5721]: LOG:  restored log file
"000000010000004E000000EA" from archive
2009-03-26 17:59:40 CET::@:[5721]: LOG:  restored log file
"000000010000004E000000EB" from archive
cp: cannot stat `/data/pgsql_archive/000000010000004E000000EC': No such
file or directory
2009-03-26 17:59:40 CET::@:[5721]: LOG:  unexpected pageaddr 4C/61000000
in log file 78, segment 236, offset 0
2009-03-26 17:59:40 CET::@:[5721]: LOG:  redo done at 4E/EB000080
2009-03-26 17:59:41 CET::@:[5721]: LOG:  last completed transaction was
at log time 2009-03-26 17:54:31.209657+01
2009-03-26 17:59:41 CET::@:[5721]: LOG:  restored log file
"000000010000004E000000EB" from archive
cp: cannot stat `/data/pgsql_archive/00000002.history': No such file or
directory
2009-03-26 17:59:41 CET::@:[5721]: LOG:  selected new timeline ID: 2
cp: cannot stat `/data/pgsql_archive/00000001.history': No such file or
directory
2009-03-26 17:59:41 CET::@:[5721]: LOG:  archive recovery complete
2009-03-26 17:59:41 CET::@:[5721]: LOG:  checkpoint starting: shutdown
immediate
2009-03-26 17:59:41 CET::@:[5721]: LOG:  checkpoint complete: wrote 5515
buffers (1.6%); 0 transaction log file(s) added, 0 removed, 14 recycled;
write=0.272 s, sync=0.123 s, total=0.489 s
2009-03-26 17:59:41 CET::@:[5773]: LOG:  autovacuum launcher started
2009-03-26 17:59:41 CET::@:[5712]: LOG:  database system is ready to
accept connections

I tried all that multiple times from scratch and always got exactly the
same behaviour. Especially I have no idea why it is reapplying all WAL
segments again. One time that was over 400 segments. I feel I missed
some simple detail. Any hint is really appreciated.

pgsql-general by date:

Previous
From: "Marc Mamin"
Date:
Subject: Re: Enumerating a row set
Next
From: Scott Marlowe
Date:
Subject: Re: Is there a meaningful benchmark?