Re: Failing start-up archive recovery at Standby mode in PG9.2.4 - Mailing list pgsql-hackers

From KONDO Mitsumasa
Subject Re: Failing start-up archive recovery at Standby mode in PG9.2.4
Date
Msg-id 517A4012.9030502@lab.ntt.co.jp
In response to Re: Failing start-up archive recovery at Standby mode in PG9.2.4  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: Failing start-up archive recovery at Standby mode in PG9.2.4
List pgsql-hackers
Hi,

I found the cause of the problem. I think that what Horiguchi found is a different problem...
The problem is in CreateRestartPoint. In recovery mode, PG should not write WAL records,
because PG does not know the location of the latest WAL file.
But in this case, PG (the standby) writes a WAL file at a restartpoint during archive recovery.
In recovery mode, I think that a restartpoint should write only MinRecoveryPoint.

Here is the standby's pg_xlog directory when the problem occurred.
> [mitsu-ko@localhost postgresql-9.2.4-c]$ ls Standby/pg_xlog/
> 000000020000000000000003  000000020000000000000007  00000002000000000000000B  00000003.history
> 000000020000000000000004  000000020000000000000008  00000002000000000000000C  00000003000000000000000E
> 000000020000000000000005  000000020000000000000009  00000002000000000000000D  00000003000000000000000F
> 000000020000000000000006  00000002000000000000000A  00000002000000000000000E  archive_status

Here is the log from the time of the problem.
> [Standby] 2013-04-26 04:26:44 EDT DEBUG:  00000: attempting to remove WAL segments older than log file 000000030000000000000002
> [Standby] 2013-04-26 04:26:44 EDT LOCATION:  RemoveOldXlogFiles, xlog.c:3568
> [Standby] 2013-04-26 04:26:44 EDT DEBUG:  00000: recycled transaction log file "000000010000000000000002"
> [Standby] 2013-04-26 04:26:44 EDT LOCATION:  RemoveOldXlogFiles, xlog.c:3607
> [Standby] 2013-04-26 04:26:44 EDT DEBUG:  00000: recycled transaction log file "000000020000000000000002"
> [Standby] 2013-04-26 04:26:44 EDT LOCATION:  RemoveOldXlogFiles, xlog.c:3607
> [Standby] 2013-04-26 04:26:44 EDT LOG:  00000: restartpoint complete: wrote 9 buffers (0.2%); 0 transaction log file(s) added, 0 removed, 2 recycled; write=0.601 s, sync=1.178 s, total=1.781 s; sync files=3, longest=1.176 s, average=0.392 s
> [Standby] 2013-04-26 04:26:44 EDT LOCATION:  LogCheckpointEnd, xlog.c:7893
> [Standby] 2013-04-26 04:26:44 EDT LOG:  00000: recovery restart point at 0/90FE448
> [Standby] 2013-04-26 04:26:44 EDT DETAIL:  last completed transaction was at log time 2013-04-26 04:25:53.203725-04
> [Standby] 2013-04-26 04:26:44 EDT LOCATION:  CreateRestartPoint, xlog.c:8601
> [Standby] 2013-04-26 04:26:44 EDT LOG:  00000: restartpoint starting: xlog
> [Standby] 2013-04-26 04:26:44 EDT LOCATION:  LogCheckpointStart, xlog.c:7821
> cp: cannot stat `../arc/00000003000000000000000F': No such file or directory
> [Standby] 2013-04-26 04:26:44 EDT DEBUG:  00000: could not restore file "00000003000000000000000F" from archive: returncode 256
> [Standby] 2013-04-26 04:26:44 EDT LOCATION:  RestoreArchivedFile, xlog.c:3323
> [Standby] 2013-04-26 04:26:44 EDT LOG:  00000: unexpected pageaddr 0/2000000 in log file 0, segment 15, offset 0
> [Standby] 2013-04-26 04:26:44 EDT LOCATION:  ValidXLOGHeader, xlog.c:4395
> cp: cannot stat `../arc/00000003000000000000000F': No such file or directory
> [Standby] 2013-04-26 04:26:44 EDT DEBUG:  00000: could not restore file "00000003000000000000000F" from archive: returncode 256
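
The "unexpected pageaddr" line can be cross-checked against the file name with a little arithmetic. The following is only a rough sketch, assuming the 9.2 file name layout (8 hex digits each for timeline, log id and segment) and the default 16 MB segment size; the function names are made up for illustration and are not PostgreSQL APIs.

```python
XLOG_SEG_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB WAL segment size


def parse_wal_name(name):
    """Split a 24-hex-digit WAL segment file name into (timeline, log, seg)."""
    if len(name) != 24:
        raise ValueError("not a WAL segment file name: %r" % name)
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)


def expected_pageaddr(log, seg):
    """Address of the first page of a segment, formatted like the server log."""
    return "%X/%X" % (log, seg * XLOG_SEG_SIZE)


def segment_of_pageaddr(addr):
    """Map a page address such as '0/2000000' back to (log, seg)."""
    log, offset = addr.split("/")
    return int(log, 16), int(offset, 16) // XLOG_SEG_SIZE


tli, log, seg = parse_wal_name("00000003000000000000000F")
print(tli, log, seg)                     # 3 0 15
print(expected_pageaddr(log, seg))       # 0/F000000
print(segment_of_pageaddr("0/2000000"))  # (0, 2)
```

Under these assumptions, the first page of segment 15 should carry address 0/F000000, but the page header found in the file says 0/2000000, which belongs to segment 2. That is consistent with the file being one of the "...02" segments that were recycled in the log above.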

During recovery, PG normally looks for a WAL file in the archive first.
If the required WAL file is not there (the restore command fails), it looks in the pg_xlog directory next.
Normally, the required WAL file does not exist in pg_xlog either.
But in this case pg_xlog contains a file with the required name (although its contents are wrong and invalid).
So recovery fails and the standby is broken.
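
The lookup order described above can be illustrated with a simplified model. This is not the server's code, only a sketch: `locate_wal_segment` and `demo` are hypothetical names, and a plain file-existence check stands in for running the restore command.

```python
import os
import tempfile


def locate_wal_segment(name, archive_dir, pg_xlog_dir):
    """Return (source, path) for a segment, trying the archive before pg_xlog."""
    archived = os.path.join(archive_dir, name)
    local = os.path.join(pg_xlog_dir, name)
    if os.path.exists(archived):
        # Stands in for a successful restore from the archive.
        return "archive", archived
    if os.path.exists(local):
        # Fallback: any file with the right name is picked up,
        # whatever it actually contains.
        return "pg_xlog", local
    return None, None


def demo():
    name = "00000003000000000000000F"
    with tempfile.TemporaryDirectory() as root:
        arc = os.path.join(root, "arc")
        xlog = os.path.join(root, "pg_xlog")
        os.mkdir(arc)
        os.mkdir(xlog)
        # The archive has no such segment, but pg_xlog holds a
        # recycled file under that name.
        with open(os.path.join(xlog, name), "wb") as f:
            f.write(b"stale contents of an older segment")
        source, _ = locate_wal_segment(name, arc, xlog)
        return source


print(demo())  # pg_xlog
```

In this model the name alone decides which file is used, so a recycled segment sitting in pg_xlog under a future name is read as if it were the real one, and validation of its contents then fails.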

So I changed CreateRestartPoint at the branch where MinRecoveryPoint is handled.
This seems to fix the problem, but the attached patch is only a simple one.


Best regards,
--
NTT Open Source Software Center
Mitsumasa KONDO

Attachment
