Re: Failing start-up archive recovery at Standby mode in PG9.2.4 - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Failing start-up archive recovery at Standby mode in PG9.2.4
Date
Msg-id 51890459.9030905@vmware.com
Whole thread Raw
In response to Re: Failing start-up archive recovery at Standby mode in PG9.2.4  (KONDO Mitsumasa <kondo.mitsumasa@lab.ntt.co.jp>)
Responses Re: Failing start-up archive recovery at Standby mode in PG9.2.4  (KONDO Mitsumasa <kondo.mitsumasa@lab.ntt.co.jp>)
List pgsql-hackers
On 26.04.2013 11:51, KONDO Mitsumasa wrote:
> Hi,
>
> I discavered the problem cause. I think taht horiguchi's discovery is
> another problem...
> Problem has CreateRestartPoint. In recovery mode, PG should not WAL record.
> Because PG does not know latest WAL file location.
> But in this problem case, PG(standby) write WAL file at RestartPoint in
> archive recovery.
> In recovery mode, I think that RestartPoint can write only
> MinRecoveryPoint.
>
> Here is Standby's pg_xlog directory in problem caused.
>> [mitsu-ko@localhost postgresql-9.2.4-c]$ ls Standby/pg_xlog/
>> 000000020000000000000003 000000020000000000000007
>> 00000002000000000000000B 00000003.history
>> 000000020000000000000004 000000020000000000000008
>> 00000002000000000000000C 00000003000000000000000E
>> 000000020000000000000005 000000020000000000000009
>> 00000002000000000000000D 00000003000000000000000F
>> 000000020000000000000006 00000002000000000000000A
>> 00000002000000000000000E archive_status
>
> This problem case is here.
>> [Standby] 2013-04-26 04:26:44 EDT DEBUG: 00000: attempting to remove
>> WAL segments older than log file 000000030000000000000002
>> [Standby] 2013-04-26 04:26:44 EDT LOCATION: RemoveOldXlogFiles,
>> xlog.c:3568
>> [Standby] 2013-04-26 04:26:44 EDT DEBUG: 00000: recycled transaction
>> log file "000000010000000000000002"
>> [Standby] 2013-04-26 04:26:44 EDT LOCATION: RemoveOldXlogFiles,
>> xlog.c:3607
>> [Standby] 2013-04-26 04:26:44 EDT DEBUG: 00000: recycled transaction
>> log file "000000020000000000000002"
>> [Standby] 2013-04-26 04:26:44 EDT LOCATION: RemoveOldXlogFiles,
>> xlog.c:3607
>> [Standby] 2013-04-26 04:26:44 EDT LOG: 00000: restartpoint complete:
>> wrote 9 buffers (0.2%); 0 transaction log file(s) added, 0 removed, 2
>> recycled; write=0.601 s, sync=1.178 s, total=1.781 s; sync files=3,
>> longest=1.176 s, average=0.392 s
>> [Standby] 2013-04-26 04:26:44 EDT LOCATION: LogCheckpointEnd, xlog.c:7893
>> [Standby] 2013-04-26 04:26:44 EDT LOG: 00000: recovery restart point
>> at 0/90FE448
>> [Standby] 2013-04-26 04:26:44 EDT DETAIL: last completed transaction
>> was at log time 2013-04-26 04:25:53.203725-04
>> [Standby] 2013-04-26 04:26:44 EDT LOCATION: CreateRestartPoint,
>> xlog.c:8601
>> [Standby] 2013-04-26 04:26:44 EDT LOG: 00000: restartpoint starting: xlog
>> [Standby] 2013-04-26 04:26:44 EDT LOCATION: LogCheckpointStart,
>> xlog.c:7821
>> cp: cannot stat `../arc/00000003000000000000000F': そのようなファイル
>> やディレクトリはありません
>> [Standby] 2013-04-26 04:26:44 EDT DEBUG: 00000: could not restore file
>> "00000003000000000000000F" from archive: return code 256
>> [Standby] 2013-04-26 04:26:44 EDT LOCATION: RestoreArchivedFile,
>> xlog.c:3323
>> [Standby] 2013-04-26 04:26:44 EDT LOG: 00000: unexpected pageaddr
>> 0/2000000 in log file 0, segment 15, offset 0
>> [Standby] 2013-04-26 04:26:44 EDT LOCATION: ValidXLOGHeader, xlog.c:4395
>> cp: cannot stat `../arc/00000003000000000000000F': そのようなファイル
>> やディレクトリはありません
>> [Standby] 2013-04-26 04:26:44 EDT DEBUG: 00000: could not restore file
>> "00000003000000000000000F" from archive: return code 256
>
> In recovery, pg normary search WAL file at archive recovery.
> If propery WAL file is nothing(archive command is failed), next search
> pg_xlog directory.
> Normary, propety WAL file is nothing in pg_xlog.
> But this case has propety name's WAL file(But it's mistaken and illegal)
> in pg_xlog.
> So recovery is failed and broken Standby.
>
> So I fix CreateRestartPoint at branching point of executing
> MinRecoveryPoint.
> It seems to fix this problem, but attached patch is plain.

I didn't understand this. I committed a fix for the issue where recycled
WAL files get in the way of recovery, but I'm not sure if you're
describing the same issue or something else. But before we dig any
deeper into this, can you verify if you're still seeing any issues on
9.3beta1?

- Heikki



pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: pg_dump --snapshot
Next
From: Heikki Linnakangas
Date:
Subject: Re: [BUGS] BUG #8043: 9.2.4 doesn't open WAL files from archive, only looks in pg_xlog