Re: bug of recovery? - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: bug of recovery?
Date
Msg-id CAHGQGwH2O=-1etYFzE10ftsOsJQEU_RbGcD8PuE4w5CSqkvMMg@mail.gmail.com
In response to Re: bug of recovery?  (Florian Pflug <fgp@phlo.org>)
Responses Re: bug of recovery?
List pgsql-hackers
On Tue, Sep 27, 2011 at 7:28 AM, Florian Pflug <fgp@phlo.org> wrote:
> On Sep26, 2011, at 22:39 , Tom Lane wrote:
>> It might be worthwhile to invoke XLogCheckInvalidPages() as soon as
>> we (think we) have reached consistency, rather than leaving it to be
>> done only when we exit recovery mode.
>
> I believe we also need to prevent the creation of restart points before
> we've reached consistency. If we're starting from an online backup,
> and a checkpoint occurred between pg_start_backup() and pg_stop_backup(),
> we currently create a restart point upon replaying that checkpoint's
> xlog record. At that point, however, unresolved page references are
> not an error, since a truncation that happened after the checkpoint
> (but before pg_stop_backup()) might or might not be reflected in the
> online backup.

Preventing the creation of restartpoints before reaching the consistent point
sounds fragile in the case where the backup takes a very long time. It might
also take a very long time to reach the consistent point when replaying from
that backup. That would also prevent the removal of WAL files (e.g., those
streamed from the master server) for a long time, and might then cause a
disk-full failure.
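
To put rough numbers on that concern, here is a toy model in Python (not
PostgreSQL code). The function name, its parameters, and the rule that a
restartpoint lets every previously replayed segment be recycled are all
simplifying assumptions made for illustration.

```python
def peak_retained_segments(segments_replayed, restartpoint_every,
                           consistency_at, suppress_before_consistency):
    """Return the peak number of WAL segments held on disk during replay.

    Assumed model: one restartpoint opportunity every `restartpoint_every`
    segments, and a completed restartpoint frees all segments replayed so far.
    """
    retained = 0
    peak = 0
    for seg in range(1, segments_replayed + 1):
        retained += 1                      # the segment just replayed
        peak = max(peak, retained)
        at_restartpoint = seg % restartpoint_every == 0
        allowed = (not suppress_before_consistency) or seg >= consistency_at
        if at_restartpoint and allowed:
            retained = 0                   # old segments can be recycled
    return peak


# Consistency is reached only after 800 segments of a long backup.
normal = peak_retained_segments(1000, 10, 800, False)
suppressed = peak_retained_segments(1000, 10, 800, True)
print(normal, suppressed)
```

In this model the retained WAL grows with the distance to the consistent
point when restartpoints are suppressed, instead of staying bounded by the
restartpoint interval.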

ISTM that writing the invalid-page table to disk at every restartpoint is
a better approach. If the invalid-page table is never updated after we've
reached the consistency point, we probably should make restartpoints write
that table only after that point. And if a reference to an invalid page is
found after the consistent point, we should emit an error and cancel recovery.
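
The idea can be sketched roughly as follows, again as a Python model rather
than actual backend code. The class and method names are hypothetical,
"writing to disk" is stood in for by an in-memory snapshot, and the check in
reach_consistency() follows Tom's suggestion quoted above rather than
anything that exists today.

```python
class InvalidPageError(Exception):
    """Raised when recovery must be cancelled."""


class RecoveryModel:
    def __init__(self, saved_table=frozenset()):
        # Start from whatever the last restartpoint persisted (assumption).
        self.invalid_pages = set(saved_table)
        self.saved_table = frozenset(saved_table)
        self.reached_consistency = False

    def reference_missing_page(self, rel, blk):
        # Before consistency, an unresolved reference is only remembered.
        if self.reached_consistency:
            raise InvalidPageError(f"invalid page {rel}/{blk} after consistency")
        self.invalid_pages.add((rel, blk))

    def replay_truncate(self, rel, nblocks):
        # A replayed truncation resolves references at or beyond nblocks.
        self.invalid_pages = {(r, b) for (r, b) in self.invalid_pages
                              if r != rel or b >= 0 and b < nblocks}

    def restartpoint(self):
        # Persist the table so a restart from here does not forget it.
        self.saved_table = frozenset(self.invalid_pages)

    def reach_consistency(self):
        # Anything still unresolved at this point is a real error.
        if self.invalid_pages:
            raise InvalidPageError(f"unresolved pages: {sorted(self.invalid_pages)}")
        self.reached_consistency = True


m = RecoveryModel()
m.reference_missing_page("t", 5)     # page missing from the online backup
m.restartpoint()                     # table saved along with the restartpoint
restarted = RecoveryModel(m.saved_table)
restarted.replay_truncate("t", 3)    # later truncation resolves the reference
restarted.reach_consistency()        # nothing unresolved, so no error
```

With the table saved at the restartpoint, a standby restarted from it still
knows about the unresolved reference and can clear it when the truncation is
replayed.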

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

