Re: bug of recovery? - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: bug of recovery?
Msg-id CA+U5nMKDoseVh4usTz7B-N_XCFq-JHMFy2Q6=epy2udsS=e+yA@mail.gmail.com
In response to Re: bug of recovery?  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-hackers
On Thu, Sep 29, 2011 at 12:31 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Tue, Sep 27, 2011 at 8:06 PM, Florian Pflug <fgp@phlo.org> wrote:
>> On Sep27, 2011, at 07:59 , Heikki Linnakangas wrote:
>>> On 27.09.2011 00:28, Florian Pflug wrote:
>>>> On Sep26, 2011, at 22:39 , Tom Lane wrote:
>>>>> It might be worthwhile to invoke XLogCheckInvalidPages() as soon as
>>>>> we (think we) have reached consistency, rather than leaving it to be
>>>>> done only when we exit recovery mode.
>>>>
>>>> I believe we also need to prevent the creation of restart points before
>>>> we've reached consistency.
>>>
>>> Seems reasonable. We could still allow restartpoints when the hash table
>>> is empty, though. And once we've reached consistency, we can throw an error
>>> immediately in log_invalid_page(), instead of adding the entry in the hash
>>> table.
>>
>> That mimics the way the rm_safe_restartpoint callbacks work, which is good.
>>
>> Actually, why don't we use that machinery to implement this? There's
>> currently no rm_safe_restartpoint callback for RM_XLOG_ID, so we'd just need
>> to create one that checks whether invalid_page_tab is empty.
>
> Okay, the attached patch prevents the creation of restartpoints by using
> the rm_safe_restartpoint callback if we've not reached a consistent state yet
> and the invalid-page table is not empty. But the invalid-page table is not
> tied to a specific resource manager, so using rm_safe_restartpoint for
> that seems slightly odd. Is this OK?
>
> Also, according to other suggestions, the patch changes XLogCheckInvalidPages()
> so that it's called as soon as we've reached a consistent state, and changes
> log_invalid_page() so that it emits PANIC immediately if consistency is already
> reached. I think these are very good changes, because they let us notice
> a serious problem (one that causes a PANIC) immediately. Without these
> changes, you might not notice that the standby database is corrupted until
> failover happens. Though such a problem should rarely happen, I think it's
> worth making those changes.

Patch does everything we agreed it should.
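
To make sure we are all describing the same behaviour, here is a toy model
of the three changes as I read them from the description above. It is a
standalone sketch, not code from the patch: RecoveryModel and the model_*
functions are made-up names, the fixed-size array stands in for the real
invalid-page hash table, and a "true" return value stands in for elog(PANIC).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_INVALID_PAGES 64    /* toy capacity; the real table is a hash */

typedef struct
{
    unsigned    rel;            /* hypothetical relation identifier */
    unsigned    blkno;          /* block number within that relation */
} InvalidPage;

typedef struct
{
    InvalidPage pages[MAX_INVALID_PAGES];
    size_t      npages;
    bool        reached_consistency;
} RecoveryModel;

/*
 * Replay found a reference to a page that does not exist.
 * Returns true where the real code would PANIC.
 */
bool
model_log_invalid_page(RecoveryModel *m, unsigned rel, unsigned blkno)
{
    if (m->reached_consistency)
        return true;            /* after consistency: fail immediately */

    assert(m->npages < MAX_INVALID_PAGES);
    m->pages[m->npages].rel = rel;
    m->pages[m->npages].blkno = blkno;
    m->npages++;                /* before consistency: just remember it */
    return false;
}

/* A later record (e.g. a drop) explains the missing pages of "rel". */
void
model_forget_relation(RecoveryModel *m, unsigned rel)
{
    size_t      kept = 0;

    for (size_t i = 0; i < m->npages; i++)
        if (m->pages[i].rel != rel)
            m->pages[kept++] = m->pages[i];
    m->npages = kept;
}

/* Restartpoints are refused only while inconsistent with entries pending. */
bool
model_safe_restartpoint(const RecoveryModel *m)
{
    return m->reached_consistency || m->npages == 0;
}

/*
 * Called when replay reaches the consistency point. Any entry still in
 * the table is unexplained; returns true where the real code would PANIC.
 */
bool
model_reach_consistency(RecoveryModel *m)
{
    if (m->npages > 0)
        return true;
    m->reached_consistency = true;
    return false;
}
```

In this model a restartpoint is allowed while the table is empty, refused
once an entry has been recorded before consistency, and allowed again after
model_forget_relation() clears it. After model_reach_consistency() succeeds,
any further model_log_invalid_page() call reports the PANIC case directly.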

Good suggestion from Florian.

This worries me slightly now, though, because the patch makes us PANIC
in a place where we previously didn't, and once we do that we cannot
restart the server at all. Are we sure we want that? It's certainly a
great way to shake down errors in other code...

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

