Re: bug of recovery? - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: bug of recovery? |
Date | |
Msg-id | CA+U5nMKDoseVh4usTz7B-N_XCFq-JHMFy2Q6=epy2udsS=e+yA@mail.gmail.com Whole thread Raw |
In response to | Re: bug of recovery? (Fujii Masao <masao.fujii@gmail.com>) |
Responses |
Re: bug of recovery?
|
List | pgsql-hackers |
On Thu, Sep 29, 2011 at 12:31 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Tue, Sep 27, 2011 at 8:06 PM, Florian Pflug <fgp@phlo.org> wrote: >> On Sep27, 2011, at 07:59 , Heikki Linnakangas wrote: >>> On 27.09.2011 00:28, Florian Pflug wrote: >>>> On Sep26, 2011, at 22:39 , Tom Lane wrote: >>>>> It might be worthwhile to invoke XLogCheckInvalidPages() as soon as >>>>> we (think we) have reached consistency, rather than leaving it to be >>>>> done only when we exit recovery mode. >>>> >>>> I believe we also need to prevent the creation of restart points before >>>> we've reached consistency. >>> >>> Seems reasonable. We could still allow restartpoints when the hash table is empty, though. And once we've reached consistency,we can throw an error immediately in log_invalid_page(), instead of adding the entry in the hash table. >> >> That mimics the way the rm_safe_restartpoint callbacks work, which is good. >> >> Actually, why don't we use that machinery to implement this? There's currently no rm_safe_restartpoint callback for RM_XLOG_ID,so we'd just need to create one that checks whether invalid_page_tab is empty. > > Okay, the attached patch prevents the creation of restartpoints by using > rm_safe_restartpoint callback if we've not reached a consistent state yet > and the invalid-page table is not empty. But the invalid-page table is not > tied to the specific resource manager, so using rm_safe_restartpoint for > that seems to slightly odd. Is this OK? > > Also, according to other suggestions, the patch changes XLogCheckInvalidPages() > so that it's called as soon as we've reached a consistent state, and changes > log_invalid_page() so that it emits PANIC immediately if consistency is already > reached. These are very good changes, I think. Because they enable us to > notice serious problem which causes PANIC error immediately. Without these > changes, you unfortunately might notice that the standby database is corrupted > when failover happens. Though such a problem might rarely happen, I think it's > worth doing those changes. Patch does everything we agreed it should. Good suggestion from Florian. This worries me slightly now though because the patch makes us PANIC in a place we didn't used to and once we do that we cannot restart the server at all. Are we sure we want that? It's certainly a great way to shake down errors in other code... -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
pgsql-hackers by date: