Re: Hot standby doesn't come up on some situation. - Mailing list pgsql-hackers
From | Kyotaro HORIGUCHI |
---|---|
Subject | Re: Hot standby doesn't come up on some situation. |
Date | |
Msg-id | 20140307.211647.125996224.horiguchi.kyotaro@lab.ntt.co.jp Whole thread Raw |
In response to | Re: Hot standby doesn't come up on some situation. (Heikki Linnakangas <hlinnakangas@vmware.com>) |
List | pgsql-hackers |
Hello, > > After all, I have confirmed that this fixes the problem on crash > > recovery of hot-standby botfor 9.3 and HEAD and no problem was > > found except unreadability :( > > Ok, committed. Thanks! Thank you. > Any concrete suggestions about the readability? Is there some > particular spot that needs clarifying? Well, Although I became to see no problem there after I understood how it works:-), I'll write about two points where I had difficulties to understand. | * (or the checkpoint record itself, if it's a shutdown checkpoint). | */ | if (checkPoint.redo < RecPtr) First, it was a bit tough work to confirm the equivalence between (redo == RecPtr) and that the checkpoint is shutdown checkpoint. Although finally I was convinced that it surely holds, that is actually not the point. The point here is in the first half of the phrase. The comment might be less perplexing if it were as folowing even if only shutdown checkpoint satisfies the condition. But it would occur another quiestion in readers' mind. | * (or the checkpoint record itself, e.g. if it's a shutdown checkpoint). Second, the added code depends on the assumption that RecPtr points to the checkpoint record and EndRecPtr points to the next record there. It would be better for understandability and stability (against modifying code) to explicitly declare the precondition, like this | Here RecPtr points the checkpoint record and EndRecPtr points to the | place for the record just after. > > Surely this is the consequence of illegal operation but I think > > it is also not a issue of assertion - which fires on something > > wrong in design or quite rare cases(this case ?). > > Ah, I see. Yes, that's definitely a bug. If you don't hit the > assertion, because the oldestActiveXID is set in the checkpoint record > even though wal_level is 'archive', or if you simply have assertions > disabled, the system will start up in hot standby mode even though > it's not safe. > > > So it might be > > better to show message as below on the case. > > > > | FATAL: Checkpoint doesn't have valid oldest active transaction id > > | HINT: Reading WAL might have been written under insufficient > > | wal_level. Agreed. > Hmm. When I test that with 9.2, oldestActiveXID is not 0, even though > wal_level is 'archive'. So the above patch doesn't fix the whole > problem. > > The real bug here is that CheckRequiredParameterValues() tests for > InArchiveRecovery, when it should be testing for > ArchiveRecoveryRequested. Otherwise, the checks are not performed when > going through the crash recovery followed by archive recovery. I > should've changed that as part of the commit that added the crash > recovery then archive recovery behavior. > > Fixed, thanks for pointing it out! It's my pleasre. regrds, -- Kyotaro Horiguchi NTT Open Source Software Center
pgsql-hackers by date: