Re: Hot standby doesn't come up on some situation. - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: Hot standby doesn't come up on some situation.
Date
Msg-id 20140307.211647.125996224.horiguchi.kyotaro@lab.ntt.co.jp
Whole thread Raw
In response to Re: Hot standby doesn't come up on some situation.  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
Hello,

> > After all, I have confirmed that this fixes the problem on crash
> > recovery of hot-standby botfor 9.3 and HEAD and no problem was
> > found except unreadability :(
> 
> Ok, committed. Thanks!

Thank you.

> Any concrete suggestions about the readability? Is there some
> particular spot that needs clarifying?

Well, Although I became to see no problem there after I understood how
it works:-), I'll write about two points where I had difficulties to
understand.

|  * (or the checkpoint record itself, if it's a shutdown checkpoint).
|  */
| if (checkPoint.redo < RecPtr)

First, it was a bit tough work to confirm the equivalence between
(redo == RecPtr) and that the checkpoint is shutdown checkpoint.
Although finally I was convinced that it surely holds, that is
actually not the point. The point here is in the first half of the
phrase. The comment might be less perplexing if it were as folowing
even if only shutdown checkpoint satisfies the condition. But it would
occur another quiestion in readers' mind.

|  * (or the checkpoint record itself, e.g. if it's a shutdown checkpoint).

Second, the added code depends on the assumption that RecPtr points to
the checkpoint record and EndRecPtr points to the next record there.
It would be better for understandability and stability (against
modifying code) to explicitly declare the precondition, like this

| Here RecPtr points the checkpoint record and EndRecPtr points to the
| place for the record just after.


> > Surely this is the consequence of illegal operation but I think
> > it is also not a issue of assertion - which fires on something
> > wrong in design or quite rare cases(this case ?).
> 
> Ah, I see. Yes, that's definitely a bug. If you don't hit the
> assertion, because the oldestActiveXID is set in the checkpoint record
> even though wal_level is 'archive', or if you simply have assertions
> disabled, the system will start up in hot standby mode even though
> it's not safe.
> 
> > So it might be
> > better to show message as below on the case.
> >
> > | FATAL:  Checkpoint doesn't have valid oldest active transaction id
> > | HINT: Reading WAL might have been written under insufficient
> > | wal_level.

Agreed.

> Hmm. When I test that with 9.2, oldestActiveXID is not 0, even though
> wal_level is 'archive'. So the above patch doesn't fix the whole
> problem.
> 
> The real bug here is that CheckRequiredParameterValues() tests for
> InArchiveRecovery, when it should be testing for
> ArchiveRecoveryRequested. Otherwise, the checks are not performed when
> going through the crash recovery followed by archive recovery. I
> should've changed that as part of the commit that added the crash
> recovery then archive recovery behavior.
> 
> Fixed, thanks for pointing it out!

It's my pleasre.

regrds,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: extension_control_path
Next
From: Andres Freund
Date:
Subject: Re: Changeset Extraction v7.9.1