Re: Fast promotion failure - Mailing list pgsql-hackers

From: Amit Kapila
Subject: Re: Fast promotion failure
Msg-id: 008301ce5216$17fda4e0$47f8eea0$@kapila@huawei.com
In response to: Re: Fast promotion failure (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List: pgsql-hackers

On Thursday, May 16, 2013 11:33 AM Kyotaro HORIGUCHI wrote:
> Hello,
> 
> > > >> Is the point of this discussion that the patch may leave out some
> > > >> glitch about the timing of timeline-related changes, and Heikki saw
> > > >> an egress of that?
> > > >
> > > > AFAIU, the committed patch has some gap in the overall scenario,
> > > > which is the fast promotion issue.
> > >
> > > Right, the fast promotion issue is still there.
> > >
> > > Just to get us all on the same page again: Does anyone see a problem
> > > with a fresh git checkout, with the fast-promotion-quick-fix.patch
> > > applied?
> > > (http://www.postgresql.org/message-id/51894942.4080500@vmware.com).
> > > If you do, please speak up. As far as I know, the already-committed
> > > patch, together with fast-promotion-quick-fix.patch, should fix all
> > > known issues (*).
> 
> Shared XLogCtl->ThisTimeLineID is written and read without
> fencing by spinlock unlike some other XLogCtl members. Can this
> break coherency of its memory between different processors?  It
> is quite reasonable that I cannot find the trouble if it is the
> cause. I didn't see the issue even without
> fast-promotion-quick-fix.patch.
> 
> > The patch provided will unnecessarily call InitXLOGAccess() 2 times for
> > the end-of-recovery checkpoint. It doesn't matter w.r.t. performance, but
> > the purpose of calling LocalSetXLogInsertAllowed() and InitXLOGAccess()
> > will be almost the same, or am I missing something?
> >
> > One more thing: I think after fast promotion, it should either set the
> > timeline or give an error in the CreateCheckPoint() function before it
> > reaches the check mentioned by you in your initial mail.
> > if (RecoveryInProgress() && (flags & CHECKPOINT_END_OF_RECOVERY) == 0)
> >                 elog(ERROR, "can't create a checkpoint during recovery");
> > Shouldn't it set the timeline in the above check (RecoveryInProgress()) or
> > when RecoveryInProgress() is called before CreateCheckPoint()?
> 
> Thinking of the checkpointer, it calls RecoveryInProgress() far earlier
> than that, in the waiting loop in CheckpointerMain, where it decides
> whether to do a checkpoint or a restartpoint. So InitXLOGAccess() has
> already been done when a checkpoint is chosen there for the first time.
> And before that, ThisTimeLineID in the startup process gets incremented
> and is copied into XLogCtl->ThisTimeLineID before
> xlogctl->SharedRecoveryInProgress becomes false in StartupXLOG(). I don't
> think it is possible for the checkpointer to run on an older timeline ID,
> on condition that all processes see exactly the same memory image.

I think the same, but one difference with fast promotion is that the
checkpoint request is made after recovery has ended, so some operations can
happen on the new timeline before the checkpoint.
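
To make the ordering being discussed concrete, here is a small single-threaded
model in C. It is only a sketch, not the actual xlog.c code: the names
ModelXLogCtl, ModelProcess and the model_* functions are hypothetical
stand-ins for XLogCtl, the per-process local state, StartupXLOG(),
RecoveryInProgress() and the check in CreateCheckPoint(). It assumes every
process sees the same memory image, so it says nothing about the spinlock
question raised above; it only shows that a process picks up the new timeline
the first time it observes that recovery has ended.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared XLogCtl fields under discussion. */
typedef struct {
    bool     shared_recovery_in_progress;
    uint32_t this_timeline_id;
} ModelXLogCtl;

/* Hypothetical per-process state; each process has its own copy. */
typedef struct {
    bool     local_recovery_in_progress;
    uint32_t local_timeline_id;   /* 0 means "not initialized yet" */
} ModelProcess;

static void model_process_init(ModelProcess *p) {
    p->local_recovery_in_progress = true;
    p->local_timeline_id = 0;
}

/* Startup process at end of recovery: publish the new timeline first,
 * and only then clear the shared recovery flag. */
static void model_end_of_recovery(ModelXLogCtl *ctl, uint32_t new_tli) {
    ctl->this_timeline_id = new_tli;
    ctl->shared_recovery_in_progress = false;
}

/* Models RecoveryInProgress(): the first time a process sees that recovery
 * is over, it copies the shared timeline (the role of InitXLOGAccess()). */
static bool model_recovery_in_progress(ModelProcess *p, const ModelXLogCtl *ctl) {
    if (!p->local_recovery_in_progress)
        return false;
    p->local_recovery_in_progress = ctl->shared_recovery_in_progress;
    if (!p->local_recovery_in_progress)
        p->local_timeline_id = ctl->this_timeline_id;
    return p->local_recovery_in_progress;
}

/* Models the check in CreateCheckPoint(): returns the timeline the
 * checkpoint would run on, or 0 if it is refused during recovery. */
static uint32_t model_checkpoint_timeline(ModelProcess *p, const ModelXLogCtl *ctl) {
    if (model_recovery_in_progress(p, ctl))
        return 0;
    return p->local_timeline_id;
}

/* Walks through the fast-promotion ordering: a backend can act on the new
 * timeline before the checkpointer gets to the first checkpoint. */
static uint32_t model_demo(void) {
    ModelXLogCtl ctl = { true, 1 };
    ModelProcess checkpointer, backend;
    model_process_init(&checkpointer);
    model_process_init(&backend);

    /* Still in recovery: a normal checkpoint is refused. */
    assert(model_checkpoint_timeline(&checkpointer, &ctl) == 0);

    model_end_of_recovery(&ctl, 2);

    /* The backend sees the end of recovery and the new timeline first. */
    assert(!model_recovery_in_progress(&backend, &ctl));
    assert(backend.local_timeline_id == 2);

    /* The checkpoint requested afterwards also runs on the new timeline. */
    return model_checkpoint_timeline(&checkpointer, &ctl);
}
```

In this model the checkpointer cannot end up on the old timeline, which matches
the reasoning above. What the model cannot show is whether the unlocked read of
the shared timeline is safe across processors.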

With Regards,
Amit Kapila.



