Re: Fast promotion failure - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: Fast promotion failure
Date
Msg-id 20130516.150242.153333292.horiguchi.kyotaro@lab.ntt.co.jp
In response to Re: Fast promotion failure  (Amit Kapila <amit.kapila@huawei.com>)
Responses Re: Fast promotion failure  (Simon Riggs <simon@2ndQuadrant.com>)
Re: Fast promotion failure  (Amit Kapila <amit.kapila@huawei.com>)
List pgsql-hackers
Hello,

> > >> Is the point of this discussion that the patch may leave out some
> > >> glich about timing of timeline-related changing and Heikki saw an
> > >> egress of that?
> > >
> > > AFAIU, the committed patch has some gap in overall scenario which is
> > > the fast promotion issue.
> > 
> > Right, the fast promotion issue is still there.
> > 
> > Just to get us all on the same page again: Does anyone see a problem
> > with a fresh git checkout, with the fast-promotion-quick-fix.patch
> > applied?
> > (http://www.postgresql.org/message-id/51894942.4080500@vmware.com). If
> > you do, please speak up. As far as I know, the already-committed patch,
> > together with fast-promotion-quick-fix.patch, should fix all known
> > issues (*).

The shared XLogCtl->ThisTimeLineID is written and read without
spinlock protection, unlike some other XLogCtl members. Could this
break memory coherency between different processors?  If that is
the cause, it is not surprising that I cannot reproduce the
problem. I did not see the issue even without
fast-promotion-quick-fix.patch.
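
To make the question concrete, here is a minimal standalone sketch of
what "fenced by a spinlock" would mean for such a field. It is not
PostgreSQL code: SharedCtl, ctl_set_tli() and ctl_get_tli() are
hypothetical names, and the toy spinlock is built on C11 atomic_flag
instead of the backend's own spinlock primitives. The acquire/release
pair on the lock is what orders the accesses to the plain field.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a shared control struct (not XLogCtlData). */
typedef struct SharedCtl
{
    atomic_flag lock;       /* toy spinlock guarding this_tli */
    uint32_t    this_tli;   /* stand-in for XLogCtl->ThisTimeLineID */
} SharedCtl;

static void
ctl_init(SharedCtl *ctl, uint32_t tli)
{
    assert(ctl != NULL);
    atomic_flag_clear(&ctl->lock);
    ctl->this_tli = tli;
}

static void
ctl_lock(SharedCtl *ctl)
{
    /* Spin until the flag was previously clear; acquire orders later reads. */
    while (atomic_flag_test_and_set_explicit(&ctl->lock, memory_order_acquire))
        ;
}

static void
ctl_unlock(SharedCtl *ctl)
{
    /* Release makes the writes done under the lock visible to the next owner. */
    atomic_flag_clear_explicit(&ctl->lock, memory_order_release);
}

/* Writer side: update the shared timeline ID under the lock. */
static void
ctl_set_tli(SharedCtl *ctl, uint32_t tli)
{
    ctl_lock(ctl);
    ctl->this_tli = tli;
    ctl_unlock(ctl);
}

/* Reader side: copy the shared timeline ID out under the lock. */
static uint32_t
ctl_get_tli(SharedCtl *ctl)
{
    uint32_t    tli;

    ctl_lock(ctl);
    tli = ctl->this_tli;
    ctl_unlock(ctl);
    return tli;
}
```

Without the lock (or some other barrier), nothing in the C language
itself orders a reader's load of the field against the writer's store,
which is the concern raised above.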

> The patch provided will un-necessarily call InitXLOGAccess() 2 times for End
> of recovery checkpoint, it doesn't matter w.r.t performance but actually the
> purpose will be almost same for calling LocalSetXLogInsertAllowed() and
> InitXLOGAccess(), or am I missing something.
> 
> One more thing, I think after fast promotion, either it should set timeline
> or give error in CreateCheckPoint() function before it reaches the check
> mentioned by you in your initial mail.
> if (RecoveryInProgress() && (flags & CHECKPOINT_END_OF_RECOVERY) == 0) 
>                 elog(ERROR, "can't create a checkpoint during recovery");
> Shouldn't it set timeline in above check (RecoveryInProgress()) or when
> RecoveryInProgress() is called before CreateCheckPoint()?

As for the checkpointer, it calls RecoveryInProgress() much
earlier than that, in the waiting loop in CheckpointerMain() where
it decides whether to perform a checkpoint or a restartpoint. So
InitXLOGAccess() has already been done by the time a checkpoint is
chosen there for the first time. And before that, ThisTimeLineID in
the startup process is incremented and copied to
XLogCtl->ThisTimeLineID before xlogctl->SharedRecoveryInProgress
becomes false in StartupXLOG().  I don't think it is possible for
the checkpointer to run on an older timeline ID, provided that all
processes see exactly the same memory image.
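
The ordering argument above can be sketched as a small standalone
model. This is only an illustration under assumptions, not the actual
xlog.c code: RecoveryCtl, end_recovery() and recovery_in_progress()
are hypothetical names, and C11 release/acquire atomics stand in for
whatever guarantees the real shared-memory accesses have. The writer
publishes the new timeline ID first and clears the flag last; the
reader copies the timeline ID only after it has seen the flag cleared.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the two shared fields discussed above. */
typedef struct RecoveryCtl
{
    atomic_bool in_recovery;    /* stand-in for SharedRecoveryInProgress */
    uint32_t    this_tli;       /* stand-in for XLogCtl->ThisTimeLineID */
} RecoveryCtl;

static void
recovery_ctl_init(RecoveryCtl *ctl, uint32_t tli)
{
    assert(ctl != NULL);
    ctl->this_tli = tli;
    atomic_init(&ctl->in_recovery, true);
}

/* Startup-process side: publish the new timeline, then end recovery. */
static void
end_recovery(RecoveryCtl *ctl, uint32_t new_tli)
{
    ctl->this_tli = new_tli;
    /* Release: the timeline store above is visible before the flag flips. */
    atomic_store_explicit(&ctl->in_recovery, false, memory_order_release);
}

/*
 * Checkpointer side: returns true while recovery is still in progress.
 * Once the flag is seen cleared, copies the shared timeline ID into
 * *local_tli (modelling what InitXLOGAccess() is assumed to do).
 */
static bool
recovery_in_progress(RecoveryCtl *ctl, uint32_t *local_tli)
{
    if (atomic_load_explicit(&ctl->in_recovery, memory_order_acquire))
        return true;
    *local_tli = ctl->this_tli;
    return false;
}
```

With the release/acquire pair, a reader that sees the flag cleared is
guaranteed to see the new timeline ID. If both accesses were plain
loads and stores with no barrier, that guarantee would rest on the
hardware and compiler, which is the "same memory image" condition
stated above.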

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


