Re: Race condition in recovery? - Mailing list pgsql-hackers

From: Dilip Kumar
Subject: Re: Race condition in recovery?
Date:
Msg-id: CAFiTN-v+3DUbD9K5P5cK7ysjJ_EZZivLRh0tgJHyLSOecscCZA@mail.gmail.com
In response to: Re: Race condition in recovery?  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List: pgsql-hackers
On Tue, May 11, 2021 at 1:42 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Mon, 10 May 2021 14:27:21 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in
> > On Mon, May 10, 2021 at 2:05 PM Kyotaro Horiguchi
> > <horikyota.ntt@gmail.com> wrote:
> >
> > > I thought the reason for using receiveTLI instead of
> > > recoveryTargetTLI here is that there's a case where receiveTLI is the
> > > future of recoveryTargetTLI, but I haven't managed to reproduce such a
> > > situation.  If I set recoveryTargetTLI to a TLI that the standby doesn't
> > > know but the primary knows, validateRecoveryParameters immediately
> > > complains about that before reaching there.  Anyway, the attached patch
> > > assumes receiveTLI may be the future of recoveryTargetTLI.
> >
> > If you look at the note in this commit, it says "without the timeline
> > history file".  So is it trying to say that, although receiveTLI is an
> > ancestor of recoveryTargetTLI, it cannot detect that because the
> > TL.history file is absent?
>
> Yeah, that's how I read it too, and it works as described.  What I don't
> understand is why the patch uses receiveTLI, not recoveryTargetTLI, to
> load the timeline history in WaitForWALToBecomeAvailable.  The only
> possible reason is that there could be a case where receiveTLI is the
> future of recoveryTargetTLI.  However, AFAICS that case cannot happen.
> At replication start, the requested TLI is either that of the last
> checkpoint, which is the same as recoveryTargetTLI, or one in the
> existing expectedTLEs, which must be in the past of recoveryTargetTLI.
> That seems to have been true already when replication was made able to
> follow a timeline switch (abfd192b1b).
>
> So I was tempted to just load the history for recoveryTargetTLI and then
> confirm that receiveTLI is in that history.  That change doesn't break
> any of the recovery TAP tests, and it is much simpler than the last
> patch.  However, I'm not confident that it is right.. ;(

I first thought of fixing it the way you describe, i.e. loading the
history of recoveryTargetTLI instead of that of receiveTLI.  But this
commit (ee994272ca50f70b53074f0febaec97e28f83c4e) specifically used the
history file of receiveTLI to solve a particular issue, which I did not
clearly understand.  I am not sure whether it is a good idea to use
recoveryTargetTLI directly without understanding exactly why this commit
used receiveTLI.  It doesn't look like an oversight to me; it seems
intentional.  Maybe Heikki can comment on this?


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com