Re: Race condition in recovery? - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: Race condition in recovery?
Date
Msg-id CAFiTN-tjGBRUUb=Sn91gTW8L8Co=ZyfGfhPWQVJYh_QwapSe2g@mail.gmail.com
Whole thread Raw
In response to Re: Race condition in recovery?  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: Race condition in recovery?
List pgsql-hackers
 On Fri, May 7, 2021 at 2:33 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> Thanks.
>
>
> At Fri, 7 May 2021 11:04:53 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in
> > On Fri, May 7, 2021 at 8:23 AM Kyotaro Horiguchi
> > <horikyota.ntt@gmail.com> wrote:
> > >
> > > At Tue, 4 May 2021 17:41:06 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in
> > > Could you please fix the test script so that it causes your issue
> > > correctly? And/or elaborate a bit more?
> > >
> > > The attached first file is the debugging aid logging. The second is
> > > the test script, to be placed in src/test/recovery/t.
> >
> > I will look into your test case and try to see whether we can
> > reproduce the issue.  But let me summarise what is the exact issue.
> > Basically, the issue is that first in validateRecoveryParameters if
> > the recovery target is the latest then we fetch the latest history
> > file and set the recoveryTargetTLI timeline to the latest available
> > timeline assume it's 2 but we delay updating the expectedTLEs (as per
> > commit ee994272ca50f70b53074f0febaec97e28f83c4e).  Now, while reading
>
> I think it is right up to here.
>
> > the checkpoint record if we don't get the required WAL from the
> > archive then we try to get from primary, and while getting checkpoint
> > from primary we use "ControlFile->checkPointCopy.ThisTimeLineID"
> > suppose that is older timeline 1.  Now after reading the checkpoint we
> > will set the expectedTLEs based on the timeline from which we got the
> > checkpoint record.
>
> I doubt this point.  ReadCheckpointRecord finally calls
> XLogFileReadAnyTLI and it uses the content of the 00000002.history as
> the local timeline entry list, since expectedTLEs is NIL and
> recoveryTargetTLI has been updated to 2 by
> validateRecoveryParameters(). But node 3 was having only the segment
> on TLI=1 so ReadCheckPointRecord() finds the wanted checkpoint recrod
> on TLI=1.  XLogFileReadAnyTLI() copies the local TLE list based on
> TLI=2 to expectedTLEs just after that because the wanted checkpoint
> record was available based on the list.

Okay, I got your point, now, consider the scenario that we are trying
to get the checkpoint record in XLogFileReadAnyTLI, you are right that
it returns history file 00000002.history.  I think I did not mention
one point, basically, the tool while restarting node 3 after promoting
node 2 is deleting all the local WAL of node3 (so that node 3 can
follow node2).  So now node3 doesn't have the checkpoint in the local
segment.  Suppose checkpoint record was in segment
000000010000000000000001, but after TL switch 000000010000000000000001
is renamed to 000000010000000000000001.partial on node2 so now
practically doesn't have 000000010000000000000001 file anywhere.
However if TL switch mid-segment then we copy that segment with new TL
so we have 000000020000000000000001 which contains the checkpoint
record, but node 2 haven't yet archived it.

So now you come out of XLogFileReadAnyTLI, without reading checkpoint
record and without setting expectedTLEs.  Because expectedTLEs is only
set if we are able to read the checkpoint record.  Make sense?

> So I don't think checkPointCopy.ThisTimeLineID cannot affect this
> logic, and don't think expectedTLEs is left with NIL.  It's helpful
> that you could show the specific code path to cause that.

So now expectedTLEs is still NULL and you go to get the checkpoint
record from primary and use checkPointCopy.ThisTimeLineID.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: few ideas for pgbench
Next
From: Michail Nikolaev
Date:
Subject: Re: [PATCH] Full support for index LP_DEAD hint bits on standby