Re: BUG #17744: Fail Assert while recoverying from pg_basebackup - Mailing list pgsql-bugs

From Kyotaro Horiguchi
Subject Re: BUG #17744: Fail Assert while recoverying from pg_basebackup
Date
Msg-id 20230202.164503.138587422646410717.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: BUG #17744: Fail Assert while recoverying from pg_basebackup  (Andres Freund <andres@anarazel.de>)
List pgsql-bugs
Thanks for the comment!

At Wed, 1 Feb 2023 07:32:52 -0800, Andres Freund <andres@anarazel.de> wrote in 
> Hi,
> 
> On 2023-01-13 18:36:05 +0900, Kyotaro Horiguchi wrote:
> > At Tue, 10 Jan 2023 07:45:45 +0000, PG Bug reporting form <noreply@postgresql.org> wrote in 
> > > #2  0x0000000000b378e9 in ExceptionalCondition (
> > >     conditionName=0xd13697 "TransactionIdIsValid(initial)", 
> > >     errorType=0xd12df4 "FailedAssertion", fileName=0xd12de8 "procarray.c",
> > > 
> > >     lineNumber=1750) at assert.c:69
> > > #3  0x0000000000962195 in ComputeXidHorizons (h=0x7ffe93de25e0)
> > >     at procarray.c:1750
> > > #4  0x00000000009628a3 in GetOldestTransactionIdConsideredRunning ()
> > >     at procarray.c:2050
> > > #5  0x00000000005972bf in CreateRestartPoint (flags=256) at xlog.c:7153
> > > #6  0x00000000008cae37 in CheckpointerMain () at checkpointer.c:464
> > 
> > The function requires a valid value in
> > ShmemVariableCache->latestCompleteXid. But it is not initialized and
> > maintained in this case.  The attached quick hack seems working, but
> > of course more decent fix is needed.
> 
> I might be missing something, but I suspect the problem here is that we
> shouldn't have been creating a restart point. Afaict, the setup
> instructions provided don't configure a recovery.signal, so we'll just
> perform crash recovery.
> 
> And I don't think it'd ever make sense to create a restart point during
> crash recovery?
> 
> Except that in this case, it's not pure crash recovery, it's restoring
> from a backup label. Due to which it actually might make sense to create
> restart points?  If you're doing PITR or such you don't really gain
> anything by doing checkpoints until you've reached consistency, unless
> you want to optimize for the case that you might need to start/stop the
> instance multiple times?
> 
> 
> So maybe it's the right thing to create restart points? Really not sure.

In my memory that behavior was intentionally introduced
recently... yes, this is ([1]). And it seems that this is the motive:

> Once we had the checkpointer running, we could also consider making
> the end-of-recovery checkpoint optional, or at least the wait for it
> to complete.  I haven't shown that in this patch, but it's just
> different checkpoint request flags and possibly an end-of-recovery
> record.  What problems do you foresee with that?  Why should we have
> "fast" promotion but not "fast" crash recovery?

[1]: https://www.postgresql.org/message-id/CA%2BhUKGJ8NRsqgkZEnsnRc2MFROBV-jCnacbYvtpptK2A9YYp9Q%40mail.gmail.com

> If we do want to do restartpoints, we definitely shouldn't try to
> TruncateSUBTRANS() in the crash-recovery-like-restartpoint case, we've
> not even done StartupSUBTRANS(), because that's guarded by
> ArchiveRecoveryRequested.

Yeah..

> The most obvious (but wrong!), fix would be to change

:)

>     if (EnableHotStandby)
>         TruncateSUBTRANS(GetOldestTransactionIdConsideredRunning());
> to
>     if (standbyState != STANDBY_DISABLED)
>         TruncateSUBTRANS(GetOldestTransactionIdConsideredRunning());
> except that doesn't work, because we don't have working access to
> standbyState. Nor the other relevant variables. Gah.

Yeah, also not have access to ArchiveRecoveryRequested (but I can
write the variable there!).

> We've really made a hash out of the state management for
> xlog.c. ArchiveRecoveryRequested, InArchiveRecovery,
> StandbyModeRequested, StandbyMode, EnableHotStandby,
> LocalHotStandbyActive, ... :(.  We use InArchiveRecovery = true, even if
> there's no archiving involved. Afaict ArchiveRecoveryRequested=false,
> InArchiveRecovery=true isn't really something the comments around the
> variables foresee.

Ah... InArchiveRecovery is set when we know how far recovery needs to
run to reach consistency regardless whether recovery starts from crash
recovery or not. This is *slightly*(:p) different from what its
comment suggests.

Mmm.  Okay, now I know it's not that simple. Let me sleep on it..

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #17767: psql: tab-completion causes warnings when standard_conforming_strings = off
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: BUG #17767: psql: tab-completion causes warnings when standard_conforming_strings = off