Re: BUG #17744: Fail Assert while recoverying from pg_basebackup - Mailing list pgsql-bugs

From Andres Freund
Subject Re: BUG #17744: Fail Assert while recoverying from pg_basebackup
Msg-id 20230201153252.l6kcfum7trdovw2b@alap3.anarazel.de
In response to Re: BUG #17744: Fail Assert while recoverying from pg_basebackup  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: BUG #17744: Fail Assert while recoverying from pg_basebackup  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Re: BUG #17744: Fail Assert while recoverying from pg_basebackup  (Michael Paquier <michael@paquier.xyz>)
List pgsql-bugs
Hi,

On 2023-01-13 18:36:05 +0900, Kyotaro Horiguchi wrote:
> At Tue, 10 Jan 2023 07:45:45 +0000, PG Bug reporting form <noreply@postgresql.org> wrote in 
> > #2  0x0000000000b378e9 in ExceptionalCondition (
> >     conditionName=0xd13697 "TransactionIdIsValid(initial)", 
> >     errorType=0xd12df4 "FailedAssertion", fileName=0xd12de8 "procarray.c",
> >     lineNumber=1750) at assert.c:69
> > #3  0x0000000000962195 in ComputeXidHorizons (h=0x7ffe93de25e0)
> >     at procarray.c:1750
> > #4  0x00000000009628a3 in GetOldestTransactionIdConsideredRunning ()
> >     at procarray.c:2050
> > #5  0x00000000005972bf in CreateRestartPoint (flags=256) at xlog.c:7153
> > #6  0x00000000008cae37 in CheckpointerMain () at checkpointer.c:464
> 
> The function requires a valid value in
> ShmemVariableCache->latestCompleteXid. But it is not initialized and
> maintained in this case.  The attached quick hack seems working, but
> of course more decent fix is needed.

I might be missing something, but I suspect the problem here is that we
shouldn't have been creating a restart point. Afaict, the setup
instructions provided don't configure a recovery.signal, so we'll just
perform crash recovery.

And I don't think it'd ever make sense to create a restart point during
crash recovery?

Except that in this case it's not pure crash recovery, it's restoring
from a backup label, so it actually might make sense to create
restart points?  If you're doing PITR or such, you don't really gain
anything by doing checkpoints until you've reached consistency, unless
you want to optimize for the case that you might need to start/stop the
instance multiple times?


So maybe it's the right thing to create restart points? Really not sure.


If we do want to do restartpoints, we definitely shouldn't try to
TruncateSUBTRANS() in the crash-recovery-like-restartpoint case, we've
not even done StartupSUBTRANS(), because that's guarded by
ArchiveRecoveryRequested.

The most obvious (but wrong!) fix would be to change

    if (EnableHotStandby)
        TruncateSUBTRANS(GetOldestTransactionIdConsideredRunning());
to
    if (standbyState != STANDBY_DISABLED)
        TruncateSUBTRANS(GetOldestTransactionIdConsideredRunning());
except that doesn't work, because we don't have working access to
standbyState, nor to the other relevant variables. Gah.
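
To make the mismatch between the two guards concrete, here is a toy model
in C. It is only a sketch under the assumptions stated in this mail: the
struct and the three helper functions are hypothetical stand-ins, not
PostgreSQL code, and they only reuse the flag names mentioned here.

```c
#include <stdbool.h>

/* Toy stand-ins for the xlog.c flags named in this mail; not PostgreSQL code. */
typedef struct RecoveryFlags
{
    bool ArchiveRecoveryRequested;  /* false in the reported setup (no recovery.signal) */
    bool InArchiveRecovery;         /* true in the reported setup (backup label) */
    bool EnableHotStandby;          /* guard currently used before TruncateSUBTRANS() */
} RecoveryFlags;

/* Hypothetical: StartupSUBTRANS() only runs if ArchiveRecoveryRequested. */
static bool
subtrans_started(const RecoveryFlags *f)
{
    return f->ArchiveRecoveryRequested;
}

/* Hypothetical: the restartpoint truncates SUBTRANS if EnableHotStandby. */
static bool
restartpoint_truncates_subtrans(const RecoveryFlags *f)
{
    return f->EnableHotStandby;
}

/* True when a restartpoint would truncate SUBTRANS that was never started. */
static bool
guards_mismatch(const RecoveryFlags *f)
{
    return restartpoint_truncates_subtrans(f) && !subtrans_started(f);
}
```

With { false, true, true } (backup label, no recovery.signal, hot standby
enabled) guards_mismatch() returns true; with { true, true, true } it
returns false. The model says nothing about which guard would be the right
one, it only shows that the two conditions are not tied to each other.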




We've really made a hash out of the state management for
xlog.c. ArchiveRecoveryRequested, InArchiveRecovery,
StandbyModeRequested, StandbyMode, EnableHotStandby,
LocalHotStandbyActive, ... :(.  We use InArchiveRecovery = true, even if
there's no archiving involved. Afaict ArchiveRecoveryRequested=false,
InArchiveRecovery=true isn't really something the comments around the
variables foresee.


Greetings,

Andres Freund


