Re: BUG #17744: Fail Assert while recoverying from pg_basebackup - Mailing list pgsql-bugs
From | Kyotaro Horiguchi |
---|---|
Subject | Re: BUG #17744: Fail Assert while recoverying from pg_basebackup |
Date | |
Msg-id | 20230202.164503.138587422646410717.horikyota.ntt@gmail.com Whole thread Raw |
In response to | Re: BUG #17744: Fail Assert while recoverying from pg_basebackup (Andres Freund <andres@anarazel.de>) |
List | pgsql-bugs |
Thanks for the comment! At Wed, 1 Feb 2023 07:32:52 -0800, Andres Freund <andres@anarazel.de> wrote in > Hi, > > On 2023-01-13 18:36:05 +0900, Kyotaro Horiguchi wrote: > > At Tue, 10 Jan 2023 07:45:45 +0000, PG Bug reporting form <noreply@postgresql.org> wrote in > > > #2 0x0000000000b378e9 in ExceptionalCondition ( > > > conditionName=0xd13697 "TransactionIdIsValid(initial)", > > > errorType=0xd12df4 "FailedAssertion", fileName=0xd12de8 "procarray.c", > > > > > > lineNumber=1750) at assert.c:69 > > > #3 0x0000000000962195 in ComputeXidHorizons (h=0x7ffe93de25e0) > > > at procarray.c:1750 > > > #4 0x00000000009628a3 in GetOldestTransactionIdConsideredRunning () > > > at procarray.c:2050 > > > #5 0x00000000005972bf in CreateRestartPoint (flags=256) at xlog.c:7153 > > > #6 0x00000000008cae37 in CheckpointerMain () at checkpointer.c:464 > > > > The function requires a valid value in > > ShmemVariableCache->latestCompleteXid. But it is not initialized and > > maintained in this case. The attached quick hack seems working, but > > of course more decent fix is needed. > > I might be missing something, but I suspect the problem here is that we > shouldn't have been creating a restart point. Afaict, the setup > instructions provided don't configure a recovery.signal, so we'll just > perform crash recovery. > > And I don't think it'd ever make sense to create a restart point during > crash recovery? > > Except that in this case, it's not pure crash recovery, it's restoring > from a backup label. Due to which it actually might make sense to create > restart points? If you're doing PITR or such you don't really gain > anything by doing checkpoints until you've reached consistency, unless > you want to optimize for the case that you might need to start/stop the > instance multiple times? > > > So maybe it's the right thing to create restart points? Really not sure. In my memory that behavior was intentionally introduced recently... yes, this is ([1]). And it seems that this is the motive: > Once we had the checkpointer running, we could also consider making > the end-of-recovery checkpoint optional, or at least the wait for it > to complete. I haven't shown that in this patch, but it's just > different checkpoint request flags and possibly an end-of-recovery > record. What problems do you foresee with that? Why should we have > "fast" promotion but not "fast" crash recovery? [1]: https://www.postgresql.org/message-id/CA%2BhUKGJ8NRsqgkZEnsnRc2MFROBV-jCnacbYvtpptK2A9YYp9Q%40mail.gmail.com > If we do want to do restartpoints, we definitely shouldn't try to > TruncateSUBTRANS() in the crash-recovery-like-restartpoint case, we've > not even done StartupSUBTRANS(), because that's guarded by > ArchiveRecoveryRequested. Yeah.. > The most obvious (but wrong!), fix would be to change :) > if (EnableHotStandby) > TruncateSUBTRANS(GetOldestTransactionIdConsideredRunning()); > to > if (standbyState != STANDBY_DISABLED) > TruncateSUBTRANS(GetOldestTransactionIdConsideredRunning()); > except that doesn't work, because we don't have working access to > standbyState. Nor the other relevant variables. Gah. Yeah, also not have access to ArchiveRecoveryRequested (but I can write the variable there!). > We've really made a hash out of the state management for > xlog.c. ArchiveRecoveryRequested, InArchiveRecovery, > StandbyModeRequested, StandbyMode, EnableHotStandby, > LocalHotStandbyActive, ... :(. We use InArchiveRecovery = true, even if > there's no archiving involved. Afaict ArchiveRecoveryRequested=false, > InArchiveRecovery=true isn't really something the comments around the > variables foresee. Ah... InArchiveRecovery is set when we know how far recovery needs to run to reach consistency regardless whether recovery starts from crash recovery or not. This is *slightly*(:p) different from what its comment suggests. Mmm. Okay, now I know it's not that simple. Let me sleep on it.. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
pgsql-bugs by date: