Re: Assertion failure in SnapBuildInitialSnapshot() - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Assertion failure in SnapBuildInitialSnapshot() |
Date | |
Msg-id | 20221116020048.pay3jquwrwg3qd2y@awork3.anarazel.de Whole thread Raw |
In response to | Re: Assertion failure in SnapBuildInitialSnapshot() (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: Assertion failure in SnapBuildInitialSnapshot()
|
List | pgsql-hackers |
Hi, On 2022-11-15 16:20:00 +0530, Amit Kapila wrote: > On Tue, Nov 15, 2022 at 8:08 AM Andres Freund <andres@anarazel.de> wrote: > > nor do we enforce in an obvious place that we > > don't already hold a snapshot. > > > > We have a check for (FirstXactSnapshot == NULL) in > RestoreTransactionSnapshot->SetTransactionSnapshot. Won't that be > sufficient? I don't think that'd e.g. catch a catalog snapshot being held, yet that'd still be bad. And I think checking in SetTransactionSnapshot() is too late, we've already overwritten MyProc->xmin by that point. On 2022-11-15 17:21:44 +0530, Amit Kapila wrote: > > One thing I noticed just now is that we don't assert > > builder->building_full_snapshot==true. I think we should? That didn't use to > > be an option, but now it is... It doesn't look to me like that's the issue, > > but ... > > > > Agreed. > > The attached patch contains both changes. It seems to me this issue > can happen, if somehow, either slot's effective_xmin increased after > we assign its initial value in CreateInitDecodingContext() or somehow > its value is InvalidTransactionId when we have invoked > SnapBuildInitialSnapshot(). The other possibility is that the > initial_xmin_horizon check in SnapBuildFindSnapshot() doesn't insulate > us from assigning builder->xmin value older than initial_xmin_horizon. > I am not able to see if any of this can be true. Yea, I don't immediately see anything either. Given the discussion in https://www.postgresql.org/message-id/Yz2hivgyjS1RfMKs%40depesz.com I am starting to wonder if we've introduced a race in the slot machinery. > diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c > index 5006a5c464..e85c75e0e6 100644 > --- a/src/backend/replication/logical/snapbuild.c > +++ b/src/backend/replication/logical/snapbuild.c > @@ -566,11 +566,13 @@ SnapBuildInitialSnapshot(SnapBuild *builder) > { > Snapshot snap; > TransactionId xid; > + TransactionId safeXid; > TransactionId *newxip; > int newxcnt = 0; > > Assert(!FirstSnapshotSet); > Assert(XactIsoLevel == XACT_REPEATABLE_READ); > + Assert(builder->building_full_snapshot); > > if (builder->state != SNAPBUILD_CONSISTENT) > elog(ERROR, "cannot build an initial slot snapshot before reaching a consistent state"); > @@ -589,17 +591,13 @@ SnapBuildInitialSnapshot(SnapBuild *builder) > * mechanism. Due to that we can do this without locks, we're only > * changing our own value. > */ Perhaps add something like "Creating a snapshot is expensive and an unenforced xmin horizon would have bad consequences, therefore always double-check that the horizon is enforced"? > -#ifdef USE_ASSERT_CHECKING > - { > - TransactionId safeXid; > - > - LWLockAcquire(ProcArrayLock, LW_SHARED); > - safeXid = GetOldestSafeDecodingTransactionId(false); > - LWLockRelease(ProcArrayLock); > + LWLockAcquire(ProcArrayLock, LW_SHARED); > + safeXid = GetOldestSafeDecodingTransactionId(false); > + LWLockRelease(ProcArrayLock); > > - Assert(TransactionIdPrecedesOrEquals(safeXid, snap->xmin)); > - } > -#endif > + if (TransactionIdFollows(safeXid, snap->xmin)) > + elog(ERROR, "cannot build an initial slot snapshot when oldest safe xid %u follows snapshot's xmin %u", > + safeXid, snap->xmin); > > MyProc->xmin = snap->xmin; > s/when/as/ Greetings, Andres Freund
pgsql-hackers by date: