Re: Assertion failure in SnapBuildInitialSnapshot() - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Assertion failure in SnapBuildInitialSnapshot()
Date
Msg-id 20221116020048.pay3jquwrwg3qd2y@awork3.anarazel.de
Whole thread Raw
In response to Re: Assertion failure in SnapBuildInitialSnapshot()  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Assertion failure in SnapBuildInitialSnapshot()
List pgsql-hackers
Hi,

On 2022-11-15 16:20:00 +0530, Amit Kapila wrote:
> On Tue, Nov 15, 2022 at 8:08 AM Andres Freund <andres@anarazel.de> wrote:
> > nor do we enforce in an obvious place that we
> > don't already hold a snapshot.
> >
> 
> We have a check for (FirstXactSnapshot == NULL) in
> RestoreTransactionSnapshot->SetTransactionSnapshot. Won't that be
> sufficient?

I don't think that'd e.g. catch a catalog snapshot being held, yet that'd
still be bad. And I think checking in SetTransactionSnapshot() is too late,
we've already overwritten MyProc->xmin by that point.


On 2022-11-15 17:21:44 +0530, Amit Kapila wrote:
> > One thing I noticed just now is that we don't assert
> > builder->building_full_snapshot==true. I think we should? That didn't use to
> > be an option, but now it is... It doesn't look to me like that's the issue,
> > but ...
> >
> 
> Agreed.
> 
> The attached patch contains both changes. It seems to me this issue
> can happen, if somehow, either slot's effective_xmin increased after
> we assign its initial value in CreateInitDecodingContext() or somehow
> its value is InvalidTransactionId when we have invoked
> SnapBuildInitialSnapshot(). The other possibility is that the
> initial_xmin_horizon check in SnapBuildFindSnapshot() doesn't insulate
> us from assigning builder->xmin value older than initial_xmin_horizon.
> I am not able to see if any of this can be true.

Yea, I don't immediately see anything either. Given the discussion in
https://www.postgresql.org/message-id/Yz2hivgyjS1RfMKs%40depesz.com I am
starting to wonder if we've introduced a race in the slot machinery.


> diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
> index 5006a5c464..e85c75e0e6 100644
> --- a/src/backend/replication/logical/snapbuild.c
> +++ b/src/backend/replication/logical/snapbuild.c
> @@ -566,11 +566,13 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
>  {
>      Snapshot    snap;
>      TransactionId xid;
> +    TransactionId safeXid;
>      TransactionId *newxip;
>      int            newxcnt = 0;
>  
>      Assert(!FirstSnapshotSet);
>      Assert(XactIsoLevel == XACT_REPEATABLE_READ);
> +    Assert(builder->building_full_snapshot);
>  
>      if (builder->state != SNAPBUILD_CONSISTENT)
>          elog(ERROR, "cannot build an initial slot snapshot before reaching a consistent state");
> @@ -589,17 +591,13 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
>       * mechanism. Due to that we can do this without locks, we're only
>       * changing our own value.
>       */

Perhaps add something like "Creating a snapshot is expensive and an unenforced
xmin horizon would have bad consequences, therefore always double-check that
the horizon is enforced"?


> -#ifdef USE_ASSERT_CHECKING
> -    {
> -        TransactionId safeXid;
> -
> -        LWLockAcquire(ProcArrayLock, LW_SHARED);
> -        safeXid = GetOldestSafeDecodingTransactionId(false);
> -        LWLockRelease(ProcArrayLock);
> +    LWLockAcquire(ProcArrayLock, LW_SHARED);
> +    safeXid = GetOldestSafeDecodingTransactionId(false);
> +    LWLockRelease(ProcArrayLock);
>  
> -        Assert(TransactionIdPrecedesOrEquals(safeXid, snap->xmin));
> -    }
> -#endif
> +    if (TransactionIdFollows(safeXid, snap->xmin))
> +        elog(ERROR, "cannot build an initial slot snapshot when oldest safe xid %u follows snapshot's xmin %u",
> +             safeXid, snap->xmin);
>  
>      MyProc->xmin = snap->xmin;
>  

s/when/as/

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Ian Lawrence Barwick
Date:
Subject: Re: Documentation for building with meson
Next
From: Japin Li
Date:
Subject: Re: closing file in adjust_data_dir