Re: AtEOXact_Snapshot timing - Mailing list pgsql-hackers

From Andres Freund
Subject Re: AtEOXact_Snapshot timing
Date
Msg-id 20190905213201.zb5fiekbnrfayb5q@alap3.anarazel.de
Whole thread Raw
In response to AtEOXact_Snapshot timing  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: AtEOXact_Snapshot timing
List pgsql-hackers
Hi,

On 2019-09-05 14:50:50 -0400, Robert Haas wrote:
> The best way that I've been able to come up with to enforce this rule
> after a little bit of thought is to add Assert(IsTransactionState())
> to a bunch of functions in snapmgr.c, most notably
> GetTransactionSnapshot and GetCatalogSnapshot.

Wonder if there's a risk that callers might still have a snapshot
around, in independent memory?  It could make sense to add such an
assert to a visibility routine or two, maybe?

I suspect that we should add non-assert condition to a central place
like GetSnapshotData(). It's not hard to imagine extensions just using
that directly, and that we'd never notice that with assert only
testing. It's also hard to imagine a single if
(unlikely(IsTransactionState())) to be expensive enough to matter
compared to GetSnapshotData()'s own cost.

I wonder, not just because of the previous paragraph, whether it could
be worthwhile to expose enough xact.c state to allow
IsTransactionState() to be done without a function call. ISMT a few
Assert(IsTransactionState()) type checks really are important enough to
be done in production builds too. Some of the relevant scenarios aren't
even close to be covered fully, and you'll get bad results if there's
such a problem.

> @@ -2732,6 +2732,18 @@ AbortTransaction(void)
>          pgstat_report_xact_timestamp(0);
>      }
>  
> +    /*
> +     * Any snapshots taken by this transaction were unsafe to use even at the
> +     * time when we entered AbortTransaction(), since we might have, for
> +     * example, inserted a heap tuple and failed while inserting index tuples.
> +     * They were even more unsafe after ProcArrayEndTransaction(), since after
> +     * that point tuples we inserted could be pruned by other backends.
> +     * However, we postpone the cleanup until this point in the sequence
> +     * because it has to be done after ResourceOwnerRelease() has finishing
> +     * unregistering snapshots.
> +     */
> +    AtEOXact_Snapshot(false, true);
> +
>      /*
>       * State remains TRANS_ABORT until CleanupTransaction().
>       */

Hm. This means that
        if (is_parallel_worker)
            CallXactCallbacks(XACT_EVENT_PARALLEL_ABORT);
        else
            CallXactCallbacks(XACT_EVENT_ABORT);
which, together with a few of the other functions, could plausibly try
to use snapshot related logic, may end up continuing to use an existing
snapshot without us detecting the problem? I think?  Especially with the
asserts present ISTM that we really should kill the existing snapshots
directly adjacent to ProcArrayEndTransaction(). As you say, after that
the snapshots aren't correct anymore.  And with the right assertions we
should be safe againsts accidental reintroduction of catalog access in
the following code?


Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: BUG #15977: Inconsistent behavior in chained transactions
Next
From: Jaime Casanova
Date:
Subject: Re: Built-in connection pooler