Re: Parallel scan with SubTransGetTopmostTransaction assert coredump - Mailing list pgsql-hackers

From Greg Nancarrow
Subject Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
Date
Msg-id CAJcOf-dbeg0CfJgo03SXD1=qzM650DrTRJDKhtp_=zvDPoxxfw@mail.gmail.com
Whole thread Raw
In response to Re: Parallel scan with SubTransGetTopmostTransaction assert coredump  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
List pgsql-hackers
On Tue, Aug 10, 2021 at 12:35 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> Now there is evidently something wrong with this line of reasoning
> because the code is buggy and my proposed fix doesn't work, but I
> don't know what is wrong with it. You seem to think that it's crazy
> that we try to replicate the active snapshot to the active snapshot
> and the transaction snapshot to the transaction snapshot, but that
> did, and still does, seem really sane to me. The only part that now
> seems clearly wrong to me is that !IsolationUsesXactSnapshot() case
> takes an *extra* snapshot, but since fixing that didn't fix the bug,
> there's evidently more to the problem than that.
>

I traced through snapshot processing during a parallel SELECT, up to
the point of the existing GetTransactionSnapshot() and
GetCurrentSnapshot() calls in InitializeParallelDSM().
I'm seeing the following sequence of calls, numbered below:


PortalStart():

   case PORTAL_ONE_SELECT:

(1)    PushActiveSnapshot(GetTransactionSnapshot());

...

       queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
                       portal->sourceText,
(2)                   GetActiveSnapshot(),
                       InvalidSnapshot,
                       None_Receiver,
                       params,
                       portal->queryEnv,
                       0);

...

(3)    PopActiveSnapshot();


PortalRunSelect():

(4)    PushActiveSnapshot(queryDesc->snapshot);
        ExecutorRun(queryDesc, direction, (uint64) count,
                                 portal->run_once);

        InitializeParallelDSM():

(5)        Snapshot   transaction_snapshot = GetTransactionSnapshot();
(6)        Snapshot   active_snapshot = GetActiveSnapshot();


         nprocessed = queryDesc->estate->es_processed;
(7)     PopActiveSnapshot();



The snapshot used in execution of the query is clearly the
ActiveSnapshot at the time of creating the QueryDesc [at (2)] which is
a copy of the TransactionSnapshot originally acquired [at (1)].

In InitializeParallelDSM() it acquires both the TransactionSnapshot
[at (5)] and the ActiveSnapshot [at (6)], to be serialized in the DSM
for the workers (each of which will deserialize and restore these).
But the problem I see is that the GetTransactionSnapshot() call [at
(5)] may acquire a new snapshot (i.e. a later snapshot than the
ActiveSnapshot used in the execution of the query), for example, if a
concurrent transaction has completed since GetTransactionSnapshot()
was last called [in (1)].
In this case, GetTransactionSnapshot() calls GetSnapshotDataReuse()
and it returns false, causing a new snapshot to be built by
GetTransactionSnapshot().

    curXactCompletionCount = ShmemVariableCache->xactCompletionCount;
    if (curXactCompletionCount != snapshot->snapXactCompletionCount)
        return false;

When this TransactionSnapshot is restored in a worker process, it
accordingly sets TransactionXmin, and if we look back at the coredump
stacktrace and the Assert condition that failed in the worker, we see
that the xid was expected to be >= TransactionXmin, but the Assert
fired because the xid was < TransactionXmin.

    Assert(TransactionIdFollowsOrEquals(xid, TransactionXmin));

This is explained by the TransactionSnapshot being a later snapshot in
this case.
So this is why it seems to be wrong to call GetTransactionSnapshot()
in InitializeParallelDSM() and use a separate, potentially later,
snapshot than that used in the execution state for the query.

Regards,
Greg Nancarrow
Fujitsu Australia



pgsql-hackers by date:

Previous
From: torikoshia
Date:
Subject: Re: RFC: Logging plan of the running query
Next
From: Andrew Dunstan
Date:
Subject: Re: Worth using personality(ADDR_NO_RANDOMIZE) for EXEC_BACKEND on linux?