Home > mailing lists

Re: Parallel scan with SubTransGetTopmostTransaction assert coredump - Mailing list pgsql-hackers

From	Greg Nancarrow
Subject	Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
Date	August 13, 2021 09:52:31
Msg-id	CAJcOf-f61Yr-8eznjMX_h=X+WjpTh2MThCero4+oGoh1LZT5+w@mail.gmail.com Whole thread Raw
In response to	Re: Parallel scan with SubTransGetTopmostTransaction assert coredump (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
List	pgsql-hackers

Tree view

On Thu, Aug 12, 2021 at 5:37 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> 1. Then why doesn't the approach I proposed fix it?
>

I think that with your approach, it is not doing the expected
initialization done by SetTransactionSnapshot() (which is called by
RestoreTransactionSnapshot(), which your approach skips in the case of
the SQL script that reproduces the problem, because
IsolationUsesXactSnapshot() returns false for XACT_READ_COMMITTED).
There's some comments in SetTransactionSnapshot() explaining the
tricky parts of this initialization, testing that the source
transaction is still running, dealing with a race condition, and
setting up TransactionXmin.
Also, there's an "if (IsolationUsesXactSnapshot()) ..." block within
that function, doing some required setup for transaction-snapshot
mode, so it doesn't seem like a good idea to not call
RestoreTransactionSnapshot() if !IsolationUsesXactSnapshot(), as the
function is obviously catering for both cases, when the isolation
level does and doesn't use a transaction snapshot. So I think
SetTransactionSnapshot() always needs to be called.

With your proposed approach, what I'm seeing is that the worker calls
GetTransactionSnapshot() at some point, which then builds a new
snapshot, and results in increasing TransactionXmin (probably because
another concurrent transaction has since completed). This snapshot is
thus later than the snapshot used in the execution state of the query
being executed. This causes the Assert in
SubTransGetTopmostTransaction() to fire because the xid doesn't follow
or equal the TransactionXmin value.

> 2. Consider the case where the toplevel query is something like SELECT
> complexfunc() FROM generate_series(1,10) g -- in a case like this, I
> think complexfunc() can cause snapshots to be taken internally. For
> example suppose we end up inside exec_eval_simple_expr, or
> SPI_cursor_open_internal, in either case with read_only = false. Here
> we're going to again call GetTransactionSnapshot() and then execute a
> query which may use parallelism.
>
>

A query always uses the ActiveSnapshot at the time the QueryDesc is
created  - so as long as you don't (as the current code does) obtain a
potentially later snapshot and try to restore that in the worker as
the TransactionSnapshot (or let the worker create a new snapshot,
because no TransactionSnapshot was restored, which may have a greater
xmin than the ActiveSnapshot) then I think it should be OK.

Regards,
Greg Nancarrow
Fujitsu Australia

pgsql-hackers by date:

From: Peter Smith
Date: 13 August 2021, 09:45:08
Subject: Re: Logical replication keepalive flood

From: "Andrey V. Lepikhov"
Date: 13 August 2021, 10:01:24
Subject: Re: Postgres picks suboptimal index after building of an extended statistics

Re: Parallel scan with SubTransGetTopmostTransaction assert coredump - Mailing list pgsql-hackers

Previous

Next