Re: Parallel scan with SubTransGetTopmostTransaction assert coredump - Mailing list pgsql-hackers

From Greg Nancarrow
Subject Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
Msg-id CAJcOf-ePC2PH4gu8b5UgM7=x_HdRVgBs3mfagcXOTj7nko5R2A@mail.gmail.com
In response to Re: Parallel scan with SubTransGetTopmostTransaction assert coredump  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers
On Sat, Jul 10, 2021 at 3:36 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> But I think removing one of the snapshots (as the v2 does it) is rather
> strange too. I very much doubt having both the transaction and active
> snapshots in the parallel worker is not intentional, and Pavel may very
> well be right this breaks isolation levels that use the xact snapshot
> (i.e. REPEATABLE READ and SERIALIZABLE). I haven't checked, though.
>

Unfortunately, there is currently no test, code comment, README or
developer discussion that definitively determines which approach (v2
vs v3/v4) is a valid fix for this issue.
We don't know whether having both the transaction and active snapshots
in a parallel worker is intentional and, if it is, why.
(Certainly, in the non-parallel case of the same statement execution,
there is only one snapshot in question here: the obtained transaction
snapshot is pushed as the active snapshot, as is done in 95% of
cases in the code.)
It seems that only the original code authors know how the snapshot
handling in parallel workers is MEANT to work, and they have yet to
speak up about it here.
At this point, all we can agree on is that there is a problem to be
fixed here.

My concern with the v3/v4 patch approach is that, because the
parallel workers use a later snapshot than the one actually used in the
execution context for the statement in the parallel leader, it is
possible for the parallel leader and parallel workers to have
different transaction visibility, and surely this cannot be correct.
For example, suppose a transaction that deletes a row completes in
the window between these two snapshots.
Couldn't the row then be visible to the parallel leader (whose earlier
snapshot predates the commit of the delete) but not to the parallel
workers?
My guess is that currently there are not enough
concurrent-transactions tests to expose such a problem, and the window
here is fairly small.
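
To make the window concrete, here is a toy sketch in Python. It is not
PostgreSQL code: ToyXactManager, Row and the visibility checks are
hypothetical stand-ins for a heavily simplified MVCC model (no
subtransactions, no hint bits, no aborts). It only shows that two
snapshots taken on either side of a committing delete disagree about
the same row; whichever process holds the earlier snapshot still sees
the row, and whichever holds the later one does not.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Set


@dataclass(frozen=True)
class Snapshot:
    xmin: int            # every xid below this had already finished
    xmax: int            # every xid at or above this had not started yet
    xip: FrozenSet[int]  # xids in [xmin, xmax) that were still running


@dataclass
class Row:
    inserted_by: int                  # plays the role of the tuple's xmin
    deleted_by: Optional[int] = None  # plays the role of the tuple's xmax


class ToyXactManager:
    """Hypothetical stand-in for the server's transaction bookkeeping."""

    def __init__(self) -> None:
        self.next_xid = 100
        self.running: Set[int] = set()
        self.committed: Set[int] = set()

    def begin(self) -> int:
        xid = self.next_xid
        self.next_xid += 1
        self.running.add(xid)
        return xid

    def commit(self, xid: int) -> None:
        self.running.remove(xid)
        self.committed.add(xid)

    def snapshot(self) -> Snapshot:
        xmax = self.next_xid
        xip = frozenset(self.running)
        return Snapshot(xmin=min(xip, default=xmax), xmax=xmax, xip=xip)

    def xid_visible(self, snap: Snapshot, xid: int) -> bool:
        # An xid counts only if it had committed before the snapshot.
        if xid >= snap.xmax or xid in snap.xip:
            return False
        return xid in self.committed

    def row_visible(self, snap: Snapshot, row: Row) -> bool:
        if not self.xid_visible(snap, row.inserted_by):
            return False
        # The row is hidden only if the snapshot sees the delete.
        return row.deleted_by is None or not self.xid_visible(
            snap, row.deleted_by
        )


def demo():
    mgr = ToyXactManager()

    inserter = mgr.begin()
    row = Row(inserted_by=inserter)
    mgr.commit(inserter)

    deleter = mgr.begin()
    row.deleted_by = deleter

    earlier = mgr.snapshot()  # taken while the delete is still running
    mgr.commit(deleter)       # the delete completes inside the window
    later = mgr.snapshot()    # taken after the delete has committed

    return (
        mgr.row_visible(earlier, row),
        mgr.row_visible(later, row),
        earlier,
        later,
    )
```

In this model demo() reports the row as visible under the earlier
snapshot and not visible under the later one. The later snapshot also
ends up with a higher xmin than the earlier one, so adjusting xmin
alone would not make the two agree about the deleted row.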

So we can fiddle with xmin values to avoid the immediate Assert failure
here, but that does not address potential xmax-related issues.


Regards,
Greg Nancarrow
Fujitsu Australia


