Re: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results - Mailing list pgsql-bugs

From Richard Guo
Subject Re: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results
Date
Msg-id CAMbWs49MBMkc_HhCW2v_cNSPH8=tCKDKd0AgEX6OEp-xOkbFfw@mail.gmail.com
Whole thread Raw
In response to Re: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results
Re: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results
List pgsql-bugs
On Tue, Oct 28, 2025 at 5:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> As far as I can see, in a parallel hash join node->hj_CurTuple will
> be pointing into a shared hash table, so that this code is fooling
> with a shared HasMatch flag bit with no sort of lock whatsoever.
> This query doesn't have any joinqual at the PHJ level if I'm reading
> EXPLAIN correctly, so the window to get it wrong isn't very wide.
> Nonetheless, if two parallel workers inspect the same inner tuple
> at about the same time, this code would allow both of them to
> return it.
>
> I'm unsure if we've got any infrastructure that'd allow setting the
> tuple's match bit in a more atomic fashion.  We might have to revert
> aa86129e1.

Thanks Lori for the report and test case.

Thanks Tom for the analysis.  I think you're right.  In a parallel
hash join, the inner relation is stored in a shared global hash table
for probing, allowing multiple workers to access the same tuples
concurrently.  If two workers probe the same inner tuple at the same
time, it can lead to duplicate emissions of that tuple, violating the
semantics of a right semi join.

AFAICT, there are 3 possible options for a fix.

1) Revert aa86129e1.

2) Modify the code to perform atomic operations on the matched flag
using a CAS (or a similar) mechanism when running in parallel
execution.

3) Disable parallel right semi joins in the planner.

For option #2, it seems to me that it would require non-trivial
changes, which might not be suitable for back-patching.  We would also
need to evaluate the performance impact of introducing atomic
operations.  I'm somewhat inclined toward option #3 instead.  We can
implement the disablement in hash_inner_and_outer(), which would only
require a small modification (perhaps two lines, plus some comment
updates).  If option #2 doesn't work out, I'm open to option #1.

(I'm still trying to understand why concurrent access to the matched
flag in cases other than right semi joins (such as right or full
joins) doesn't lead to concurrency issues.)

- Richard



pgsql-bugs by date:

Previous
From: Gavin Wahl
Date:
Subject: Re: psql --echo-queries does not echo all queries
Next
From: Richard Guo
Date:
Subject: Re: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results