Re: Assertion failure with barriers in parallel hash join - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: Assertion failure with barriers in parallel hash join
Date
Msg-id CAAKRu_am6AJEY+Y-tuWYQSK=9kMzknSCkMPiNO_tHyoNZSwGzg@mail.gmail.com
Whole thread Raw
In response to Assertion failure with barriers in parallel hash join  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Assertion failure with barriers in parallel hash join
List pgsql-hackers
On Wed, Mar 17, 2021 at 8:18 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Wed, Mar 17, 2021 at 6:58 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > According to BF animal elver there is something wrong with this
> > commit.  Looking into it.
>
> Assertion failure reproduced here and understood, but unfortunately
> it'll take some more time to fix this.  I've reverted the commit for
> now to unbreak the ~5 machines that are currently showing red in the
> build farm.

So, I think the premise of the patch
v6-0001-Fix-race-condition-in-parallel-hash-join-batch-cl.patch makes
sense: freeing the hashtable memory should happen atomically with
updating the flag that indicates to all others that the memory is not to
be accessed any longer.

As was likely reported by the buildfarm leading to you reverting the
patch, when the inner side is empty and we dump out before advancing the
build barrier past PHJ_BUILD_HASHING_OUTER, we trip the new Asserts
you've added in ExecHashTableDetach().

Assert(!pstate ||
            BarrierPhase(&pstate->build_barrier) >= PHJ_BUILD_RUNNING);

Hmm.

Maybe if the inner side is empty, we can advance the build barrier to
the end?
We help batch 0 along like this in ExecParallelHashJoinSetUpBatches().

But, I'm not sure we can expect the process executing this code to be
attached to the build barrier, can we?

@@ -296,7 +304,19 @@ ExecHashJoinImpl(PlanState *pstate, bool parallel)
                                 * outer relation.
                                 */
                                if (hashtable->totalTuples == 0 &&
!HJ_FILL_OUTER(node))
+                               {
+                                       if (parallel)
+                                       {
+                                               Barrier    *build_barrier;
+
+                                               build_barrier =
¶llel_state->build_barrier;
+                                               while
(BarrierPhase(build_barrier) < PHJ_BUILD_DONE)
+
BarrierArriveAndWait(build_barrier, 0);
+                                               BarrierDetach(build_barrier);
+                                       }
+
                                        return NULL;
+                               }



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Crash in BRIN minmax-multi indexes
Next
From: Peter Geoghegan
Date:
Subject: Re: New IndexAM API controlling index vacuum strategies