Re: [HACKERS] Parallel Hash take II - Mailing list pgsql-hackers

From: Andres Freund
Subject: Re: [HACKERS] Parallel Hash take II
Date: 2017-12-21 08:49:49
Msg-id: 20171221084949.k36qsg4t4xpkr3rh@alap3.anarazel.de
In response to: Re: [HACKERS] Parallel Hash take II (Thomas Munro <thomas.munro@enterprisedb.com>)
List: pgsql-hackers
Hi,

> > Not really happy with the name. ExecParallelHashTableInsert() calling
> > ExecParallelHashLoadTuple() to insert a tuple into the hashtable doesn't
> > quite strike me as right; the naming similarity to
> > ExecParallelHashTableLoad seems problematic too.
> > ExecParallelHashAllocTuple() or such?
> >
> > One could argue it'd not be a bad idea to keep a similar split as
> > dense_alloc() and memcpy() have, but I'm not really convinced
> > myself. Hm.
>
> Yeah, the names are confusing.  So:

Cool.


> > Just to make sure I understand: The "empty" phase is to protect the
> > process of the electee doing the sizing calculations etc?  And the
> > reason that's not possible to do just by waiting for
> > PHJ_GROW_BATCHES_REPARTITIONING is that somebody could dynamically
> > arrive, i.e. it'd not be needed in a statically sized barrier?
>
> Yeah, it's where you wait for the serial phase above to be finished.
> [ Explanation ] Does that make sense?

Yes.


>
> > +                       /* reset temp memory each time to avoid leaks from qual expr */
> > +                       ResetExprContext(econtext);
> > +
> > +                       if (ExecQual(hjclauses, econtext))
> >
> > I personally think it's better to avoid this pattern and store the
> > result of the ExecQual() in a variable, ResetExprContext() and then act
> > on the result.  No need to keep memory around for longer, and for bigger
> > contexts you're more likely to have all the metadata in cache.
> >
> > I'd previously thought about introducing ExecQualAndReset() or such...
>
> Makes sense, but this code is identical to that in
> ExecScanHashBucket(), so I think we should keep it this way for now
> and explore expression context lifetime improvements in a separate
> patch?  Looks like the same change could be made in other nodes too.

Ok.
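
For concreteness, a minimal sketch of the ordering I have in mind, with
ExecQualAndReset() as the hypothetical helper mentioned above:

    /* Evaluate the qual, then reset the per-tuple context before
     * acting on the saved result, so the memory is released early. */
    static inline bool
    ExecQualAndReset(ExprState *state, ExprContext *econtext)
    {
        bool    ret = ExecQual(state, econtext);

        ResetExprContext(econtext);
        return ret;
    }

The call site above would then just become
if (ExecQualAndReset(hjclauses, econtext)).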


> > +
> > +       /*
> > +        * If we started up so late that the shared batches have been freed
> > +        * already by ExecHashTableDetach(), then we are finished.
> > +        */
> > +       if (!DsaPointerIsValid(hashtable->parallel_state->batches))
> > +               return false;
> >
> > Is this really the only place that weird condition is detected? And why
> > is it possible in the first place? And if it is, shouldn't we have
> > detected it earlier?  Also, what prevents it from occurring in a way
> > that the test fails because pstate->batches has been freed but not yet
> > reset?
>
> ExecParallelHashJoinNewBatch(), where this code appears, is generally
> the place that we discover that we're finished.  Normally we discover
> that we're finished by seeing that there are no batches left to chew
> on, by inspecting the per-batch state in shmem.  This weird condition
> arises when a worker starts up so late that the join is finished and
> the shmem space used to track batches has already been freed.  I
> agree that that was badly explained and there was in fact something a
> bit kooky about that coding.  I have now changed it so that
> ExecParallelHashEnsureBatchAccessors() detects this case and has a
> better comment to explain it, and ExecParallelHashJoinNewBatch() now
> just looks out for hashtable->batches == NULL with a comment referring
> to the other place.

Thanks for updating.
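
For the archives, the revised check described above amounts to something
like this (comment wording illustrative):

    /* In ExecParallelHashJoinNewBatch(): if we started up so late that
     * ExecParallelHashEnsureBatchAccessors() found the shared batch
     * array already freed, it leaves hashtable->batches NULL, and we
     * are finished.  See the comment there. */
    if (hashtable->batches == NULL)
        return false;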


My compiler complained that ExecHashJoinImpl() might not be
inlinable. That's just because you declared it always_inline without
actually making it an inline function...
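
For anyone else hitting that warning: with GCC, always_inline has to be
paired with the inline keyword, along these lines (signature from the
patch, from memory):

    /* Declaring the function inline as well silences GCC's
     * "might not be inlinable" warning for always_inline. */
    static __attribute__((always_inline)) inline TupleTableSlot *
    ExecHashJoinImpl(PlanState *pstate, bool parallel)
    {
        ...
    }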


Pushed.  Yeah!  Congrats, this has been quite the project.


I suspect we'll find a bunch of problems, on both the planning and
execution sides, but I think at this point we're much more likely to
find and resolve them in-tree than out of tree.



Btw, I see dsa_get_address() show up pretty prominently in profiles. I
kinda wonder if there are cases where we could ameliorate the cost by
recognizing that a bunch of lookups are all going to reside in the same
segment.
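
To illustrate the idea (purely a sketch, all names made up): when a run
of objects is known to live in a single allocation, and therefore a
single segment, the translation could be hoisted out of the loop:

    /* Translate the dsa_pointer once, then use plain pointer
     * arithmetic instead of calling dsa_get_address() per element. */
    char   *base = dsa_get_address(area, base_dp);

    for (size_t i = 0; i < ntuples; i++)
        process_tuple((MyTuple *) (base + i * tuple_size));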

- Andres

