Re: merging HashJoin and Hash nodes - Mailing list pgsql-hackers

From Andres Freund
Subject Re: merging HashJoin and Hash nodes
Date
Msg-id 20191031234302.2ed7ezhrai6jpx35@alap3.anarazel.de
In response to Re: merging HashJoin and Hash nodes  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
Hi,

On 2019-10-31 23:59:19 +0100, Tomas Vondra wrote:
> On Tue, Oct 29, 2019 at 02:00:00PM +1300, Thomas Munro wrote:
> > On Tue, Oct 29, 2019 at 12:15 PM Andres Freund <andres@anarazel.de> wrote:
> > > I've groused about this a few times, but to me it seems wrong that
> > > HashJoin and Hash are separate nodes. They're so tightly bound together
> > > that keeping them separate just doesn't architecturally make sense,
> > > imo. So I wrote a prototype.
> > > 
> > > Evidence of being tightly bound together:
> > > - functions in nodeHash.h that take a HashJoinState etc
> > > - how many functions in nodeHash.h and nodeHashjoin.h are purely exposed
> > >   so the other side can call them
> > > - there's basically no meaningful separation of concerns between code in
> > >   nodeHash.c and nodeHashjoin.c
> > > - the Hash node doesn't really exist during most of the planning, it's
> > >   kind of faked up in create_hashjoin_plan().
> > > - HashJoin knows that the inner node is always going to be a Hash node.
> > > - HashJoinState and HashState both have pointers to HashJoinTable, etc
> > > 
> > > Besides violating some aesthetical concerns, I think it also causes
> > > practical issues:
> > > 
> > > - code being in different translation units prevents the compiler from
> > >   inlining etc. There are a lot of hot calls going between both. For each
> > >   new outer tuple we e.g. call, from nodeHashjoin.c separately into
> > >   nodeHash.c for ExecHashGetHashValue(), ExecHashGetBucketAndBatch(),
> > >   ExecHashGetSkewBucket(), ExecScanHashBucket(). They each have to
> > >   do memory loads from HashJoinState/HashJoinTable, even though previous
> > >   code *just* has done so.
> 
> I wonder how much we can gain by this. I don't expect any definitive
> figures from a patch at this stage, but maybe you have some guesses?

It's measurable, but not a world-changing difference. Some of the gains
are limited by the compiler not realizing that it does not have to
reload values across some external function calls. I saw somewhere
around ~3% for a case that was bottlenecked by HJ lookups (not
build).

I think part of the effect size is also limited by other unnecessary
inefficiencies being a larger bottleneck. E.g.

1) the fact that ExecScanHashBucket() contains branches that have
   roughly 50% likelihoods, making them unpredictable (1. on a
   successful lookup we oscillate between the first hashTuple != NULL
   test succeeding and failing, except in case of a bucket conflict; 2. the
   while (hashTuple != NULL) test oscillates similarly, because it tests for
   I. the initial lookup succeeding, II. a further tuple in a previous bucket
   lookup, III. further tuples in case of a hashvalue mismatch). Quite
   observable by profiling for br_misp_retired.conditional.
2) The fact that there are *two* indirections for a successful lookup
   that are very likely to be cache misses. First we need to look up the
   relevant bucket, second we need to actually fetch hashvalue from the
   pointer stored in the bucket.
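To make the two-load dependency concrete, here's a minimal sketch (with hypothetical names, not PostgreSQL's actual structs) of the chained-bucket layout: the bucket array stores pointers to tuple headers, and each header stores the hash value plus a next pointer, so a probe has to complete the bucket-array load before it can even start the hashvalue load:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplification of the hash table layout: buckets[] holds
 * chain heads, each tuple header carries hashvalue + next.  A probe needs
 * two dependent loads: buckets[bucketno] (likely miss #1), then
 * tup->hashvalue through that pointer (likely miss #2). */
typedef struct MinimalTuple
{
    struct MinimalTuple *next;      /* chain within the bucket */
    uint32_t             hashvalue; /* full hash of the join key */
    int                  key;       /* stands in for the real tuple data */
} MinimalTuple;

typedef struct
{
    MinimalTuple **buckets;
    uint32_t       nbuckets;   /* power of two */
} HashTable;

static void
insert_tuple(HashTable *ht, MinimalTuple *tup)
{
    MinimalTuple **head = &ht->buckets[tup->hashvalue & (ht->nbuckets - 1)];

    tup->next = *head;
    *head = tup;
}

static MinimalTuple *
scan_bucket(HashTable *ht, uint32_t hashvalue)
{
    /* load #1: fetch the bucket head */
    MinimalTuple *tup = ht->buckets[hashvalue & (ht->nbuckets - 1)];

    /* the ~50/50 branches discussed above: found vs. not found, and
     * hashvalue match vs. chain collision */
    while (tup != NULL)
    {
        /* load #2: dereference the tuple to compare the hash */
        if (tup->hashvalue == hashvalue)
            return tup;
        tup = tup->next;
    }
    return NULL;
}
```

The second load cannot be issued until the first completes, which is why a cold probe tends to pay for two serialized cache misses rather than one.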


But even independent of these larger inefficiencies, I suspect we could
also gain more from inlining by changing nodeHashjoin a bit. E.g. moving
the HJ_SCAN_BUCKET code into an always_inline function, and also
referencing it from the tail end of the HJ_NEED_NEW_OUTER code, instead
of falling through, would allow optimizing away a number of loads (and
I think also stores), and improve branch predictor
efficiency. E.g. optimizing away store/load combinations for
node->hj_CurHashValue, node->hj_CurBucketNo, node->hj_CurSkewBucketNo;
loads of hj_HashTable, ...; stores of node->hj_JoinState,
node->hj_MatchedOuter.  And probably make the code easier to read, to
boot.
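Roughly what I have in mind, as a hypothetical sketch (the names and the surrounding state machine are simplified, not the real nodeHashjoin.c code): pull the scan loop into an always-inline helper and call it from both places, so the probe state can stay in registers at each call site instead of round-tripping through fields like node->hj_CurHashValue and node->hj_CurBucketNo:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable-ish always_inline shim; PostgreSQL has its own macro for this. */
#if defined(__GNUC__)
#define pg_always_inline inline __attribute__((always_inline))
#else
#define pg_always_inline inline
#endif

typedef struct Tuple
{
    struct Tuple *next;
    uint32_t      hashvalue;
} Tuple;

/* Shared scan body, inlined into every caller.  Because it is inlined,
 * the compiler can keep 'chain' and 'hashvalue' in registers at both
 * call sites rather than spilling them into the join node. */
static pg_always_inline Tuple *
scan_bucket_inline(Tuple *chain, uint32_t hashvalue)
{
    while (chain != NULL && chain->hashvalue != hashvalue)
        chain = chain->next;
    return chain;
}

/* call site 1: resuming a scan (HJ_SCAN_BUCKET analogue) */
static Tuple *
resume_scan(Tuple *cur, uint32_t hashvalue)
{
    return scan_bucket_inline(cur, hashvalue);
}

/* call site 2: probing immediately after computing the bucket for a
 * fresh outer tuple (tail of HJ_NEED_NEW_OUTER analogue) -- no
 * store/reload of the probe state in between */
static Tuple *
probe_new_outer(Tuple **buckets, uint32_t nbuckets, uint32_t hashvalue)
{
    return scan_bucket_inline(buckets[hashvalue & (nbuckets - 1)], hashvalue);
}
```

With the fall-through replaced by a second inlined call, the state-machine transition through the node struct disappears for the common path, which is where the store/load elimination and the per-call-site branch history would come from.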


> But IMO we should make it obvious which side of the join is hashed,
> instead of relying on users to "know which way around the relations are
> shown". The explain is often used by users who're learning stuff, or
> maybe investigating it for the first time, and we should not make it
> unnecessarily hard to understand.

I agree. I wonder if just outputting something like 'Hashed Side:
second' (or "right", or ...) could work. Not perfect, but I don't really
have a better idea.

We somewhat rely on users understanding inner/outer for explain output
(I doubt this is good, to be clear), e.g. "Inner Unique: true ". Made
worse by the fact that "inner"/"outer" is also used to describe
different kinds of joins, with a mostly independent meaning.



> OTOH we have tools visualizing execution plans, so maybe backwards
> compatibility of the output is a concern too (I know we don't promise
> anything, though).

Well, execution *would* work a bit differently, so I don't feel too bad
about tools having to adapt to that. E.g. a graphical explain tool really
shouldn't display a separate Hash node anymore.


> > The fact that EXPLAIN doesn't label relations seems to be a separate
> > concern that applies equally to nestloop joins, and could perhaps be
> > addressed with some more optional verbosity, not a fake node?

> Yeah, that seems separate.

I'm not sure that's true. If we were labelling sub-nodes with 'inner'
and 'outer' or such, we could just make that 'hashed inner' or such. But
changing this seems to be a large compat break...

Greetings,

Andres Freund


