Re: merging HashJoin and Hash nodes - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: merging HashJoin and Hash nodes
Date
Msg-id CA+hUKGJeA3LQh++ZQN_wXjgapu57PVOpH6kNNYZTgzhDZKfRSA@mail.gmail.com
Whole thread Raw
In response to merging HashJoin and Hash nodes  (Andres Freund <andres@anarazel.de>)
Responses Re: merging HashJoin and Hash nodes
List pgsql-hackers
On Tue, Oct 29, 2019 at 12:15 PM Andres Freund <andres@anarazel.de> wrote:
> I've groused about this a few times, but to me it seems wrong that
> HashJoin and Hash are separate nodes. They're so tightly bound together
> that keeping them separate just doesn't architecturally makes sense,
> imo. So I wrote a prototype.
>
> Evidence of being tightly bound together:
> - functions in nodeHash.h that take a HashJoinState etc
> - how many functions in nodeHash.h and nodeHashjoin.h are purely exposed
>   so the other side can call them
> - there's basically no meaningful separation of concerns between code in
>   nodeHash.c and nodeHashjoin.c
> - the Hash node doesn't really exist during most of the planning, it's
>   kind of faked up in create_hashjoin_plan().
> - HashJoin knows that the inner node is always going to be a Hash node.
> - HashJoinState and HashState both have pointers to HashJoinTable, etc
>
> Besides violating some aesthetical concerns, I think it also causes
> practical issues:
>
> - code being in different translation code prevents the compiler from
>   inlining etc. There's a lot of HOT calls going between both. For each
>   new outer tuple we e.g. call, from nodeHashjoin.c separately into
>   nodeHash.c for ExecHashGetHashValue(), ExecHashGetBucketAndBatch(),
>   ExecHashGetSkewBucket(), ExecScanHashBucket(). They each have to
>   do memory loads from HashJoinState/HashJoinTable, even though previous
>   code *just* has done so.
> - a separate executor node, and all the ancillary data (slots,
>   expression context, target lists etc) is far from free
> - instead of just applying an "empty outer" style optimization to both
>   sides of the HashJoin, we have to choose. Once unified it's fairly
>   easy to just use it on both.
> - generally, a lot of improvements are harder to develop because of the
>   odd separation.

I agree with all of that.

> Does anybody have good arguments for keeping them separate? The only
> real one I can see is that it's not a small change, and will make
> bugfixes etc a bit harder.  Personally I think that's outweighed by the
> disadvantages.

Yeah, the ~260KB of churn you came up with is probably the reason I
didn't even think of suggesting something along these lines while
working on PHJ, though it did occur to me that the division was
entirely artificial as I carefully smashed more holes in both
directions through that wall.

Trying to think of a reason to keep Hash, I remembered Kohei KaiGai's
speculation about Hash nodes that are shared by different Hash Join
nodes (in the context of a partition-wise join where each partition is
joined against one table).  But even if we were to try to do that, a
Hash node isn't necessary to share the hash table, so that's not an
argument.

> Attached is a quick prototype that unifies them. It's not actually that hard,
> I think? Obviously this is far from ready, but I thought it'd be a good
> basis to get a discussion started?

I haven't looked at the patch yet but this yet but it sounds like a
good start from the description.

> Comments on the prototype:
>
> - I've hacked EXPLAIN to still show the Hash node, to reduce the size of
>   the diffs. I'm doubtful that that's the right approach (and I'm sure
>   it's not the right approach to do so with the code I injected) - I
>   think the Hash node in the explain doesn't really help users, and just
>   makes the explain bigger (except for making it clearer which side is
>   hashed)

Yeah, I'm not sure why you'd want to show a Hash node to users if
there is no way to use it in any other context than a Hash Join.

FWIW, Oracle, DB2 and SQL Server don't show an intermediate Hash node
in their plans, and generally you just have to know which way around
the input relations are shown (from a quick a glance at some examples
found on the web, Oracle and SQL Server show hash relation above probe
relation, while DB2, PostgreSQL and MySQL show probe relation above
hash relation).  Curiously, MySQL 8 just added hash joins, and they do
show a Hash node (at least in FORMAT=TREE mode, which looks a bit like
our EXPLAIN).

The fact that EXPLAIN doesn't label relations seems to be a separate
concern that applies equally to nestloop joins, and could perhaps be
addressed with some more optional verbosity, not a fake node?



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: [PATCH] Do not use StdRdOptions in Access Methods
Next
From: Michael Paquier
Date:
Subject: Re: v12.0: interrupt reindex CONCURRENTLY: ccold: ERROR: could notfind tuple for parent of relation ...