Is there a reason why the implementation of hash joins uses a separate
"hash" child node? AFAICS that node is only used in hash joins. Perhaps
the intent was to be able to provide a generic "hashing" capability that
could be used by any part of the executor that needs to hash tuples, but
AFAICS the hash node is not currently used in that way.

(The reason I ask is that Andrew @ Supernews and I were discussing a
potential minor improvement to the hash join implementation. If either
of the inputs to an inner hash join is empty, we can avoid building the
hash table or reading the other join relation. The existing code works
fine if it is the inner hash relation that is empty (since that is read
first), but if the outer join relation is empty we do a lot of
unnecessary work. We could improve this by first pulling a single tuple
from the hash join's inner relation; if that returns a tuple, we then
pull a single tuple from the outer relation. If that also returns a
tuple, we go ahead and build the hash table for the inner relation as
usual. This isn't
easy to implement at present because nodeHash is used to hash the inner
relation, and does the whole job at once. Of course, it would be
possible to hack nodeHash to detect the first time it is called and then
return after a single tuple, so the caller would actually invoke it
twice for non-empty input -- but that seems a bit ugly, so I'm wondering
if there is any value to maintaining the hash vs. hash join distinction
in the first place.)
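
To make the proposed ordering concrete, here is a rough standalone sketch.
It is a toy, not executor code: TupleSource, fetch_tuple, JoinPlan and
probe_inputs are names made up for illustration, and a "tuple" is just an
int. It only shows the order in which the two inputs would be peeked
before deciding whether the hash table needs to be built at all.

```c
#include <stddef.h>

/* Hypothetical stand-in for a child plan node: a cursor over an int array. */
typedef struct TupleSource
{
    const int *tuples;      /* the "relation" */
    size_t     ntuples;     /* number of tuples in it */
    size_t     next;        /* index of the next tuple to return */
    size_t     nfetched;    /* tuples pulled so far, to show work done */
} TupleSource;

/* Return the next tuple, or NULL once the source is exhausted. */
const int *
fetch_tuple(TupleSource *src)
{
    if (src->next >= src->ntuples)
        return NULL;
    src->nfetched++;
    return &src->tuples[src->next++];
}

/* What the (inner) hash join should do after peeking at its inputs. */
typedef enum JoinPlan
{
    JOIN_EMPTY_INNER,       /* inner is empty: no result, outer untouched */
    JOIN_EMPTY_OUTER,       /* outer is empty: no result, skip the build */
    JOIN_BUILD_HASH         /* both non-empty: build the hash table */
} JoinPlan;

/*
 * Pull one tuple from the inner input; only if that succeeds, pull one
 * tuple from the outer input.  The peeked tuples are handed back to the
 * caller through first_inner / first_outer so they are not lost.
 */
JoinPlan
probe_inputs(TupleSource *inner, TupleSource *outer,
             const int **first_inner, const int **first_outer)
{
    *first_outer = NULL;

    *first_inner = fetch_tuple(inner);
    if (*first_inner == NULL)
        return JOIN_EMPTY_INNER;

    *first_outer = fetch_tuple(outer);
    if (*first_outer == NULL)
        return JOIN_EMPTY_OUTER;

    return JOIN_BUILD_HASH;
}
```

Only in the JOIN_BUILD_HASH case would the rest of the inner input be read
and hashed. The two tuples already pulled have to be kept around, since the
first inner tuple still belongs in the hash table and the first outer tuple
still has to be probed against it.
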
-Neil