Hash vs. HashJoin nodes - Mailing list pgsql-hackers

From: Neil Conway
Subject: Hash vs. HashJoin nodes
Msg-id: 424B750D.7050009@samurai.com
List: pgsql-hackers
Is there a reason why the implementation of hash joins uses a separate 
"hash" child node? AFAICS that node is only used in hash joins. Perhaps 
the intent was to be able to provide a generic "hashing" capability that 
could be used by any part of the executor that needs to hash tuples, but 
AFAICS the hash node is not currently used in that way.

(The reason I ask is that Andrew @ Supernews and I were discussing a 
potential minor improvement to the hash join implementation. If either 
of the inputs to an inner hash join is empty, we can avoid building the 
hash table or reading the other join relation. The existing code works 
fine if it is the inner hash relation that is empty (since that is read 
first), but if the outer join relation is empty we still read the entire 
inner relation and build its hash table, only to throw it away. We could 
improve this by first pulling a single tuple 
from the hash join's inner relation; if it is non-null, then pull a 
single tuple from the outer relation. If that is also non-null, then go 
and build the hash table for the inner relation as usual. This isn't 
easy to implement at present because nodeHash is used to hash the inner 
relation, and it consumes its entire input and builds the complete hash 
table in a single call. Of course, it would be 
possible to hack nodeHash to detect the first time it is called and then 
return after a single tuple, so the caller would actually invoke it 
twice for non-empty input -- but that seems a bit ugly, so I'm wondering 
if there is any value to maintaining the hash vs. hash join distinction 
in the first place.)
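To make the proposed ordering concrete, here is a minimal, self-contained toy in C. It is not PostgreSQL executor code: `TupleSource`, `source_next`, `HashTable` and `hash_join_count` are hypothetical names invented for illustration, and plain `int` join keys stand in for tuples. As in the proposal, it only applies to an inner join (an outer join must still emit unmatched rows from the preserved side even when the other side is empty).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an executor child node: returns one "tuple"
 * (here just an int join key) per call, then NULL once exhausted. */
typedef struct {
    const int *vals;
    size_t n;
    size_t pos;   /* number of tuples pulled so far */
} TupleSource;

static const int *source_next(TupleSource *src)
{
    return src->pos < src->n ? &src->vals[src->pos++] : NULL;
}

/* Toy chained hash table on int keys (duplicate keys allowed). */
#define NBUCKETS 64

typedef struct Entry {
    int key;
    struct Entry *next;
} Entry;

typedef struct {
    Entry *buckets[NBUCKETS];
} HashTable;

static size_t bucket_of(int key)
{
    return (unsigned int) key % NBUCKETS;
}

static void ht_insert(HashTable *ht, int key)
{
    Entry *e = malloc(sizeof *e);
    if (e == NULL)
        abort();
    size_t b = bucket_of(key);
    e->key = key;
    e->next = ht->buckets[b];
    ht->buckets[b] = e;
}

static size_t ht_count_matches(const HashTable *ht, int key)
{
    size_t n = 0;
    for (const Entry *e = ht->buckets[bucket_of(key)]; e != NULL; e = e->next)
        if (e->key == key)
            n++;
    return n;
}

static void ht_free(HashTable *ht)
{
    for (size_t b = 0; b < NBUCKETS; b++) {
        Entry *e = ht->buckets[b];
        while (e != NULL) {
            Entry *next = e->next;
            free(e);
            e = next;
        }
        ht->buckets[b] = NULL;
    }
}

/* Inner equi-join returning the number of matching (inner, outer) pairs.
 * *built_table reports whether the hash table was ever built. */
static size_t hash_join_count(TupleSource *inner, TupleSource *outer,
                              bool *built_table)
{
    *built_table = false;

    /* Peek at the inner side first; if it is empty, the outer side is
     * never read (the case the existing code already handles). */
    const int *first_inner = source_next(inner);
    if (first_inner == NULL)
        return 0;

    /* Then peek at the outer side; if it is empty, skip the build. */
    const int *first_outer = source_next(outer);
    if (first_outer == NULL)
        return 0;

    /* Both inputs are non-empty: build as usual, remembering to insert
     * the inner tuple that was already pulled. */
    HashTable ht = {{NULL}};
    *built_table = true;
    ht_insert(&ht, *first_inner);
    for (const int *t; (t = source_next(inner)) != NULL;)
        ht_insert(&ht, *t);

    /* Probe with the already-pulled outer tuple, then the rest. */
    size_t matches = ht_count_matches(&ht, *first_outer);
    for (const int *t; (t = source_next(outer)) != NULL;)
        matches += ht_count_matches(&ht, *t);

    ht_free(&ht);
    return matches;
}
```

With an empty outer input, only one inner tuple is pulled and no hash table is built. Note that the caller has to hold on to the first inner and outer tuples it peeked at and feed them back into the build and probe phases; that is exactly the bookkeeping a separate hashing node would need if it were changed to return after a single tuple and then be invoked again.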

-Neil