Re: Hash vs. HashJoin nodes - Mailing list pgsql-hackers

From Neil Conway
Subject Re: Hash vs. HashJoin nodes
Msg-id 424B825F.7050904@samurai.com
In response to Re: Hash vs. HashJoin nodes  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:
> One small objection is that we'd lose the ability to separately display
> the time spent building the hash table in EXPLAIN ANALYZE output.  It's
> probably not super important, but might be a reason to keep two plan
> nodes in the tree.

Hmm, true. Perhaps then just hacking the hash node so that hash join 
pulls on it twice (the first time for a single tuple, the second time 
for the rest) is the way to go. Since the hash node is essentially an 
implementation detail of hash join, I don't feel _too_ bad about 
dirtying up its API a bit...

> I recall having looked at related ideas (not this one exactly) and being
> discouraged by the fact that pulling a tuple from *either* input first
> is demonstrably a losing strategy, since either input might have a very
> high startup cost.

That is true, but I think this particular formulation avoids that 
problem. If we look at the inner input first and find it is non-empty, we 
will *always* have to pull on the outer input at least once. The 
question is merely whether we go to the trouble of building the hash 
table before or after we do the initial pull on the outer relation. IOW, 
I think this tweak would be universally better than the existing code.
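
To make the ordering concrete, here is a minimal sketch in Python of the pull order described above. It is not the PostgreSQL executor code: the function name, the key-extractor parameters, and the _EMPTY sentinel are all made up for illustration, and it only covers inner joins.

```python
from itertools import chain

_EMPTY = object()  # hypothetical sentinel meaning "input exhausted"


def lazy_inner_hash_join(outer, inner, outer_key, inner_key):
    """Inner hash join that delays building the hash table (sketch only)."""
    inner_it = iter(inner)
    outer_it = iter(outer)

    # First pull on the inner side: a single tuple only.
    first_inner = next(inner_it, _EMPTY)
    if first_inner is _EMPTY:
        return  # empty inner input: the outer input is never touched

    # The inner input has at least one tuple, so the outer input has to be
    # pulled at least once anyway. Do that before building the table.
    first_outer = next(outer_it, _EMPTY)
    if first_outer is _EMPTY:
        return  # empty outer input: the hash table is never built

    # Second pull on the inner side: the remaining tuples, to build the table.
    table = {}
    for row in chain([first_inner], inner_it):
        table.setdefault(inner_key(row), []).append(row)

    # Probe phase, starting with the outer tuple already in hand.
    for o in chain([first_outer], outer_it):
        for i in table.get(outer_key(o), []):
            yield (o, i)
```

With an empty inner input the outer side is never read, and with an empty outer input only one inner tuple is read and no table is built. In every other case the work is the same as building the table up front.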

> This could all get pretty hairy when you consider that it has to still
> work for left joins too ...

Right; I was planning to bail and only do this for inner joins.

-Neil


