Thread: Hash vs. HashJoin nodes
Is there a reason why the implementation of hash joins uses a separate "hash" child node? AFAICS that node is only used in hash joins. Perhaps the intent was to be able to provide a generic "hashing" capability that could be used by any part of the executor that needs to hash tuples, but AFAICS the hash node is not currently used in that way. (The reason I ask is that Andrew @ Supernews and I were discussing a potential minor improvement to the hash join implementation. If either of the inputs to an inner hash join is empty, we can avoid building the hash table or reading the other join relation. The existing code works fine if it is the inner hash relation that is empty (since that is read first), but if the outer join relation is empty we do a lot of unnecessary work. We could improve this by first pulling a single tuple from the hash join's inner relation; if it is non-null, then pull a single tuple from the outer relation. If that is also non-null, then go and build the hash table for the inner relation as usual. This isn't easy to implement at present because nodeHash is used to hash the inner relation, and does the whole job at once. Of course, it would be possible to hack nodeHash to detect the first time it is called and then return after a single tuple, so the caller would actually invoke it twice for non-empty input -- but that seems a bit ugly, so I'm wondering if there is any value to maintaining the hash vs. hash join distinction in the first place.) -Neil
Neil Conway <neilc@samurai.com> writes: > ... I'm wondering if there is any value to maintaining the hash > vs. hash join distinction in the first place.) One small objection is that we'd lose the ability to separately display the time spent building the hash table in EXPLAIN ANALYZE output. It's probably not super important, but might be a reason to keep two plan nodes in the tree. I recall having looked at related ideas (not this one exactly) and being discouraged by the fact that pulling a tuple from *either* input first is demonstrably a losing strategy, since either input might have a very high startup cost. You could possibly ameliorate that by comparing the estimated startup costs for the two inputs and pulling from the estimated-cheaper one first. This could all get pretty hairy when you consider that it has to still work for left joins too ... regards, tom lane
> One small objection is that we'd lose the ability to separately display > the time spent building the hash table in EXPLAIN ANALYZE output. It's > probably not super important, but might be a reason to keep two plan > nodes in the tree. Would a separate hash node help for these kinds of queries in the future: SELECT * FROM table WHERE id IN (1,2,3,4,...massive list of scalars, ...); Chris
Tom Lane wrote: > One small objection is that we'd lose the ability to separately display > the time spent building the hash table in EXPLAIN ANALYZE output. It's > probably not super important, but might be a reason to keep two plan > nodes in the tree. Hmm, true. Perhaps then just hacking the hash node so that hash join pulls on it twice (the first time for a single tuple, the second time for the rest) is the way to go. Since the hash node is essentially an implementation detail of hash join, I don't feel _too_ bad about dirtying up its API a bit... > I recall having looked at related ideas (not this one exactly) and being > discouraged by the fact that pulling a tuple from *either* input first > is demonstrably a losing strategy, since either input might have a very > high startup cost. That is true, but I think this particular formulation avoids that problem. If we look at the inner input first and find it is non-null, we will *always* have to pull on the outer input at least once. The question is merely whether we go to the trouble of building the hash table before or after we do the initial pull on the outer relation. IOW, I think this tweak would be universally better than the existing code. > This could all get pretty hairy when you consider that it has to still > work for left joins too ... Right; I was planning to bail and only do this for inner joins. -Neil
Neil Conway <neilc@samurai.com> writes: > I think this tweak would be universally better than the existing code. Yes, but you miss the point: there's a case where the existing code isn't good and you aren't improving it. Specifically, where the inner query has high startup cost and the outer query is empty. If you'd pulled from the outer query first then you could avoid the inner startup cost. >> This could all get pretty hairy when you consider that it has to still >> work for left joins too ... > Right; I was planning to bail and only do this for inner joins. Well, for outer joins the optimal strategy is simple: pull from the outer query first. If it's empty then you needn't touch the inner query at all. Otherwise you have to build the hash table. regards, tom lane
On Thu, Mar 31, 2005 at 12:03:37AM -0500, Tom Lane wrote: > > Right; I was planning to bail and only do this for inner joins. > > Well, for outer joins the optimal strategy is simple: pull from the > outer query first. If it's empty then you needn't touch the inner > query at all. Otherwise you have to build the hash table. What about the case of an empty inner query? Granted, you still have to read in the outer query, but would there be any reason to generate hashes for it's results? -- Jim C. Nasby, Database Consultant decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"
Neil Conway <neilc@samurai.com> writes: > Hmm, true. Perhaps then just hacking the hash node so that hash join > pulls on it twice (the first time for a single tuple, the second time > for the rest) is the way to go. Since the hash node is essentially an > implementation detail of hash join, I don't feel _too_ bad about > dirtying up its API a bit... If you still have in mind to do this, I've just committed some changes that could provide a reasonably clean framework for it. I invented a "MultiExecProcNode" interface that's intended to be used to call plan nodes that don't follow the return-one-tuple-at-a-time protocol. What I intend to use this for is indexscans returning bitmaps of tuple TIDs, but at the moment the sole member of the class is Hash. I'm thinking you could implement the above by first calling ExecProcNode (once) on the Hash node to pull the first tuple, and then calling MultiExecProcNode (once) if you wanted the hash table built. In anticipation of that, I left Hash connected to ExecProcNode, though ExecHash() currently just errors out if called. regards, tom lane