Re: Should HashSetOp go away - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Should HashSetOp go away
Date
Msg-id 2156464.1761501617@sss.pgh.pa.us
In response to Should HashSetOp go away  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: Should HashSetOp go away
Re: Should HashSetOp go away
List pgsql-hackers

Jeff Janes <jeff.janes@gmail.com> writes:
> I noticed some changes in this code in v18, so wanted to revisit the issue.
> Under commit 27627929528e, it looks like it got 25% more memory efficient,
> but it thinks it got 40% more efficient, so the memory use got better but
> the estimation actually got worse.

Hmm, so why not fix that estimation?

> I was thinking of ways to improve the memory usage (or at least its
> estimation) but decided maybe it would be better if HashSetOp went away
> entirely.  As far as I can tell HashSetOp has nothing to recommend it other
> than the fact that it already exists. If we instead used an elaboration on
> Hash Anti Join, then it would automatically get spilling to disk, parallel
> operations, better estimation, and the benefits of whatever micro
> optimizations people lavish on the highly used HashJoin machinery but not
> the obscure, little-used HashSetOp.

This seems like a pretty bad solution.  It would imply exporting the
complexities of duplicate-counting for EXCEPT ALL and INTERSECT ALL
modes into the hash-join logic.  We don't need that extra complexity
there (it's more than enough of a mess already), and we don't need
whatever performance hit ordinary hash joins would take.
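
For readers unfamiliar with the duplicate-counting in question, here is a
minimal sketch of the semantics, not PostgreSQL's actual code. If a row
appears m times in the left input and n times in the right input, INTERSECT
ALL emits it min(m, n) times and EXCEPT ALL emits it max(m - n, 0) times. The
function name set_op_all and its arguments are hypothetical, chosen only for
illustration.

```python
from collections import Counter


def set_op_all(left, right, op):
    """Return the multiset result of `left <op> ALL right`.

    Hypothetical helper for illustration only. `op` is "intersect" or
    "except"; rows can be any hashable values.
    """
    # One counter per input: how often each distinct row occurs on that side.
    left_counts = Counter(left)
    right_counts = Counter(right)

    result = []
    for row, m in left_counts.items():
        n = right_counts.get(row, 0)
        if op == "intersect":
            copies = min(m, n)      # INTERSECT ALL
        elif op == "except":
            copies = max(m - n, 0)  # EXCEPT ALL
        else:
            raise ValueError(f"unknown set operation: {op!r}")
        result.extend([row] * copies)
    return result


print(set_op_all([1, 1, 1, 2, 3], [1, 2, 2], "except"))     # [1, 1, 3]
print(set_op_all([1, 1, 1, 2, 3], [1, 2, 2], "intersect"))  # [1, 2]
```

A plain anti-join only needs to know whether a match exists; the per-row
counts above are the extra state that the ALL modes require.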

Also, I doubt the problem is confined to nodeSetOp.  I think this is
fundamentally a complaint about BuildTupleHashTable and friends being
unable to spill to disk.  Since we also use that logic for hashed
aggregates, RecursiveUnion, and hashed SubPlans, getting rid of
nodeSetOp isn't going to move the needle very far.

            regards, tom lane


