Re: Status of DISTINCT-by-hashing work - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Status of DISTINCT-by-hashing work
Date
Msg-id 17158.1218050218@sss.pgh.pa.us
In response to Re: Status of DISTINCT-by-hashing work  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
I wrote:
> ...  For INTERSECT/EXCEPT (with or without ALL),
> you really need to maintain counters in each hashtable entry so you know
> how many matching rows you got from each side of the set operation.
> So it'd be necessary to either duplicate a large chunk of nodeAgg.c, or
> make that code handle hashed INTERSECT/EXCEPT along with all its
> existing duties.  Neither of which seems particularly appealing :-(.
> I'm going to look at whether nodeAgg can be refactored to avoid this,
> but I'm feeling a bit discouraged about it at the moment.

Actually, it seems that most of what could be shared has already been
factored out into execGrouping.c.  I find that supporting hashing in
nodeSetOp.c will only roughly double its size (from 318 to 650 lines).
Although nodeAgg.c is about 1700 lines, most of its bulk comes from
managing the aggregate transition values and function calls.  There
might be some scope to save a few lines by refactoring, but it doesn't
look like it's worth the trouble.
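
For anyone following along, the per-entry counters mentioned in the
quoted text can be sketched roughly as below.  This is only an
illustration under simplifying assumptions, not code from the patch:
it uses a fixed-size, open-addressing table keyed on a single int
column, and every name in it (SetOpEntry, setop_add_row,
setop_output_count, and so on) is hypothetical.  Each entry counts how
many matching rows arrived from the left and right inputs, and the
number of copies to emit per group follows the usual SQL rules for
INTERSECT/EXCEPT with and without ALL.

```c
#include <stddef.h>
#include <stdlib.h>

/* All names here are hypothetical; this is not nodeSetOp.c code. */
typedef enum
{
    SETOP_INTERSECT,
    SETOP_INTERSECT_ALL,
    SETOP_EXCEPT,
    SETOP_EXCEPT_ALL
} SetOpKind;

/* One hashtable entry per distinct key, with a counter per input side. */
typedef struct
{
    int  key;
    long num_left;      /* matching rows seen from the left input */
    long num_right;     /* matching rows seen from the right input */
    int  used;          /* nonzero once the slot holds a key */
} SetOpEntry;

typedef struct
{
    SetOpEntry *entries;
    size_t      nbuckets;   /* must be a power of two; never resized */
} SetOpTable;

/* Allocate a zeroed table; returns 0 on allocation failure. */
int
setop_table_init(SetOpTable *t, size_t nbuckets)
{
    t->entries = calloc(nbuckets, sizeof(SetOpEntry));
    t->nbuckets = nbuckets;
    return t->entries != NULL;
}

/* Find the entry for key, creating it if absent; NULL if table is full. */
SetOpEntry *
setop_lookup(SetOpTable *t, int key)
{
    size_t mask = t->nbuckets - 1;
    size_t start = ((unsigned int) key * 2654435761u) & mask;

    for (size_t n = 0; n < t->nbuckets; n++)
    {
        SetOpEntry *e = &t->entries[(start + n) & mask];

        if (!e->used)
        {
            e->used = 1;
            e->key = key;
            return e;
        }
        if (e->key == key)
            return e;
    }
    return NULL;
}

/* Count one input row against its group; returns 0 if the table is full. */
int
setop_add_row(SetOpTable *t, int key, int from_left)
{
    SetOpEntry *e = setop_lookup(t, key);

    if (e == NULL)
        return 0;
    if (from_left)
        e->num_left++;
    else
        e->num_right++;
    return 1;
}

/* How many copies of this group's row the set operation should emit. */
long
setop_output_count(SetOpKind kind, const SetOpEntry *e)
{
    switch (kind)
    {
        case SETOP_INTERSECT:
            return (e->num_left > 0 && e->num_right > 0) ? 1 : 0;
        case SETOP_INTERSECT_ALL:
            return (e->num_left < e->num_right) ? e->num_left : e->num_right;
        case SETOP_EXCEPT:
            return (e->num_left > 0 && e->num_right == 0) ? 1 : 0;
        case SETOP_EXCEPT_ALL:
            return (e->num_left > e->num_right) ? e->num_left - e->num_right : 0;
    }
    return 0;
}
```

With left input {1,1,1,2,3} and right input {1,1,3,4}, key 1 would be
emitted twice by INTERSECT ALL and once by EXCEPT ALL, while key 2
survives only the EXCEPT variants.  A real executor node would of
course hash whole tuples using the column types' equality and hash
functions rather than a bare int.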

The attached WIP patch compiles, but I've not tested it yet for lack
of planner support.  If some of the code looks suspiciously like
nodeAgg.c, it's because I started from nodeAgg and just stripped
everything that wasn't needed ...

If there are no objections, I'll push forward with persuading
the planner to support hashable set operations.

            regards, tom lane


Attachment
