Re: Questions regarding distinct operation implementation - Mailing list pgsql-hackers

From David G. Johnston
Subject Re: Questions regarding distinct operation implementation
Date
Msg-id CAKFQuwbd-+Mo7PwXYNJytBVRUe6dDGru41wjXf_fGgiYaxC==g@mail.gmail.com
Whole thread Raw
In response to Re: Questions regarding distinct operation implementation  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Questions regarding distinct operation implementation  (Ankit Kumar Pandey <itsankitkp@gmail.com>)
List pgsql-hackers
On Thu, Dec 1, 2022 at 2:37 PM David Rowley <dgrowleyml@gmail.com> wrote:

The question is, what do you want to make work?  If you're not worried
about supporting DISTINCT when there is an ORDER BY clause and the
frame options are effectively ROWS BETWEEN UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING, then it's going to be much easier to make work.
You never need to worry about rows dropping out of visibility in the
frame. Simply all rows in the partition are in the frame.

I would definitely want the ability to have the output ordered and distinct at the same time.

array_agg(distinct col) over (order by whatever)

Conceptually this seems like it can be trivially accomplished with a simple lookup table, the key being the distinct column(s) and the value being a counter - with the entry being removed when the counter goes to zero (decreases happening each time a row goes out of scope).  The main concern, I suspect, isn't implementation ability, it is speed and memory consumption.

I would expect the distinct output to be identical to the non-distinct output except for duplicates removed.  Using array_agg as an example makes seeing the distinction quite easy.

Thinking over the above a bit more, is something like this possible?

array_agg(distinct col order by col) over (order by whatever)

i.e., can we add order by within the aggregate to control its internal ordering separately from the ordering needed for the window framing?

David J.

pgsql-hackers by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: Allow round() function to accept float and double precision
Next
From: Tom Lane
Date:
Subject: Re: Error-safe user functions