Re: Questions regarding distinct operation implementation - Mailing list pgsql-hackers

From David Rowley
Subject Re: Questions regarding distinct operation implementation
Date
Msg-id CAApHDvreucj2d-H_b0-f-Gg0CK-sT99a+3txUWQd6hJYrXeQcA@mail.gmail.com
Whole thread Raw
In response to Re: Questions regarding distinct operation implementation  (Ankit Kumar Pandey <itsankitkp@gmail.com>)
Responses Re: Questions regarding distinct operation implementation
List pgsql-hackers
On Sun, 4 Dec 2022 at 08:57, Ankit Kumar Pandey <itsankitkp@gmail.com> wrote:
> On 04/12/22 00:50, David Rowley wrote:
>> providing you can code it in such a way that you only
>> allocate one of these at once, i.e not allocate one per DISTINCT
>> aggregate all at once.
>
> I am not sure if I understand this, does it means at given time, do allocation for only one distinct aggregate
> instead of all, in case of multiple aggregates using distinct?

If you were to limit this to only working with the query you mentioned
in [1], i.e PARTITION BY without an ORDER BY, then you only need to
aggregate once per partition per aggregate and you only need to do
that once all of the tuples for the partition are in the tuplestore.
It seems to me like you could add all the records to a tuplesort and
then sort by the DISTINCT column then aggregate everything except for
consecutive duplicates. You can then aggregate any other aggregates
which share the same DISTINCT column, otherwise, you just destroy the
tuplesort and rinse and repeat for the next aggregate.

To make this work when rows can exit the window frame seems
significantly harder. Likely a hash table would be a better data
structure to remove records from, but then how are you going to spill
the hash table to disk when it reaches work_mem? As David J mentions,
it seems like you'd need a hash table with a counter to track how many
times a given value appears and only remove it from the table once
that counter reaches 0.  Unsure how you're going to constrain that to
not use more than work_mem though.

Are there any other databases which support DISTINCT window aggregate
with an ORDER BY in the window clause?

David

[1] https://postgr.es/m/b10d2b78-a07e-e520-0cfc-e19f0ec685b2@gmail.com



pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: Generate pg_stat_get_* functions with Macros
Next
From: Tom Lane
Date:
Subject: Re: Error-safe user functions