Re: Questions regarding distinct operation implementation - Mailing list pgsql-hackers

From Ankit Kumar Pandey
Subject Re: Questions regarding distinct operation implementation
Date
Msg-id 8273e224-c889-4b93-a335-0acf53b7bf25@gmail.com
Whole thread Raw
In response to Re: Questions regarding distinct operation implementation  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Questions regarding distinct operation implementation  (Vik Fearing <vik@postgresfriends.org>)
Re: Questions regarding distinct operation implementation  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
On 04/12/22 02:27, David Rowley wrote:
>
> To make this work when rows can exit the window frame seems
> significantly harder. Likely a hash table would be a better data
> structure to remove records from, but then how are you going to spill
> the hash table to disk when it reaches work_mem? As David J mentions,
> it seems like you'd need a hash table with a counter to track how many
> times a given value appears and only remove it from the table once
> that counter reaches 0.  Unsure how you're going to constrain that to
> not use more than work_mem though.
>
Interesting problem, Hashtables created in normal aggregates (AGG_HASHED 
mode) may provide some reference as they have hashagg_spill_tuple but I 
am not sure of any prior implementation of hashtable with counter and 
spill. Major concern is, if we go through tuplesort route (without order 
by case), would we get handicapped in future if we want order by or more 
features?

> Are there any other databases which support DISTINCT window aggregate
> with an ORDER BY in the window clause?
>
Oracle db support distinct window aggregates albeit without order by 
clause. Rest of databases which I've tried (mysql/sqlserver express) 
don't even support distinct in window aggregates so those gets ruled out 
as well.

> If you were to limit this to only working with the query you mentioned
> in [1], i.e PARTITION BY without an ORDER BY, then you only need to
> aggregate once per partition per aggregate and you only need to do
> that once all of the tuples for the partition are in the tuplestore.
> It seems to me like you could add all the records to a tuplesort and
> then sort by the DISTINCT column then aggregate everything except for
> consecutive duplicates. You can then aggregate any other aggregates
> which share the same DISTINCT column, otherwise, you just destroy the
> tuplesort and rinse and repeat for the next aggregate.
This looks like way to go that would ensure main use case of portability 
from Oracle.

> If you were to limit this to only working with the query you mentioned
> in [1], i.e PARTITION BY without an ORDER BY,

I need to dig deeper into order by case.

-- 
Regards,
Ankit Kumar Pandey




pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: Error-safe user functions
Next
From: Tom Lane
Date:
Subject: Re: Error-safe user functions