Home > mailing lists

Re: Questions regarding distinct operation implementation - Mailing list pgsql-hackers

From	Ankit Kumar Pandey
Subject	Re: Questions regarding distinct operation implementation
Date	December 4, 2022 13:34:24
Msg-id	8273e224-c889-4b93-a335-0acf53b7bf25@gmail.com Whole thread Raw
In response to	Re: Questions regarding distinct operation implementation (David Rowley <dgrowleyml@gmail.com>)
Responses	Re: Questions regarding distinct operation implementation Re: Questions regarding distinct operation implementation
List	pgsql-hackers

Tree view

On 04/12/22 02:27, David Rowley wrote:
>
> To make this work when rows can exit the window frame seems
> significantly harder. Likely a hash table would be a better data
> structure to remove records from, but then how are you going to spill
> the hash table to disk when it reaches work_mem? As David J mentions,
> it seems like you'd need a hash table with a counter to track how many
> times a given value appears and only remove it from the table once
> that counter reaches 0.  Unsure how you're going to constrain that to
> not use more than work_mem though.
>
Interesting problem, Hashtables created in normal aggregates (AGG_HASHED 
mode) may provide some reference as they have hashagg_spill_tuple but I 
am not sure of any prior implementation of hashtable with counter and 
spill. Major concern is, if we go through tuplesort route (without order 
by case), would we get handicapped in future if we want order by or more 
features?

> Are there any other databases which support DISTINCT window aggregate
> with an ORDER BY in the window clause?
>
Oracle db support distinct window aggregates albeit without order by 
clause. Rest of databases which I've tried (mysql/sqlserver express) 
don't even support distinct in window aggregates so those gets ruled out 
as well.

> If you were to limit this to only working with the query you mentioned
> in [1], i.e PARTITION BY without an ORDER BY, then you only need to
> aggregate once per partition per aggregate and you only need to do
> that once all of the tuples for the partition are in the tuplestore.
> It seems to me like you could add all the records to a tuplesort and
> then sort by the DISTINCT column then aggregate everything except for
> consecutive duplicates. You can then aggregate any other aggregates
> which share the same DISTINCT column, otherwise, you just destroy the
> tuplesort and rinse and repeat for the next aggregate.
This looks like way to go that would ensure main use case of portability 
from Oracle.

> If you were to limit this to only working with the query you mentioned
> in [1], i.e PARTITION BY without an ORDER BY,

I need to dig deeper into order by case.

-- 
Regards,
Ankit Kumar Pandey

pgsql-hackers by date:

From: Andrew Dunstan
Date: 04 December 2022, 12:53:15
Subject: Re: Error-safe user functions

From: Tom Lane
Date: 04 December 2022, 15:25:23
Subject: Re: Error-safe user functions

Re: Questions regarding distinct operation implementation - Mailing list pgsql-hackers

Previous

Next