Re: [PoC] Implementation of distinct in Window Aggregates - Mailing list pgsql-hackers

From Ankit Kumar Pandey
Subject Re: [PoC] Implementation of distinct in Window Aggregates
Date
Msg-id 770824eb-a01e-354c-2434-7fed1dd832a4@gmail.com
Whole thread Raw
In response to Re: [PoC] Implementation of distinct in Window Aggregates  (Ankit Kumar Pandey <itsankitkp@gmail.com>)
Responses Re: [PoC] Implementation of distinct in Window Aggregates
List pgsql-hackers
On 04/01/23 18:10, Ankit Kumar Pandey wrote:
> On 29/12/22 20:58, Ankit Kumar Pandey wrote:
> >
> > On 24/12/22 18:22, Ankit Pandey wrote:
> >> Hi,
> >>
> >> This is a PoC patch which implements distinct operation in window 
> >> aggregates (without order by and for single column aggregation, final 
> >> version may vary wrt these limitations). Purpose of this PoC is to 
> >> get feedback on the approach used and corresponding implementation, 
> >> any nitpicking as deemed reasonable.
> >>
> >> Distinct operation is mirrored from implementation in nodeAgg. 
> >> Existing partitioning logic determines if row is in partition and 
> >> when distinct is required, all tuples for the aggregate column are 
> >> stored in tuplesort. When finalize_windowaggregate gets called, 
> >> tuples are sorted and duplicates are removed, followed by calling the 
> >> transition function on each tuple.
> >> When distinct is not required, the above process is skipped and the 
> >> transition function gets called directly and nothing gets inserted 
> >> into tuplesort.
> >> Note: For each partition, in tuplesort_begin and tuplesort_end is 
> >> involved to rinse tuplesort, so at any time, max tuples in tuplesort 
> >> is equal to tuples in a particular partition.
> >>
> >> I have verified it for interger and interval column aggregates (to 
> >> rule out obvious issues related to data types).
> >>
> >> Sample cases:
> >>
> >> create table mytable(id int, name text);
> >> insert into mytable values(1, 'A');
> >> insert into mytable values(1, 'A');
> >> insert into mytable values(5, 'B');
> >> insert into mytable values(3, 'A');
> >> insert into mytable values(1, 'A');
> >>
> >> select avg(distinct id) over (partition by name) from mytable;
> >>         avg
> >> --------------------
> >> 2.0000000000000000
> >> 2.0000000000000000
> >> 2.0000000000000000
> >> 2.0000000000000000
> >> 5.0000000000000000
> >>
> >> select avg(id) over (partition by name) from mytable;
> >>         avg
> >> --------------------
> >>  1.5000000000000000
> >>  1.5000000000000000
> >>  1.5000000000000000
> >>  1.5000000000000000
> >>  5.0000000000000000
> >>
> >> select avg(distinct id) over () from mytable;
> >>         avg
> >> --------------------
> >>  3.0000000000000000
> >>  3.0000000000000000
> >>  3.0000000000000000
> >>  3.0000000000000000
> >>  3.0000000000000000
> >>
> >> select avg(distinct id)  from mytable;
> >>         avg
> >> --------------------
> >>  3.0000000000000000
> >>
> >> This is my first-time contribution. Please let me know if anything 
> >> can be
> >> improved as I`m eager to learn.
> >>
> >> Regards,
> >> Ankit Kumar Pandey
> >
> > Hi all,
> >
> > I know everyone is busy with holidays (well, Happy Holidays!) but I 
> > will be glad if someone can take a quick look at this PoC and share 
> > thoughts.
> >
> > This is my first time contribution so I am pretty sure there will be 
> > some very obvious feedbacks (which will help me to move forward with 
> > this change).
> >
> >
> Updated patch with latest master. Last patch was an year old.
>
Attaching patch with rebase from latest HEAD


Thanks,

Ankit

Attachment

pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED
Next
From: Ankit Kumar Pandey
Date:
Subject: Re: [PoC] Implementation of distinct in Window Aggregates