On 04/01/23 18:10, Ankit Kumar Pandey wrote:
> On 29/12/22 20:58, Ankit Kumar Pandey wrote:
> >
> > On 24/12/22 18:22, Ankit Pandey wrote:
> >> Hi,
> >>
> >> This is a PoC patch which implements distinct operation in window
> >> aggregates (without order by and for single column aggregation, final
> >> version may vary wrt these limitations). Purpose of this PoC is to
> >> get feedback on the approach used and corresponding implementation,
> >> any nitpicking as deemed reasonable.
> >>
> >> Distinct operation is mirrored from implementation in nodeAgg.
> >> Existing partitioning logic determines if row is in partition and
> >> when distinct is required, all tuples for the aggregate column are
> >> stored in tuplesort. When finalize_windowaggregate gets called,
> >> tuples are sorted and duplicates are removed, followed by calling the
> >> transition function on each tuple.
> >> When distinct is not required, the above process is skipped and the
> >> transition function gets called directly and nothing gets inserted
> >> into tuplesort.
> >> Note: For each partition, in tuplesort_begin and tuplesort_end is
> >> involved to rinse tuplesort, so at any time, max tuples in tuplesort
> >> is equal to tuples in a particular partition.
> >>
> >> I have verified it for interger and interval column aggregates (to
> >> rule out obvious issues related to data types).
> >>
> >> Sample cases:
> >>
> >> create table mytable(id int, name text);
> >> insert into mytable values(1, 'A');
> >> insert into mytable values(1, 'A');
> >> insert into mytable values(5, 'B');
> >> insert into mytable values(3, 'A');
> >> insert into mytable values(1, 'A');
> >>
> >> select avg(distinct id) over (partition by name) from mytable;
> >> avg
> >> --------------------
> >> 2.0000000000000000
> >> 2.0000000000000000
> >> 2.0000000000000000
> >> 2.0000000000000000
> >> 5.0000000000000000
> >>
> >> select avg(id) over (partition by name) from mytable;
> >> avg
> >> --------------------
> >> 1.5000000000000000
> >> 1.5000000000000000
> >> 1.5000000000000000
> >> 1.5000000000000000
> >> 5.0000000000000000
> >>
> >> select avg(distinct id) over () from mytable;
> >> avg
> >> --------------------
> >> 3.0000000000000000
> >> 3.0000000000000000
> >> 3.0000000000000000
> >> 3.0000000000000000
> >> 3.0000000000000000
> >>
> >> select avg(distinct id) from mytable;
> >> avg
> >> --------------------
> >> 3.0000000000000000
> >>
> >> This is my first-time contribution. Please let me know if anything
> >> can be
> >> improved as I`m eager to learn.
> >>
> >> Regards,
> >> Ankit Kumar Pandey
> >
> > Hi all,
> >
> > I know everyone is busy with holidays (well, Happy Holidays!) but I
> > will be glad if someone can take a quick look at this PoC and share
> > thoughts.
> >
> > This is my first time contribution so I am pretty sure there will be
> > some very obvious feedbacks (which will help me to move forward with
> > this change).
> >
> >
> Updated patch with latest master. Last patch was an year old.
>
Attaching patch with rebase from latest HEAD
Thanks,
Ankit