Re: Questions regarding distinct operation implementation - Mailing list pgsql-hackers

From Ankit Kumar Pandey
Subject Re: Questions regarding distinct operation implementation
Date
Msg-id 60471090-3d24-565f-cdbb-5150a8895780@gmail.com
Whole thread Raw
In response to Questions regarding distinct operation implementation  (Ankit Kumar Pandey <itsankitkp@gmail.com>)
Responses Re: Questions regarding distinct operation implementation
List pgsql-hackers


On 23/11/22 23:48, Ankit Kumar Pandey wrote:

Hello,


I have questions regarding distinct operation and would be glad if someone could help me out.

Consider the following table (mytable):

id, name

1, A

1, A

2, B

3, A

1, A

If we do select avg(id) over (partition by name) from mytable, partition logic goes like this:

for A: 1, 1, 3, 1

If we want to implement something like this select avg(distinct id) over (partition by name) from mytable

and remove duplicate by storing last datum of aggregate column (id) and comparing it with current value. It fails here because aggregate column is not sorted within the partition.

Questions:

1. Is sorting prerequisite for finding distinct values?

2. Is it okay to sort aggregate column (within partition) for distinct to work in case of window function?

3. Is an alternative way exists to handle this scenario (because here sort is of no use in aggregation)?


Thanks


-- 
Regards,
Ankit Kumar Pandey

Hi,

After little more digging, I can see that aggregation on Window functions are of running type, it would be bit more effective if a lookup hashtable is created where every value in current aggregate column get inserted. Whenever frame moves ahead, a lookup if performed for presence of duplicate.

On performance standpoint, this might be bad idea though.

Please let me know any opinions on this.

-- 
Regards,
Ankit Kumar Pandey

pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Add 64-bit XIDs into PostgreSQL 15
Next
From: Cary Huang
Date:
Subject: Re: Patch: Global Unique Index