
Re: Improve statistics estimation considering GROUP-BY as a 'uniqueiser' - Mailing list pgsql-hackers

From	Alexander Korotkov
Subject	Re: Improve statistics estimation considering GROUP-BY as a 'uniqueiser'
Date	February 19 12:48:04
Msg-id	CAPpHfdsV=_fSq5ONBJQsP9BY_kHbfu+=NAK7G0kAhDZAo3dUcw@mail.gmail.com
In response to	Re: Improve statistics estimation considering GROUP-BY as a 'uniqueiser' (Heikki Linnakangas <hlinnaka@iki.fi>)
List	pgsql-hackers

Hi, Vlada.

On Tue, Feb 18, 2025 at 6:56 PM Vlada Pogozhelskaya
<v.pogozhelskaya@postgrespro.ru> wrote:
> Following the discussion on improving statistics estimation by considering
> GROUP BY as a unique constraint, I’ve prepared a patch that integrates
> GROUP BY into cardinality estimation in a similar way to DISTINCT.
>
> This patch ensures that when a subquery contains a GROUP BY clause, the
> optimizer recognizes the grouped columns as unique. The logic follows a
> straightforward approach, comparing the GROUP BY columns with the target
> list to determine uniqueness.
>
> I’d appreciate any feedback or suggestions for further improvements.

Thank you for your patch, but your message lacks an explanation of
what your approach is and how it differs from the patches previously
published on this thread.  As I understand from the code, you check
whether the GROUP BY clauses are the same as the targetlist.  If so, you
assume every column to be unique.  But it just doesn't work that
way.  If values are unique across some set of columns, individual columns
might still have repeats.  See the example.

# select x, y from generate_series(1,3) x, generate_series(1, 3) y
group by x, y;
 x | y
---+---
 3 | 2
 2 | 2
 3 | 1
 2 | 1
 1 | 3
 1 | 2
 1 | 1
 2 | 3
 3 | 3
(9 rows)

x and y are unique here as a pair.  But individual x and y values have repeats.
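
The same point can be checked by counting.  Below is a minimal Python
sketch, not part of the patch or of PostgreSQL, that mirrors the query
above: it builds the grouped (x, y) rows and compares the row count
with the number of distinct values in each column.  The variable names
are made up for illustration.

```python
from itertools import product

# Every (x, y) pair for x, y in 1..3, mirroring the cross join of the two
# generate_series() calls.  set() plays the role of GROUP BY x, y here.
rows = set(product(range(1, 4), range(1, 4)))

n_rows = len(rows)                        # number of grouped rows
n_distinct_x = len({x for x, _ in rows})  # distinct values in column x
n_distinct_y = len({y for _, y in rows})  # distinct values in column y

print(n_rows, n_distinct_x, n_distinct_y)  # prints "9 3 3"
```

So the pair has 9 distinct values over 9 rows, while each column alone
has only 3.  Treating x or y as unique would overstate its number of
distinct values by a factor of 3 in this case.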

------
Regards,
Alexander Korotkov
Supabase
