Re: Remove useless GROUP BY columns considering unique index - Mailing list pgsql-hackers

From Andrei Lepikhov
Subject Re: Remove useless GROUP BY columns considering unique index
Date
Msg-id f358f934-44d6-4c17-83fe-d61c5c89e191@gmail.com
In response to Re: Remove useless GROUP BY columns considering unique index  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
On 12/12/24 10:09, David Rowley wrote:
> On Mon, 2 Dec 2024 at 17:18, Andrei Lepikhov <lepihov@gmail.com> wrote:
>> Patch 0002 looks helpful and performant. I propose to check 'relid > 0'
>> to avoid diving into 'foreach(lc, parse->rtable)' at all if nothing has
>> been found.
> 
> I did end up adding another fast path there, but I felt like checking
> relid > 0 wasn't as good as it could be as that would have only
> short-circuited when we don't see any Vars of level 0 in the GROUP BY.
> It seemed cheap enough to short-circuit when none of the relations
> mentioned in the GROUP BY have multiple columns mentioned.
Your solution seems much better than my proposal. Thanks for applying it!
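
To make the difference between the two fast paths concrete, here is a
minimal Python sketch. It is not the patch's C code: GroupVar,
any_level0_var and any_rel_with_multiple_columns are hypothetical names
used only for illustration, and the model assumes each GROUP BY entry is
a plain column reference.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class GroupVar(NamedTuple):
    """Hypothetical stand-in for a Var appearing in GROUP BY."""
    relid: int     # range-table index of the relation
    attno: int     # column number within that relation
    levelsup: int  # 0 means the Var belongs to the current query level


def any_level0_var(group_vars: Iterable[GroupVar]) -> bool:
    """Weaker check (the 'relid > 0' idea): is there any level-0 Var at all?"""
    return any(v.levelsup == 0 for v in group_vars)


def any_rel_with_multiple_columns(group_vars: Iterable[GroupVar]) -> bool:
    """Stronger check: does some relation contribute two or more distinct
    columns to GROUP BY? Only then can a column possibly be removed."""
    cols_by_rel = defaultdict(set)
    for v in group_vars:
        if v.levelsup != 0:
            continue  # Vars of outer query levels are ignored
        cols_by_rel[v.relid].add(v.attno)
        if len(cols_by_rel[v.relid]) > 1:
            return True
    return False


# GROUP BY t1.a, t2.x: the weaker check would still walk the range table,
# while the stronger check lets us skip it.
one_column_per_rel = [GroupVar(1, 1, 0), GroupVar(2, 1, 0)]
print(any_level0_var(one_column_per_rel))
print(any_rel_with_multiple_columns(one_column_per_rel))
```

With one column per relation, the first check still says "go on" and the
second says "nothing to do", which is why the second one short-circuits
more often.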

> when how do you decide if the GROUP BY should become t1.a,t1.b or
> t2.x,t2.y? It's not clear to me that using t1's columns is always
> better than using t2's. I imagine using a mix is never better, but I'm
> unsure how you'd decide which ones to use.
That depends on how 'better' is measured. Right now, GROUP BY uses two
strategies to reduce path cost: 1) match the ORDER BY clause (avoids the
final sort); 2) match the pathkeys of the incoming subtree (avoids
presorting before grouping).
My idea is close to [1], where the cost depends on the estimated
number of groups in the first grouping column, because cost_sort predicts
the number of comparison operator calls based on statistics. In that
case, the choice between (x,y) and (a,b) would depend on the ndistinct of
'x' and 'a'.
In general, this was an idea for debate, aimed more at further
development than at the patch in this thread.
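
As a rough illustration of that idea, here is a toy Python model. It is
only a sketch under stated assumptions, not cost_sort's actual formula:
columns are assumed uniform and independent, so two tuples tie on a
column with probability 1/ndistinct, and a later column is compared only
when all earlier ones tied. The function names and the ndistinct values
are hypothetical.

```python
from typing import List, Sequence, Tuple

# A candidate key list: (column name, estimated ndistinct) in sort order.
KeyList = List[Tuple[str, float]]


def expected_column_comparisons(ndistincts: Sequence[float]) -> float:
    """Expected number of per-column comparisons for one tuple comparison.

    Toy assumption: column i is examined only if every earlier column
    tied, and a tie on a column happens with probability 1/ndistinct.
    """
    expected = 0.0
    p_reach = 1.0  # probability that this column is examined at all
    for nd in ndistincts:
        expected += p_reach
        p_reach *= 1.0 / max(nd, 1.0)
    return expected


def pick_group_keys(candidates: Sequence[KeyList]) -> KeyList:
    """Pick the candidate key list with the fewest expected comparisons."""
    return min(
        candidates,
        key=lambda keys: expected_column_comparisons([nd for _, nd in keys]),
    )


# Made-up statistics: t1.a has few distinct values, t2.x has many.
t1_keys = [("t1.a", 10.0), ("t1.b", 1000.0)]
t2_keys = [("t2.x", 1000.0), ("t2.y", 10.0)]
print(pick_group_keys([t1_keys, t2_keys]))
```

Under these assumptions the list whose leading column has the higher
ndistinct wins, because ties on the first column are rarer and the
comparator seldom has to look at the second column.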

[1] Consider the number of columns in the sort cost model
https://www.postgresql.org/message-id/flat/8742aaa8-9519-4a1f-91bd-364aec65f5cf%40gmail.com

-- 
regards, Andrei Lepikhov


