Richard Guo <guofenglinux@gmail.com> writes:
> On Wed, Dec 28, 2022 at 6:18 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> This patch is aimed at being smarter about cases where we have
>> redundant GROUP BY entries, for example
>> SELECT ... WHERE a.x = b.y GROUP BY a.x, b.y;
> While we are here, I wonder if we can do the same trick for
> distinctClause, to cope with cases like
> select distinct a.x, b.y from a, b where a.x = b.y;
We do that already, no?
regression=# create table foo (x int, y int);
CREATE TABLE
regression=# explain select distinct * from foo where x = 1;
QUERY PLAN
-----------------------------------------------------------------
Unique (cost=38.44..38.50 rows=11 width=8)
-> Sort (cost=38.44..38.47 rows=11 width=8)
Sort Key: y
-> Seq Scan on foo (cost=0.00..38.25 rows=11 width=8)
Filter: (x = 1)
(5 rows)
regression=# explain select distinct * from foo where x = y;
QUERY PLAN
-----------------------------------------------------------------
Unique (cost=38.44..38.50 rows=11 width=8)
-> Sort (cost=38.44..38.47 rows=11 width=8)
Sort Key: x
-> Seq Scan on foo (cost=0.00..38.25 rows=11 width=8)
Filter: (x = y)
(5 rows)
But if you do
regression=# explain select * from foo where x = y group by x, y;
QUERY PLAN
-----------------------------------------------------------------
Group (cost=38.44..38.52 rows=11 width=8)
Group Key: x, y
-> Sort (cost=38.44..38.47 rows=11 width=8)
Sort Key: x
-> Seq Scan on foo (cost=0.00..38.25 rows=11 width=8)
Filter: (x = y)
(6 rows)
then you can see that the Sort step knows it need only consider
one column even though the Group step considers both.
regards, tom lane