Re: pgsql-server/src/backend/optimizer/util pathnode.c - Mailing list pgsql-committers

From Tom Lane
Subject Re: pgsql-server/src/backend/optimizer/util pathnode.c
Date
Msg-id 8629.1078290898@sss.pgh.pa.us
Whole thread Raw
In response to Re: pgsql-server/src/backend/optimizer/util pathnode.c  (Christopher Kings-Lynne <chriskl@familyhealth.com.au>)
List pgsql-committers
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
>> Teach is_distinct_query to recognize that GROUP BY forces a subquery's
>> output to be distinct, if all the GROUP BY columns appear in the output.
>> Per suggestion from Dennis Haney.

> Will this have should-be-in-release-notes side effects just as the
> results of a DISTINCT over a GROUP BY no longer being sorted?

No, there is no visible semantic change AFAICS.  This just eliminates
planning silliness like sorting or hashing an already-sorted-or-hashed
subplan.  Here is an example using the regression database: in 7.4

regression=# explain select * from tenk1 a where
regression-# unique1 in (select ten from tenk1 b group by ten);
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Nested Loop  (cost=483.43..543.63 rows=10 width=244)
   ->  HashAggregate  (cost=483.43..483.43 rows=10 width=4)
         ->  Subquery Scan "IN_subquery"  (cost=483.30..483.40 rows=10 width=4)
               ->  HashAggregate  (cost=483.30..483.30 rows=10 width=4)
                     ->  Seq Scan on tenk1 b  (cost=0.00..458.24 rows=10024 width=4)
   ->  Index Scan using tenk1_unique1 on tenk1 a  (cost=0.00..6.01 rows=1 width=244)
         Index Cond: (a.unique1 = "outer".ten)
(7 rows)

where CVS tip gives

                                    QUERY PLAN
-------------------------------------------------------------------------------------
 Nested Loop  (cost=483.30..543.60 rows=10 width=244)
   ->  Subquery Scan "IN_subquery"  (cost=483.30..483.40 rows=10 width=4)
         ->  HashAggregate  (cost=483.30..483.30 rows=10 width=4)
               ->  Seq Scan on tenk1 b  (cost=0.00..458.24 rows=10024 width=4)
   ->  Index Scan using tenk1_unique1 on tenk1 a  (cost=0.00..6.01 rows=1 width=244)
         Index Cond: (a.unique1 = "outer".ten)
(6 rows)

(In both cases, the HashAggregate nodes are being used to eliminate
duplicate rows.)

            regards, tom lane

pgsql-committers by date:

Previous
From: momjian@svr1.postgresql.org (Bruce Momjian)
Date:
Subject: pgsql-server/src/test/regress GNUmakefile
Next
From: momjian@svr1.postgresql.org (Bruce Momjian)
Date:
Subject: pgsql-server/doc TODO