Home > mailing lists

Warts with SELECT DISTINCT - Mailing list pgsql-hackers

From	Greg Stark
Subject	Warts with SELECT DISTINCT
Date	May 3, 2006 18:58:20
Msg-id	87irom22kw.fsf@stark.xeocode.com Whole thread Raw
Responses	Re: Warts with SELECT DISTINCT
List	pgsql-hackers

Tree view


Normally Postgres extends SQL to allow ORDER BY to include arbitrary
expressions not in the select list. However this doesn't seem to work with
SELECT DISTINCT.
 stark=> \d test     Table "public.test"  Column | Type | Modifiers  --------+------+-----------  col1   | text | 
 stark=> select distinct col1 from test order by upper(col1); ERROR:  for SELECT DISTINCT, ORDER BY expressions must
appearin select list
 


It seems like as long as the expressions involve only columns or expressions
present in the SELECT DISTINCT list and as long as those functions are stable
or immutable then this shouldn't be a problem. Just prepend those expressions
to the select list to use as the sort key.

In fact the equivalent GROUP BY query does work as expected:

stark=> select col1 from test group by col1 order by upper(col1);col1 
------acx
(3 rows)


Though it's optimized poorly and does a superfluous sort step:

stark=> explain select col1 from test group by col1 order by upper(col1);                               QUERY PLAN
                          
 
---------------------------------------------------------------------------Sort  (cost=99.72..100.22 rows=200 width=32)
Sort Key: upper(col1)  ->  Group  (cost=85.43..92.08 rows=200 width=32)        ->  Sort  (cost=85.43..88.50 rows=1230
width=32)             Sort Key: col1              ->  Seq Scan on test  (cost=0.00..22.30 rows=1230 width=32)
 
(6 rows)


Whereas it shouldn't be hard to prove that this is equivalent:

stark=> explain select col1 from test group by upper(col1),col1 order by upper(col1);                            QUERY
PLAN                             
 
---------------------------------------------------------------------Group  (cost=88.50..98.23 rows=200 width=32)  ->
Sort (cost=88.50..91.58 rows=1230 width=32)        Sort Key: upper(col1), col1        ->  Seq Scan on test
(cost=0.00..25.38rows=1230 width=32)
 
(4 rows)


My understanding is that the DISTINCT and DISTINCT ON code path is old and
grotty. Perhaps it's time to remove those code paths, and replace them with a
transformation that creates the equivalent GROUP BY query and then optimize
that path until it can produce plans as good as DISTINCT and DISTINCT ON ever
did.

-- 
greg

pgsql-hackers by date:

From: "Nikolay Samokhvalov"
Date: 03 May 2006, 18:46:59
Subject: Re: [SoC] Relation between project "XML improvements" and "pgxml"

From: "Nikolay Samokhvalov"
Date: 03 May 2006, 19:02:46
Subject: Re: [SoC] Relation between project "XML improvements" and "pgxml"

Warts with SELECT DISTINCT - Mailing list pgsql-hackers

Previous

Next