DISTINCT vs. GROUP BY - Mailing list pgsql-performance

From Dimi Paun
Subject DISTINCT vs. GROUP BY
Date
Msg-id 1265751976.2513.34.camel@localhost
Whole thread Raw
Responses Re: DISTINCT vs. GROUP BY
Re: DISTINCT vs. GROUP BY
List pgsql-performance
From what I've read on the net, these should be very similar,
and should generate equivalent plans, in such cases:

SELECT DISTINCT x FROM mytable
SELECT x FROM mytable GROUP BY x

However, in my case (postgresql-server-8.1.18-2.el5_4.1),
they generated different results with quite different
execution times (73ms vs 40ms for DISTINCT and GROUP BY
respectively):

tts_server_db=# EXPLAIN ANALYZE select userdata from tagrecord where clientRmaInId = 'CPC-RMA-00110' group by userdata;
                                                                 QUERY PLAN
                    

--------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=775.68..775.69 rows=1 width=146) (actual time=40.058..40.058 rows=0 loops=1)
   ->  Bitmap Heap Scan on tagrecord  (cost=4.00..774.96 rows=286 width=146) (actual time=40.055..40.055 rows=0
loops=1)
         Recheck Cond: ((clientrmainid)::text = 'CPC-RMA-00110'::text)
         ->  Bitmap Index Scan on idx_tagdata_clientrmainid  (cost=0.00..4.00 rows=286 width=0) (actual
time=40.050..40.050rows=0 loops=1) 
               Index Cond: ((clientrmainid)::text = 'CPC-RMA-00110'::text)
 Total runtime: 40.121 ms

tts_server_db=# EXPLAIN ANALYZE select distinct userdata from tagrecord where clientRmaInId = 'CPC-RMA-00109';
                                                                    QUERY PLAN
                          

--------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=786.63..788.06 rows=1 width=146) (actual time=73.018..73.018 rows=0 loops=1)
   ->  Sort  (cost=786.63..787.34 rows=286 width=146) (actual time=73.016..73.016 rows=0 loops=1)
         Sort Key: userdata
         ->  Bitmap Heap Scan on tagrecord  (cost=4.00..774.96 rows=286 width=146) (actual time=72.940..72.940 rows=0
loops=1)
               Recheck Cond: ((clientrmainid)::text = 'CPC-RMA-00109'::text)
               ->  Bitmap Index Scan on idx_tagdata_clientrmainid  (cost=0.00..4.00 rows=286 width=0) (actual
time=72.936..72.936rows=0 loops=1) 
                     Index Cond: ((clientrmainid)::text = 'CPC-RMA-00109'::text)
 Total runtime: 73.144 ms

What gives?

--
Dimi Paun <dimi@lattica.com>
Lattica, Inc.


pgsql-performance by date:

Previous
From: Jeff
Date:
Subject: Re: Linux I/O tuning: CFQ vs. deadline
Next
From: Thom Brown
Date:
Subject: Re: DISTINCT vs. GROUP BY