Re: DISTINCT vs. GROUP BY - Mailing list pgsql-performance

From Dimi Paun
Subject Re: DISTINCT vs. GROUP BY
Date
Msg-id 1265766227.2513.41.camel@localhost
Whole thread Raw
In response to Re: DISTINCT vs. GROUP BY  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On Tue, 2010-02-09 at 17:38 -0500, Tom Lane wrote:
> The results certainly ought to be the same (although perhaps not with
> the same ordering) --- if they aren't, please provide a reproducible
> test case.

The results are the same, this is not a problem.

> As for efficiency, though, 8.1 didn't understand how to use hash
> aggregation for DISTINCT.  Less-obsolete versions do know how to do
> that.

Indeed, this seem to be the issue:

tts_server_db=# EXPLAIN ANALYZE select userdata from tagrecord where clientRmaInId = 'CPC-RMA-00110' group by userdata;
                                                                   QUERY PLAN
                        

------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=253.34..253.50 rows=16 width=15) (actual time=0.094..0.094 rows=0 loops=1)
   ->  Index Scan using idx_tagdata_clientrmainid on tagrecord  (cost=0.00..252.85 rows=195 width=15) (actual
time=0.091..0.091rows=0 loops=1) 
         Index Cond: ((clientrmainid)::text = 'CPC-RMA-00110'::text)
 Total runtime: 0.146 ms
(4 rows)

tts_server_db=# EXPLAIN ANALYZE select distinct userdata from tagrecord where clientRmaInId = 'CPC-RMA-00110';
                                                                      QUERY PLAN
                              

------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=260.27..261.25 rows=16 width=15) (actual time=0.115..0.115 rows=0 loops=1)
   ->  Sort  (cost=260.27..260.76 rows=195 width=15) (actual time=0.113..0.113 rows=0 loops=1)
         Sort Key: userdata
         ->  Index Scan using idx_tagdata_clientrmainid on tagrecord  (cost=0.00..252.85 rows=195 width=15) (actual
time=0.105..0.105rows=0 loops=1) 
               Index Cond: ((clientrmainid)::text = 'CPC-RMA-00110'::text)
 Total runtime: 0.151 ms
(6 rows)

For now we are stuck with 8.1, so the easiest fix for us is to use GROUP BY.
Since this is fixed in later versions, I guess there's not much to see here... :)

Thanks for the quick reply!

--
Dimi Paun <dimi@lattica.com>
Lattica, Inc.


pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: DISTINCT vs. GROUP BY
Next
From: Jayadevan M
Date:
Subject: PostgreSQL - case studies