Home > mailing lists

Re: SELECT DISTINCT too slow - Mailing list pgsql-sql

From	Miroslav Šulc
Subject	Re: SELECT DISTINCT too slow
Date	June 1, 2006 16:07:08
Msg-id	447F109D.5030400@startnet.cz Whole thread Raw
In response to	Re: SELECT DISTINCT too slow (Greg Stark <gsstark@mit.edu>)
List	pgsql-sql

Tree view

Greg Stark napsal(a):

Actually you could try the equivalent query:
SELECT Key FROM MRTPContactValue GROUP BY Key

This may or may not be faster because it can use a hash aggregate plan. I
would expect it to be faster here because there are few distinct keys and the
planner predicts that. 

Eventually these two queries should be handled the same by Postgres but Hash
Aggregates are a new addition and DISTINCT/DISTINCT ON hasn't been adapted to
make use of them.

Also, incidentally, I don't see how a table of possible keys could help you
here. Nothing forces they table MRTPContactValue to use all possible keys...

I simpified the case because it was slow by itself. GROUP BY really makes this a lot faster.

The table contains properties for each contact that I cannot control how many properties and what names of the properties there will be. In my scenario user can export the data through user interface and I need to know what keys are used there to create appropriate column names. There is even one constraint. The contacts are grouped into groups so I need to get only the keys from a selected group. The real query is this (which is not so fast as the plain SELECT ... GROUP BY ... because the other table is also large enough) but now it is faster than before:

SELECT Key FROM MRTPContactValue
INNER JOIN MRTPContact
ON MRTPContactValue.MRTPContactId = MRTPContact.Id
WHERE MRTPContact.MRTPWaveQuestionnaireId = 1
GROUP BY Key

Here's the query plan:

QUERY PLAN
HashAggregate (cost=32639.67..32639.76 rows=9 width=9) (actual time=19407.116..19407.146 rows=10 loops=1)
-> Hash Join (cost=8070.36..31598.51 rows=416466 width=9) (actual time=5917.367..17607.502 rows=416466 loops=1)
Hash Cond: ("outer".mrtpcontactid = "inner".id)
-> Seq Scan on mrtpcontactvalue (cost=0.00..8669.66 rows=416466 width=17) (actual time=9.094..4233.131 rows=416466 loops=1)
-> Hash (cost=7119.80..7119.80 rows=137824 width=8) (actual time=5096.750..5096.750 rows=137824 loops=1)
-> Seq Scan on mrtpcontact (cost=0.00..7119.80 rows=137824 width=8) (actual time=9.312..4337.647 rows=137824 loops=1)
Filter: (mrtpwavequestionnaireid = 1)
Total runtime: 19417.873 ms

The same query using DISTINCT takes about 40 sec to complete.

Thank you.

--
Miroslav Šulc

Attachment

miroslav.sulc.vcf

pgsql-sql by date:

From: Greg Stark
Date: 01 June 2006, 15:46:52
Subject: Re: SELECT DISTINCT too slow

From: postgres@jal.org
Date: 01 June 2006, 17:03:12
Subject: Re: Table design question

Re: SELECT DISTINCT too slow - Mailing list pgsql-sql

Attachment

Previous

Next