Thread: space for optimalization: DISTINCT without index

space for optimalization: DISTINCT without index

From

"Pavel Stehule"

Date:

12 December 2005, 13:35:13

Hello

I did some test and I can see so DISTINCT works well on indexed columns, but 
is slow on derived tables without indexes. If I use without distinct group 
by I get much better times.

SELECT DISTINCT a, b FROM tab
SELECT a,b FROM tab GROUP BY a, b.

Can You Explain it.

Thank You
Pavel Stehule

_________________________________________________________________
Citite se osamele? Poznejte nekoho vyjmecneho diky Match.com. 
http://www.msn.cz/

Re: space for optimalization: DISTINCT without index

From

Bruno Wolff III

Date:

12 December 2005, 17:43:36

On Mon, Dec 12, 2005 at 18:35:07 +0100, Pavel Stehule <pavel.stehule@hotmail.com> wrote:
> Hello
> 
> I did some test and I can see so DISTINCT works well on indexed columns, 
> but is slow on derived tables without indexes. If I use without distinct 
> group by I get much better times.
> 
> SELECT DISTINCT a, b FROM tab
> SELECT a,b FROM tab GROUP BY a, b.
> 
> Can You Explain it.

DISTINCT will require a sort step to remove duplicates. GROUP BY can use
either a sort or hash aggregate plan. If there are few distinct values,
the hash aggregate plan can be much faster.

Re: space for optimalization: DISTINCT without index

From

Tom Lane

Date:

12 December 2005, 18:51:21

Bruno Wolff III <bruno@wolff.to> writes:
> DISTINCT will require a sort step to remove duplicates. GROUP BY can use
> either a sort or hash aggregate plan. If there are few distinct values,
> the hash aggregate plan can be much faster.

The DISTINCT code hasn't been revisited in a long time.  One obstacle to
improving it is that it's very tightly intertwined with ORDER BY.  While
fixing that might be just a Small Matter Of Programming, it's not clear
how to decouple them without breaking DISTINCT ON.
        regards, tom lane