Thread: space for optimalization: DISTINCT without index
Hello I did some test and I can see so DISTINCT works well on indexed columns, but is slow on derived tables without indexes. If I use without distinct group by I get much better times. SELECT DISTINCT a, b FROM tab SELECT a,b FROM tab GROUP BY a, b. Can You Explain it. Thank You Pavel Stehule _________________________________________________________________ Citite se osamele? Poznejte nekoho vyjmecneho diky Match.com. http://www.msn.cz/
On Mon, Dec 12, 2005 at 18:35:07 +0100, Pavel Stehule <pavel.stehule@hotmail.com> wrote: > Hello > > I did some test and I can see so DISTINCT works well on indexed columns, > but is slow on derived tables without indexes. If I use without distinct > group by I get much better times. > > SELECT DISTINCT a, b FROM tab > SELECT a,b FROM tab GROUP BY a, b. > > Can You Explain it. DISTINCT will require a sort step to remove duplicates. GROUP BY can use either a sort or hash aggregate plan. If there are few distinct values, the hash aggregate plan can be much faster.
Bruno Wolff III <bruno@wolff.to> writes: > DISTINCT will require a sort step to remove duplicates. GROUP BY can use > either a sort or hash aggregate plan. If there are few distinct values, > the hash aggregate plan can be much faster. The DISTINCT code hasn't been revisited in a long time. One obstacle to improving it is that it's very tightly intertwined with ORDER BY. While fixing that might be just a Small Matter Of Programming, it's not clear how to decouple them without breaking DISTINCT ON. regards, tom lane