Gregory Stark writes:
> HashAggregate needs to store more values in memory at the same time so it's
> not a good plan if you have a lot of distinct values.
So far the resulting number of rows was in the thousands, the source data
were in the hundreds of thousands, and the GROUP BY was faster.
When you say "a lot of distinct values", do you mean the unique values in
the result data set?
In other words, does HashAggregate store the resulting rows in memory, or
is it used while processing the source rows?
> But the planner knows that and so as long as your work_mem is set to a
> reasonable size (keeping in mind each sort or other operation feels free to
If I make sure to run VACUUM ANALYZE on a table, will it be reasonable to
trust EXPLAIN to tell whether DISTINCT or GROUP BY is better? I started a
new job and still don't have a good feel for the sizes or distributions of
the data. Soon I will be given access to the test DB, so I will be able to
do counts and explore the data without affecting production.
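The workflow I have in mind is something like this (again with made-up
table and column names):

    VACUUM ANALYZE orders;   -- refresh the planner's statistics first

    -- EXPLAIN ANALYZE actually runs the query, so the timings are real
    -- and the estimated row counts can be compared against the actual ones
    EXPLAIN ANALYZE SELECT DISTINCT customer_id FROM orders;
    EXPLAIN ANALYZE SELECT customer_id FROM orders GROUP BY customer_id;

If the estimates line up with the actuals, I would take that as a sign the
statistics are good enough to trust the plan choice.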