Thread: BUG #5231: SELECT DISTINCT poorly implemented vs SELECT ... GROUP BY
The following bug has been logged online: Bug reference: 5231 Logged by: Thomas Hamilton Email address: thomashamilton76@yahoo.com PostgreSQL version: 8.3.8 Operating system: Ubuntu 4.2.4 Description: SELECT DISTINCT poorly implemented vs SELECT ... GROUP BY Details: SELECT DISTINCT does a Sort followed by Unique. SELECT ... GROUP BY, which is logically equivalent, performs a HashAggregate. When run against a large dataset with a small number of distinct results HashAggregate is an order of magnitude more efficient! Since the spec does not require DISTINCT to return sorted results, I don't believe Sort ... Unique will ever be more efficient than HashAggregate. Therefore, in order to maximize performance, DISTINCT should always be implemented as HashAggregate.
On Thu, Dec 03, 2009 at 03:56:05PM +0000, Thomas Hamilton wrote: >=20 > The following bug has been logged online: >=20 > Bug reference: 5231 > Logged by: Thomas Hamilton > Email address: thomashamilton76@yahoo.com > PostgreSQL version: 8.3.8 > Operating system: Ubuntu 4.2.4 > Description: SELECT DISTINCT poorly implemented vs SELECT ... GROUP > BY > Details:=20 >=20 > SELECT DISTINCT does a Sort followed by Unique. >=20 > SELECT ... GROUP BY, which is logically equivalent, performs a > HashAggregate. >=20 > When run against a large dataset with a small number of distinct results > HashAggregate is an order of magnitude more efficient! >=20 > Since the spec does not require DISTINCT to return sorted results, I don't > believe Sort ... Unique will ever be more efficient than HashAggregate. >=20 > Therefore, in order to maximize performance, DISTINCT should always be > implemented as HashAggregate. In 8.4 and above, SELECT DISTINCT can use HashAggregates rather than Sort/Unique -- though I'm not sure it always does. Anyway, an upgrade should handle this for you. However, this isn't really a bug, it's a feature, so don't expect to see it in an 8.3.x version. -- Josh / eggyknap
Thomas Hamilton wrote: > When run against a large dataset with a small number of distinct results > HashAggregate is an order of magnitude more efficient! > > Since the spec does not require DISTINCT to return sorted results, I don't > believe Sort ... Unique will ever be more efficient than HashAggregate. This is not a bug -- merely a missing optimizer feature, which as it turns out was implemented in 8.4. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.