BUG #5231: SELECT DISTINCT poorly implemented vs SELECT ... GROUP BY - Mailing list pgsql-bugs

From Thomas Hamilton
Subject BUG #5231: SELECT DISTINCT poorly implemented vs SELECT ... GROUP BY
Date
Msg-id 200912031556.nB3Fu5sv015354@wwwmaster.postgresql.org
Whole thread Raw
Responses Re: BUG #5231: SELECT DISTINCT poorly implemented vs SELECT ... GROUP BY  (Joshua Tolley <eggyknap@gmail.com>)
Re: BUG #5231: SELECT DISTINCT poorly implemented vs SELECT ... GROUP BY  (Alvaro Herrera <alvherre@commandprompt.com>)
List pgsql-bugs
The following bug has been logged online:

Bug reference:      5231
Logged by:          Thomas Hamilton
Email address:      thomashamilton76@yahoo.com
PostgreSQL version: 8.3.8
Operating system:   Ubuntu 4.2.4
Description:        SELECT DISTINCT poorly implemented vs SELECT ... GROUP
BY
Details:

SELECT DISTINCT does a Sort followed by Unique.

SELECT ... GROUP BY, which is logically equivalent, performs a
HashAggregate.

When run against a large dataset with a small number of distinct results
HashAggregate is an order of magnitude more efficient!

Since the spec does not require DISTINCT to return sorted results, I don't
believe Sort ... Unique will ever be more efficient than HashAggregate.

Therefore, in order to maximize performance, DISTINCT should always be
implemented as HashAggregate.

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: Assertion failure with a subtransaction and cursor
Next
From: Joshua Tolley
Date:
Subject: Re: BUG #5231: SELECT DISTINCT poorly implemented vs SELECT ... GROUP BY