Wei Wang,
> How exactly slow is DISTINCT being processed in SQL engines? (not
> limited to postgresql, though comments on postgresql would be most
> relevant)
I can only give you a relative result, based exclusively on my anecdotal experience with 7.1:
Fast: SELECT ...
Slower: SELECT ... GROUP BY x,y,z or: SELECT DISTINCT ON (x) ... (Postgres non-standard extension)
Slowest: SELECT DISTINCT ...
The reason for this is that SELECT DISTINCT is effectively a GROUP BY on all result fields of the query, and if a few of
them aren't indexed, that requires a seq scan.
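To make the DISTINCT / GROUP BY equivalence concrete, here is a minimal sketch. It assumes Python's built-in sqlite3 module rather than Postgres, and the "orders" table and its columns are made up for illustration. It only shows that the two queries return the same rows; it says nothing about how fast either one runs.

```python
import sqlite3

# In-memory SQLite database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, city TEXT, item TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("ann", "Oslo", "pen"),
        ("ann", "Oslo", "pen"),  # exact duplicate of the row above
        ("bob", "Rome", "ink"),
        ("bob", "Rome", "pen"),
    ],
)

# DISTINCT over the whole select list ...
distinct_rows = conn.execute(
    "SELECT DISTINCT customer, city, item FROM orders"
).fetchall()

# ... versus GROUP BY on every field in the select list.
grouped_rows = conn.execute(
    "SELECT customer, city, item FROM orders GROUP BY customer, city, item"
).fetchall()

# Neither query guarantees row order, so compare sorted copies.
same_result = sorted(distinct_rows) == sorted(grouped_rows)
print(same_result)
```

On Postgres itself, putting EXPLAIN in front of each query should show how the planner actually handles it on your version and data, which is a better guide than my ranking above.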
If performance is an issue, you may wish to consider restructuring your queries and/or data model to eliminate the
actual duplicate rows.
-Josh