Re: DISTINCT/Optimizer question - Mailing list pgsql-hackers

From Greg Stark
Subject Re: DISTINCT/Optimizer question
Date
Msg-id 87fyhc681k.fsf@stark.xeocode.com
Whole thread Raw
In response to Re: DISTINCT/Optimizer question  (Martijn van Oosterhout <kleptog@svana.org>)
List pgsql-hackers
Martijn van Oosterhout <kleptog@svana.org> writes:

> On Fri, Jul 07, 2006 at 01:25:53PM -0400, Beth Jen wrote:
> > Right now, the distinct clause adds its targets to the sort clause list when
> > it is parsed. This causes an automatic insertion of the sort node into the
> > query plan before the application of the unique node. The hash-based
> > implementation however is meant to bypass the need to sort. I could just
> > remove this action, but the optimizer should only consider using the
> 
> <snip>
> 
> My laymans opinion suggests that this needs a new specific "distinct
> clause" which looks a lot like a sort clause only isn't. And then in
> the planner this clause would either be converted to your new node type
> or the traditional sort node.

I had always assumed that the way forward here was just to convert the
DISTINCT query into the equivalent GROUP BY query. No sense in having two
separate code paths that handle precisely the same behaviour.

> Have you considered how your code interacts with DISTINCT ON ()?
> Perhaps a clue lies there...

Therein lies the rub. There are equivalent GROUP BY forms for DISTINCT ON
queries but they aren't optimized effectively currently. Until they are
DISTINCT ON can't be translated into GROUP BY queries. I would suggest working
on optimizing those cases (min(), max(), first(), last() with GROUP BY over a
sorted subquery) and then translating DISTINCT ON queries as well.

But afaict there's nothing stopping Postgres from converting plain old
standard DISTINCT queries into GROUP BY queries currently.

-- 
greg



pgsql-hackers by date:

Previous
From: Martijn van Oosterhout
Date:
Subject: Re: DISTINCT/Optimizer question
Next
From: "hubert depesz lubaczewski"
Date:
Subject: Re: request for feature: psql 'DSN' option