Re: slow plan for min/max - Mailing list pgsql-performance

From Tom Lane
Subject Re: slow plan for min/max
Date
Msg-id 16728.1063074086@sss.pgh.pa.us
Whole thread Raw
In response to Re: slow plan for min/max  ("Matt Clark" <matt@ymogen.net>)
Responses Re: slow plan for min/max
List pgsql-performance
"Matt Clark" <matt@ymogen.net> writes:
> Know what we (OK, I) need?  An explicitly non-aggregate max() and min(),
> implemented differently, so they can be optimised.

Not per se.  The way I've been visualizing this is that we add to
pg_aggregate a column named, say, aggsortop, with the definition:

    zero: no index optimization possible
    not zero: OID of a comparison operator ('<' or '>')

A nonzero entry means that the aggregate's value is the same as the
first item of the aggregate's input when sorted by the given operator.
(So MIN uses the '<' operator for its datatype and MAX uses '>'.)
Of course we have to add a clause to CREATE AGGREGATE to allow this to
be set for max/min aggregates of user-defined types.  But that's just a
small matter of programming.  This gives us all the type-specific info
we need; the aggsortop can be matched against the opclasses of indexes
to figure out whether a particular index is relevant to a particular max
or min call.

The hard part comes in teaching the planner to use this information
intelligently.  Exactly which queries is it even *possible* to use the
transformation for?  Which queries is it really a win for?  (Obviously
it's not if there's no matching index, but even if there is, the
presence of WHERE or GROUP BY clauses has got to affect the answer.)
How do you structure the resulting query plan, if it's at all complex
(think multiple aggregate calls...)?  I'm not clear on the answers to
any of those questions, so I'm not volunteering to try to code it up ...

            regards, tom lane

pgsql-performance by date:

Previous
From: "Matt Clark"
Date:
Subject: Re: slow plan for min/max
Next
From: Josh Berkus
Date:
Subject: Quick question