Re: slow plan for min/max - Mailing list pgsql-performance

From Matt Clark
Subject Re: slow plan for min/max
Date
Msg-id LFEIJBEOKGPDHCEMDGNFGEGJCAAA.matt@ymogen.net
Whole thread Raw
In response to Re: slow plan for min/max  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: slow plan for min/max  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
> "Matt Clark" <matt@ymogen.net> writes:
> > Actually, referring down to later parts of this thread, why can't this
> > optimisation be performed internally for built-in types?  I
> understand the
> > issue with aggregates over user-defined types, but surely
> optimising max()
> > for int4, text, etc is safe and easy?
>
> I can't see that the datatype involved has anything to do with it.
> None of the issues that come up in making the planner do this are
> datatype-specific.  You could possibly avoid adding some columns
> to pg_aggregate if you instead hard-wired the equivalent knowledge
> (for builtin types only) into some code somewhere, but a patch that
> approached it that way would be rejected as unmaintainable.

I don't pretend to have any useful knowledge of the internals of this, so
much of what I write may seem like noise to you guys.  The naive question is
'I have an index on X, so finding max(X) should be trivial, so why can't the
planner exploit that triviality?'.  AFAICS the short sophisticated answer is
that it just isn't trivial in the general case.

Upon rereading the docs on aggregates I see that it really isn't trivial at
all.  Not even knowing things like 'this index uses the same function as
this aggregate' gets you very far, because of the very general nature of the
implementation of aggs.

So it should be flagged very prominently in the docs that max() and min()
are almost always not what 90% of people want to use 90% of the time,
because indexes do the same job much better for anything other than tiny
tables.

Know what we (OK, I) need?  An explicitly non-aggregate max() and min(),
implemented differently, so they can be optimised.  let's call them
idx_max() and idx_min(), which completely bypass the standard aggregate
code.  Because let's face it, in most cases where you regularly want a max
or a min you have an index defined, and you want the DB to use it.

And I would volunteer to do it, I would, but you really don't want my C in
your project ;-)  I do volunteer to do some doc tweaking though - who do I
talk to?

M


pgsql-performance by date:

Previous
From: Odiel León
Date:
Subject:
Next
From: Tom Lane
Date:
Subject: Re: slow plan for min/max