Re: Overriding the optimizer - Mailing list pgsql-performance

From Jaime Casanova
Subject Re: Overriding the optimizer
Date
Msg-id c2d9e70e0512152025r89ccfeeo10a85f2e26368a2a@mail.gmail.com
In response to Re: Overriding the optimizer  (David Lang <dlang@invendra.net>)
List pgsql-performance
On 12/15/05, David Lang <dlang@invendra.net> wrote:
> On Thu, 15 Dec 2005, Craig A. James wrote:
>
> > Mark Kirkwood wrote:
> >> I hear what you are saying, but to use this fine example - I don't know
> >> what the best plan is - these experiments are part of an investigation to
> >> find *if* there is a better plan, and if so, why Postgres is not finding it.
> >>
> >>> There isn't a database in the world that is as smart as a developer, or
> >>> that can have insight into things that only a developer can possibly
> >>> know.
> >>
> >> That is often true - but the aim is to get Postgres's optimizer closer to
> >> developer smartness.
> >
> > What would be cool would be some way the developer could alter the plan,
> > but the way of doing so would strongly encourage the developer to send
> > the information to this mailing list.  Postgres would essentially say,
> > "Ok, you can do that, but we want to know why!"
>
> at the risk of sounding flippant (which is NOT what I intend) I will point
> out that with the source you can change the optimizer any way you need to
> :-)
>
> that being said, in your example the issue is the cost of the user-created
> function and the fact that postgres doesn't know its cost.
>
> would a reasonable answer be to give postgres a way to learn how expensive
> the call is?
>
> a couple of ways I could see to do this:
>
> 1. store some stats automagically when the function is called and update
> the optimization plan when you do an ANALYZE
>
> 2. provide a way for a user to explicitly set a cost factor for a function
> (with a default value that's sane for fairly trivial functions so that it
> would only have to be set for unusually expensive functions)
>
> now, neither of these will work all the time if a given function is
> sometimes cheap and sometimes expensive (depending on its parameters),
> but in that case I would say that if the application knows that a function
> will be unusually expensive under some conditions (and knows what those
> conditions will be) it may be a reasonable answer to duplicate the
> function: one copy that it uses most of the time, and a second copy that
> it uses when it expects it to be expensive. at this point the cost of the
> function can be set via either of the methods listed above.
>
> >> After years of using several other database products (some supporting
> >> hint type constructs and some not), I have come to believe that hinting
> >> (or similar) actually *hinders* the development of a great optimizer.
> >
> > I agree.  It takes the pressure off the optimizer gurus.  If the users
> > can just work around every problem, then the optimizer can suck and the
> > system is still usable.
> >
> > Lest anyone think I'm an all-out advocate of overriding the optimizer,
> > I know from first-hand experience what a catastrophe it can be.  An
> > Oracle hint I used worked fine on my test schema, but the customer's
> > "table" turned out to be a view, and Oracle's optimizer worked well on
> > the view whereas my hint was horrible.  Unfortunately, without the hint,
> > Oracle sucked when working on an ordinary table.  Hints are dangerous,
> > and I consider them a last resort.
>
> I've been on the linux-kernel mailing list for the last 9 years, and have
> seen a similar debate rage during that entire time about kernel memory
> management. overall, both of these tend to be conflicts between short-term
> and long-term benefits.
>
> in the short-term the application user wants to be able to override the
> system to get the best performance _now_
>
> in the long run the system designers don't trust the application
> programmers to get the hints right and want to figure out the right
> optimizer plan, even if it takes a lot longer to do so.
>
> the key to this balance seems to be to work towards as few controls as
> possible, because the user will get them wrong far more frequently than
> they get them right, but when you hit a point where there's absolutely no
> way for the system to figure things out (and it's a drastic difference)
> provide the application with a way to hint to the system that things are
> unusual, but always keep looking for patterns that will let the system
> detect the need itself.
>
> even the existing defaults are wrong as frequently as they are right (they
> were set when hardware was very different than it is today) so some way to
> gather real-world stats and set the system defaults based on actual
> hardware performance is really the right way to go (even for things like
> sequential scan speed that are set in the config file today)
>
> David Lang
>
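The last quoted paragraph (calibrating planner defaults from measured hardware) can be sketched as a toy derivation in Python. The throughput figures below are made up; `random_page_cost` and `seq_page_cost` are real Postgres settings, but this derivation is an assumption about how one might estimate them, not any official tool:

```python
def suggest_random_page_cost(seq_mb_per_s, rand_mb_per_s, floor=1.0):
    """Estimate random_page_cost from measured read throughput.

    Postgres expresses random_page_cost relative to seq_page_cost = 1.0,
    so the ratio of sequential to random throughput is a first-order
    estimate; clamp at `floor` because a random read is never cheaper
    than a sequential one.
    """
    return max(floor, seq_mb_per_s / rand_mb_per_s)

# Spinning disk: sequential reads vastly outpace random ones.
print(suggest_random_page_cost(150.0, 4.0))    # -> 37.5
# SSD: throughputs are close, so a much lower setting is appropriate.
print(suggest_random_page_cost(500.0, 450.0))
```

A real calibration tool would measure these throughputs itself (for example by timing reads against the actual data directory) rather than taking them as arguments.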

there was discussion on this, and IIRC the consensus was that it could be
useful to give some statistics to user-defined functions... I don't know if
someone is working on this or even if it is doable...
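As a sketch of why such per-function cost statistics would matter to the planner, here is an illustrative Python model of choosing the order for two AND-ed filters. The heuristic, numbers, and filter names are assumptions for illustration, not Postgres internals:

```python
def filter_cost(order, n_rows):
    """Total cost of applying filters in sequence to n_rows rows.

    Each filter is (name, selectivity, cost_per_call); only rows that
    survive one filter are fed to the next.
    """
    total, rows = 0.0, float(n_rows)
    for _name, selectivity, cost_per_call in order:
        total += rows * cost_per_call
        rows *= selectivity
    return total

def best_order(filters):
    # Classic heuristic: run the filter with the lowest
    # cost / (1 - selectivity) first (cheapest per row removed).
    return sorted(filters, key=lambda f: f[2] / (1.0 - f[1]))

filters = [
    ("expensive_udf(x)", 0.5, 100.0),  # user function with unknown cost
    ("y < 10",           0.1,   0.01), # trivial comparison
]

naive = filter_cost(filters, 1_000_000)              # UDF evaluated first
smart = filter_cost(best_order(filters), 1_000_000)  # comparison first
print(f"{naive:.0f} vs {smart:.0f}")  # roughly a 10x difference here
```

Without a cost estimate for the hypothetical `expensive_udf`, the planner cannot tell that the naive ordering is roughly ten times more expensive; that is exactly the statistic being proposed above.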

--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)
