Thread: Re: [COMMITTERS] pgsql: Add missing optimizer hooks for function cost and number of rows.

sriggs@postgresql.org (Simon Riggs) writes:
> Log Message:
> -----------
> Add missing optimizer hooks for function cost and number of rows.
> Closely follow design of other optimizer hooks: if hook exists
> retrieve value from plugin; if still not set then get from cache.

What exactly are we doing adding new features without discussion (or
documentation, or known use cases) at this stage of the release cycle?

            regards, tom lane

On Fri, Apr 23, 2010 at 6:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> sriggs@postgresql.org (Simon Riggs) writes:
>> Log Message:
>> -----------
>> Add missing optimizer hooks for function cost and number of rows.
>> Closely follow design of other optimizer hooks: if hook exists
>> retrieve value from plugin; if still not set then get from cache.
>
> What exactly are we doing adding new features without discussion (or
> documentation, or known use cases) at this stage of the release cycle?

I'm confused, too.  It seems like there have been a LOT of patches
this week that were not posted to or discussed on -hackers.  I thought
that was not within the ground rules.

...Robert

On Fri, 2010-04-23 at 18:55 -0400, Tom Lane wrote:
> sriggs@postgresql.org (Simon Riggs) writes:
> > Log Message:
> > -----------
> > Add missing optimizer hooks for function cost and number of rows.
> > Closely follow design of other optimizer hooks: if hook exists
> > retrieve value from plugin; if still not set then get from cache.
>
> What exactly are we doing adding new features without discussion (or
> documentation, or known use cases) at this stage of the release cycle?

Existing hooks were not fully complete in their coverage. That has
happened before, and we have discussed that before on hackers, so I took
care not to deviate from that implementation. This is a very low impact
change, isn't in a new area and similar optimizer related changes were
made recently, so I saw little to object to in this particular change.
No such hooks are documented, even ones with strong use cases.

--
 Simon Riggs           www.2ndQuadrant.com


On Sat, Apr 24, 2010 at 3:31 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Fri, 2010-04-23 at 18:55 -0400, Tom Lane wrote:
>> sriggs@postgresql.org (Simon Riggs) writes:
>> > Log Message:
>> > -----------
>> > Add missing optimizer hooks for function cost and number of rows.
>> > Closely follow design of other optimizer hooks: if hook exists
>> > retrieve value from plugin; if still not set then get from cache.
>>
>> What exactly are we doing adding new features without discussion (or
>> documentation, or known use cases) at this stage of the release cycle?
>
> Existing hooks were not fully complete in their coverage. That has
> happened before, and we have discussed that before on hackers, so I took
> care not to deviate from that implementation. This is a very low impact
> change, isn't in a new area and similar optimizer related changes were
> made recently, so I saw little to object to in this particular change.
> No such hooks are documented, even ones with strong use cases.

The point isn't whether the existing hooks are complete or not.  The
point is that we shouldn't be making undiscussed changes EVER, and
particularly not a week before beta.  Hooks are frequently proposed
and rejected - every once in a while they are proposed and accepted.
So it is not as if there is any reason to believe that no one could
possibly object to this.  And you carefully failed to answer Tom's
other point about lack of use case.  I think the use case for these
hooks is pretty thin, but I don't really want to argue about it now.
I want you to revert the patch and resubmit it for 9.1 when there is
time to properly discuss it, and focus on the remaining open items so
that we can put out a beta.

...Robert

Robert Haas <robertmhaas@gmail.com> writes:
> On Sat, Apr 24, 2010 at 3:31 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Existing hooks were not fully complete in their coverage. That has
>> happened before, and we have discussed that before on hackers, so I took
>> care not to deviate from that implementation. This is a very low impact
>> change, isn't in a new area and similar optimizer related changes were
>> made recently, so I saw little to object to in this particular change.
>> No such hooks are documented, even ones with strong use cases.

> The point isn't whether the existing hooks are complete or not.  The
> point is that we shouldn't be making undiscussed changes EVER, and
> particularly not a week before beta.

I have a problem with not only the process (or lack of it) but the
substance of the patch.  I don't believe that a system-wide hook point
has any great use for improving function estimation.  You need function-
specific knowledge, and this is just not a useful way to package it.

When we put in the COST/ROWS options for functions, it was generally
agreed that the way forward would be to generalize those, eg by allowing
per-function estimator functions to be called instead of just inserting
constants.  (And I think the main reason we didn't just do that
immediately was that we wanted a feature that could be used without
doing C-level programming.)  This patch doesn't do that, nor even lay
any useful groundwork for doing it.  It would be impossible for instance
for this hook function to lay its hands on the arguments to the function
to be estimated, which certainly begs the question as to how it's going
to deliver any estimate more useful than the constant value.

Please revert.

            regards, tom lane

On Sat, 2010-04-24 at 11:17 -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Sat, Apr 24, 2010 at 3:31 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >> Existing hooks were not fully complete in their coverage. That has
> >> happened before, and we have discussed that before on hackers, so I took
> >> care not to deviate from that implementation. This is a very low impact
> >> change, isn't in a new area and similar optimizer related changes were
> >> made recently, so I saw little to object to in this particular change.
> >> No such hooks are documented, even ones with strong use cases.
>
> > The point isn't whether the existing hooks are complete or not.  The
> > point is that we shouldn't be making undiscussed changes EVER, and
> > particularly not a week before beta.
>
> I have a problem with not only the process (or lack of it) but the
> substance of the patch.  I don't believe that a system-wide hook point
> has any great use for improving function estimation.  You need function-
> specific knowledge, and this is just not a useful way to package it.

I've revoked it.

> When we put in the COST/ROWS options for functions, it was generally
> agreed that the way forward would be to generalize those, eg by allowing
> per-function estimator functions to be called instead of just inserting
> constants.

I completely agree that the above is the best way forwards.

>  (And I think the main reason we didn't just do that
> immediately was that we wanted a feature that could be used without
> doing C-level programming.) This patch doesn't do that, nor even lay
> any useful groundwork for doing it.  It would be impossible for instance
> for this hook function to lay its hands on the arguments to the function
> to be estimated, which certainly begs the question as to how it's going
> to deliver any estimate more useful than the constant value.

We can override table stats but not functions stats. I noticed that hole
in the previous implementation, and rectified it with the intention of
helping users with what I saw as a small, low risk patch that exactly
followed existing code. It would seem I did that too quickly without
realising that an objection would occur.

For the record, it's possible to use the main optimizer hooks to get the
same result, so I'm in no way personally blocked by this. I think that's
harder, so others may not find it as easy. This may actually help obtain
funding to implement the full approach, maybe.

--
 Simon Riggs           www.2ndQuadrant.com


Simon Riggs <simon@2ndQuadrant.com> writes:
> We can override table stats but not functions stats. I noticed that hole
> in the previous implementation, and rectified it with the intention of
> helping users with what I saw as a small, low risk patch that exactly
> followed existing code. It would seem I did that too quickly without
> realising that an objection would occur.

Well, you did it without much thought at all.  I think this episode is a
perfect demonstration of why we ask for concrete use-cases for proposed
hooks.  If you'd actually tried to write something that used the hook,
you'd surely have noticed that it wasn't being passed the information
that it would need to do anything useful, and you'd probably have
recognized the problem that there's no good way for a single hook
function to provide an extensible collection of function-specific
knowledge.

But the other point is that people aren't going to want to have to write
C-language hook functions in order to provide estimators for
user-defined functions.  We need to think of something higher-level than
that.  I think there was some discussion of generalizing the COST/ROWS
constants into SQL expressions using the function arguments, which the
planner could evaluate if it could reduce the arguments to constants.
I'm not sure if that would be adequate or even useful, but it seems more
likely to be helpful than a bare hook function.

            regards, tom lane

On Sat, 2010-04-24 at 12:59 -0400, Tom Lane wrote:

> Well, you did it without much thought at all.

To the consequences, definitely.

> I think this episode is a
> perfect demonstration of why we ask for concrete use-cases for proposed
> hooks.  If you'd actually tried to write something that used the hook,
> you'd surely have noticed that it wasn't being passed the information
> that it would need to do anything useful, and you'd probably have
> recognized the problem that there's no good way for a single hook
> function to provide an extensible collection of function-specific
> knowledge.

To the value, no. The limitations of the hook approach were clear, but
they do at least allow overriding values on a session by session basis,
allowing you to write a program to estimate the result and then set the
function costs accordingly. It's not clever or the best way, but it was
the same situation as the other hooks currently provide and I imagined
it would be accepted without question, wrongly.

--
 Simon Riggs           www.2ndQuadrant.com