Re: Delaying the planning of unnamed statements until Bind - Mailing list pgsql-hackers

From Oliver Jowett
Subject Re: Delaying the planning of unnamed statements until Bind
Date
Msg-id 40AEA7B3.4060204@opencloud.com
Whole thread Raw
In response to Re: Delaying the planning of unnamed statements until Bind  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Delaying the planning of unnamed statements until Bind  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:
> Oliver Jowett <oliver@opencloud.com> writes:
> 
>>Tom Lane wrote:

>>>I don't like that at all.  Do the planning using the given parameters,
>>>but save the plan.  Otherwise you have substantially pessimized the
>>>behavior for the case where an unnamed statement is reused.
> 
>>How often is the unnamed statement reused?
> 
> Who's to say?  If it's not reused then this argument is moot, so let's
> assume there are people out there who want to reuse it.

The approach you suggest appears to affect the performance of the query 
even in the case where the unnamed statement is not reused (see below), 
so I don't think this is moot.

> What behavior
> will they wish for?  The "plan with first set of actual parameters, then
> keep using plan" behavior was discussed long ago, and it seems useful to
> me to provide it.  People who think they are going to have statistically
> different parameter values from time to time will of course not want to
> use this, but people who expect the same plan to continue to be useful
> won't want to pay the overhead of replanning.
> 
> The reason this is important is exactly the one you have already seen,
> namely that when faced with unknown Params the planner will sometimes
> fall back to very conservative assumptions.  An example that comes up
> pretty often is
>     select * from mytable where entry_time >= $1;

Ok, I think I understand the argument now. As I understand it we have 
three cases here:

1) Parameterized query where planning is expensive, and the general 
parameterized plan with unknown Params provides a "good enough" plan. 
This is already supported via PREPARE/EXECUTE or Parse/Bind with named 
statements and would continue to be supported in the same way regardless 
of what we do. No problems there.

2) Parameterized query where planning is expensive, specific parameter 
values are needed to produce a good plan, and that plan remains "good 
enough" for other parameter values because of the particular values used 
by the client. This is the case you described above. I can see that this 
behavior could be useful for specific queries. Currently, there's no way 
to get equivalent behaviour -- this is a new feature. It might be 
awkward for clients to actually *use*, as they can't interleave 
different (unnamed) queries without replanning.

3) Parameterized query where specific parameter values are needed to 
produce a good plan, and the benefit of a good plan outweighs the cost 
of replanning each time. This can currently be done by substituting 
parameters on the client side, but this forces the client to forego the 
benefits of Bind (binary, typed, parameter transfer). So I see it as 
letting clients make use of an existing feature, not a new feature :)

If (3) is not supported via Parse/Bind, a general client interface such 
as JDBC has trouble using parameterized Parse/Bind by default: you don't 
know how the query's performance will be affected. For our application, 
parameterized Parse/Bind is a big win and we'll probably use it 
regardless, but I don't know if my patches are going to be acceptable 
for the official JDBC driver if it can reduce performance over the 
current code which does parameter substitution on the client side.

The JDBC driver could have a driver-specific interface to control this, 
but a) it's postgresql-specific which is bad for portable applications, 
and b) the JDBC driver will need to maintain two code paths that do 
essentially the same thing in different ways. Yuck.
From my point of view, if (2) and (3) are mutually exclusive then I'd 
rather have (3).

>>>No, you can't replace Params with Consts.
> 
> 
>>Transforming the tree seemed the most reliable way to get a result 
>>that's consistent with Const being in the tree from the start (i.e. the 
>>simple query case), and doing the replacement hasn't broken anything in 
>>my (admittedly brief) testing yet.
> 
> 
> This depends on the outcome of the above discussion.  If you're not
> saving the plan then replacing params with constants is reasonable,
> but if you are saving the plan then you can't do that.

Ok, I understand: the replaced tree would be incorrect (not just 
inefficient) if reused for subsequent parameter values, as it can modify 
the tree based on the concrete Const values e.g. collapsing constant 
subexpression trees to a single value.

This leads to my next problem (which was one of the original reasons I 
went with node replacement): don't we get different performance between 
a parameterized query and an equivalent unparameterized query in cases 
such as this? :
  SELECT * FROM sometable WHERE field = $1 * 10

The planner can't treat the multiplication expression as constant if $1 
is a Param node when planning, even if we have concrete values for that 
parameter -- so the selectivity estimates go out the window again.

This seems nontrivial to fix (you'd need a "constant parameterized 
expression" node wrapping the expression, or something similar).

So with this approach a parameterized query via the unnamed statement 
may perform worse than an equivalent unparameterized query, even if the 
unnamed statement is not reused. This is the original problem I was 
trying to solve in a slightly different guise.

>>Would it be safe to save-and-clear the global parameter state only at 
>>the topmost planner() level as is done for PlannerParamList and friends? 
> 
> 
> Can't see why not.  Note that I do not see this state as "global" in the
> sense of affecting anything outside the planner.

Ok. I was concerned that the selectivity functions and similar places 
that could use constant values for Param nodes might be called by code 
conceptually "outside" the planner, but invoked when the planner ran 
functions or similar. Is there any documentation on which nonstatic 
functions are general utility functions and which are internal to the 
planner code?

-O


pgsql-hackers by date:

Previous
From: "Marc G. Fournier"
Date:
Subject: Re: Downloadable Mailing List Archives
Next
From: Tom Lane
Date:
Subject: Re: Delaying the planning of unnamed statements until Bind