Re: Delaying the planning of unnamed statements until Bind - Mailing list pgsql-hackers

From Oliver Jowett
Subject Re: Delaying the planning of unnamed statements until Bind
Date
Msg-id 40AD3482.6070500@opencloud.com
Whole thread Raw
In response to Re: Delaying the planning of unnamed statements until Bind  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Delaying the planning of unnamed statements until Bind  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:
> Oliver Jowett <oliver@opencloud.com> writes:
> 
>>Following a recent discussion on the JDBC list 
>>(http://archives.postgresql.org/pgsql-jdbc/2004-05/msg00100.php) I'm 
>>looking at modifying when unnamed statements received via a v3 protocol 
>>Parse message are planned. The idea is to delay planning until the Bind 
>>message arrives, so that the planner benefits from knowing the actual 
>>parameter values in use. This means that clients such as JDBC can use an 
>>unnamed Parse/Bind as a way to provide typed and/or binary parameter 
>>values without running the risk of worsening the query plan.
> 
> 
> I'm a bit concerned about driving such a change in behavior off nothing
> more than whether the statement is named or not.  The protocol spec does
> say this:
> 
> : In addition, an "unnamed" prepared statement and portal exist. Although
> : these behave largely the same as named objects, operations on them are
> : optimized for the case of executing a query only once and then
> : discarding it, whereas operations on named objects are optimized on the
> : expectation of multiple uses.
> 
> so you could argue that this change is just an optimization of that
> kind.  It had better be documented though.

Ideally the when-to-plan decision should be client-controllable per 
statement. I couldn't see a way to do this without causing a protocol 
version change; using the unnamed statement sounded like a reasonable 
compromise.

It'd also make the extended query protocol closer to the simple query 
protocol as the documentation claims. This is the underlying problem 
that bit me: the extended query protocol is *not* just the simple query 
protocol broken down into smaller steps. There are non-obvious changes 
deep in the guts of the system if you use the extra functionality of the 
extended protocol.

Would it be better to bite the bullet and bump the protocol version, 
adding a new field to Parse to control this behaviour?

>>3. In execute_bind_message, if the source statement is not planned 
>>(plan_list == NULL), plan it, passing the concrete parameter values 
>>received in the Bind message.
> 
> 
> I am not sure you can rely on plan_list == NULL to be the driving test;
> consider an empty query string for instance.  It'd be better to add a
> boolean planning_done field to the struct.

Oops -- I made the incorrect assumption that NIL != NULL (i.e. it was a 
shared placeholder node for empty lists).

Looks like this would only bite in the empty-query case (where 
querytree_list is NIL). Trying to plan that via pg_plan_queries looks 
fairly harmless. Nevertheless, I can add a field to the struct.

>>The planning happens using the target 
>>portal's memory context; the resulting plan_list is only used for this 
>>portal and the source statement remains unplanned.
> 
> 
> I don't like that at all.  Do the planning using the given parameters,
> but save the plan.  Otherwise you have substantially pessimized the
> behavior for the case where an unnamed statement is reused.

How often is the unnamed statement reused? From my exhaustive sample of 
one (the JDBC driver) it's never reused. (admittedly that means it 
doesn't affect me either way, but..)

If the planning behaviour was controllable per-statement, would you be 
ok with replanning on every Bind when the statement asks for delaying 
planning?

The reason I don't like storing the plan is that I see nothing special 
about the first set (or Nth set) of parameters if the statement is being 
reused, and you could come up with a plan that's substantially worse 
than if you'd just planned before parameters were available (e.g. 
picking an indexscan over a seqscan because the first Bind had selective 
parameters, then subsequently Binding nonselective parameters).

>>4. In planner(), if parameters are provided, replace Param nodes with 
>>Const nodes in the query tree where appropriate.
> 
> 
> No, you can't replace Params with Consts.  You can use the values of
> Params in the places where selfuncs.c wants to extract constant values
> for estimation purposes.

Transforming the tree seemed the most reliable way to get a result 
that's consistent with Const being in the tree from the start (i.e. the 
simple query case), and doing the replacement hasn't broken anything in 
my (admittedly brief) testing yet. What should I be looking at to see 
the breakage?

Note that the Const vs. Param difference appears to affect 
eval_const_expressions too, so it's more than just the selectivity 
functions that would need changing if we want to produce the same query 
plans as from an equivalent simple query.

>>... it requires threading the parameter values through many 
>>planner functions. There are quite a few (20-30ish?) functions affected.
> 
> 
> This would be a pretty thorough mess, especially after you got done
> propagating the info to selectivity functions.  Frankly I'd suggest
> using a planner global variable instead; see PlannerParamList for a
> model.

If I'm replacing the Param nodes it doesn't need to go as deep as the 
selectivity functions. It's not great but it's not a disaster either. 
Certainly if I needed to propagate all the way to the selectivity 
functions it'd get out of control.

Would it be safe to save-and-clear the global parameter state only at 
the topmost planner() level as is done for PlannerParamList and friends? 
I'm concerned about the global parameter state interfering with the 
parameter values of things like functions evaluated during planning. 
Tracing the opportunities for recursion in the planner is, well, 
interesting..

Thanks for your comments!

-O


pgsql-hackers by date:

Previous
From: "Marc G. Fournier"
Date:
Subject: Re: commit messages from gforge -> pgsql-committers
Next
From: "Marc G. Fournier"
Date:
Subject: Re: commit messages from gforge -> pgsql-committers