Re: Delaying the planning of unnamed statements until Bind - Mailing list pgsql-hackers
From | Oliver Jowett |
---|---|
Subject | Re: Delaying the planning of unnamed statements until Bind |
Date | |
Msg-id | 40AEA7B3.4060204@opencloud.com Whole thread Raw |
In response to | Re: Delaying the planning of unnamed statements until Bind (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Delaying the planning of unnamed statements until Bind
|
List | pgsql-hackers |
Tom Lane wrote: > Oliver Jowett <oliver@opencloud.com> writes: > >>Tom Lane wrote: >>>I don't like that at all. Do the planning using the given parameters, >>>but save the plan. Otherwise you have substantially pessimized the >>>behavior for the case where an unnamed statement is reused. > >>How often is the unnamed statement reused? > > Who's to say? If it's not reused then this argument is moot, so let's > assume there are people out there who want to reuse it. The approach you suggest appears to affect the performance of the query even in the case where the unnamed statement is not reused (see below), so I don't think this is moot. > What behavior > will they wish for? The "plan with first set of actual parameters, then > keep using plan" behavior was discussed long ago, and it seems useful to > me to provide it. People who think they are going to have statistically > different parameter values from time to time will of course not want to > use this, but people who expect the same plan to continue to be useful > won't want to pay the overhead of replanning. > > The reason this is important is exactly the one you have already seen, > namely that when faced with unknown Params the planner will sometimes > fall back to very conservative assumptions. An example that comes up > pretty often is > select * from mytable where entry_time >= $1; Ok, I think I understand the argument now. As I understand it we have three cases here: 1) Parameterized query where planning is expensive, and the general parameterized plan with unknown Params provides a "good enough" plan. This is already supported via PREPARE/EXECUTE or Parse/Bind with named statements and would continue to be supported in the same way regardless of what we do. No problems there. 2) Parameterized query where planning is expensive, specific parameter values are needed to produce a good plan, and that plan remains "good enough" for other parameter values because of the particular values used by the client. This is the case you described above. I can see that this behavior could be useful for specific queries. Currently, there's no way to get equivalent behaviour -- this is a new feature. It might be awkward for clients to actually *use*, as they can't interleave different (unnamed) queries without replanning. 3) Parameterized query where specific parameter values are needed to produce a good plan, and the benefit of a good plan outweighs the cost of replanning each time. This can currently be done by substituting parameters on the client side, but this forces the client to forego the benefits of Bind (binary, typed, parameter transfer). So I see it as letting clients make use of an existing feature, not a new feature :) If (3) is not supported via Parse/Bind, a general client interface such as JDBC has trouble using parameterized Parse/Bind by default: you don't know how the query's performance will be affected. For our application, parameterized Parse/Bind is a big win and we'll probably use it regardless, but I don't know if my patches are going to be acceptable for the official JDBC driver if it can reduce performance over the current code which does parameter substitution on the client side. The JDBC driver could have a driver-specific interface to control this, but a) it's postgresql-specific which is bad for portable applications, and b) the JDBC driver will need to maintain two code paths that do essentially the same thing in different ways. Yuck. From my point of view, if (2) and (3) are mutually exclusive then I'd rather have (3). >>>No, you can't replace Params with Consts. > > >>Transforming the tree seemed the most reliable way to get a result >>that's consistent with Const being in the tree from the start (i.e. the >>simple query case), and doing the replacement hasn't broken anything in >>my (admittedly brief) testing yet. > > > This depends on the outcome of the above discussion. If you're not > saving the plan then replacing params with constants is reasonable, > but if you are saving the plan then you can't do that. Ok, I understand: the replaced tree would be incorrect (not just inefficient) if reused for subsequent parameter values, as it can modify the tree based on the concrete Const values e.g. collapsing constant subexpression trees to a single value. This leads to my next problem (which was one of the original reasons I went with node replacement): don't we get different performance between a parameterized query and an equivalent unparameterized query in cases such as this? : SELECT * FROM sometable WHERE field = $1 * 10 The planner can't treat the multiplication expression as constant if $1 is a Param node when planning, even if we have concrete values for that parameter -- so the selectivity estimates go out the window again. This seems nontrivial to fix (you'd need a "constant parameterized expression" node wrapping the expression, or something similar). So with this approach a parameterized query via the unnamed statement may perform worse than an equivalent unparameterized query, even if the unnamed statement is not reused. This is the original problem I was trying to solve in a slightly different guise. >>Would it be safe to save-and-clear the global parameter state only at >>the topmost planner() level as is done for PlannerParamList and friends? > > > Can't see why not. Note that I do not see this state as "global" in the > sense of affecting anything outside the planner. Ok. I was concerned that the selectivity functions and similar places that could use constant values for Param nodes might be called by code conceptually "outside" the planner, but invoked when the planner ran functions or similar. Is there any documentation on which nonstatic functions are general utility functions and which are internal to the planner code? -O
pgsql-hackers by date: