Thread: Transient plans versus the SPI API
I've been thinking about how to redesign the plancache infrastructure to better support use of transient (one-shot) plans, as we've talked about various times such as in this thread: http://archives.postgresql.org/pgsql-hackers/2010-02/msg00607.php

(Note: that thread sorta went off into the weeds arguing about exactly what heuristics to use for when to re-plan. I don't want to re-open that issue today, since there's no way to experiment with policy until we have some mechanism in place.)

I think that what we need to do is get rid of the assumption that a "cached plan" normally includes a plan per se. The initial creation of the cache entry should just supply a raw query plus its analyzed-and-rewritten form. (plancache.c can actually operate that way today, via its "not fully planned" flag, but it's a wart rather than the normal philosophy.) Then RevalidateCachedPlan should be replaced by something with the semantics of "get me a plan to use, and here's the parameter values I'm going to use it with". The choice between using a pre-cached generic plan and building a one-off plan would then be localized in this new function.

There are not that many places that call plancache.c directly, and so this change in API won't cause much code churn --- but one place that does depend on this is spi.c, and there is *lots* of both core and third-party code that calls SPI_prepare, for example. So we need to tread carefully in redefining SPI's behavior.

The most straightforward way to reimplement things within spi.c would be to redefine SPI_prepare as just doing the parse-and-rewrite steps, with planning always postponed to SPI_execute. In the case where you just prepare and then execute a SPIPlan, this would come out the same or better, since we'd still just do one planning cycle, but the planner could be given the actual parameter values to use. However, if you SPI_prepare, SPI_saveplan, and then SPI_execute many times, you might come out behind.
This is of course the same tradeoff we are going to impose at the SQL level anyway, but I wonder whether there needs to be a control knob available to C code to retain the old plan-once-and-always-use-that-plan approach. Anyone have an opinion about that? And if we do need to expose some control, should the default (if you don't change your source code) be that you still get the old behavior, or that you get the new behavior?

I'm inclined to think that if we believe this'll be a win at the SQL level, it should be a win at the SPI-caller level too, but maybe someone thinks otherwise.

regards, tom lane
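As a rough illustration of the proposed semantics — a cache entry that stores only the raw and rewritten query, with the generic-versus-custom choice localized in a single "get me a plan for these parameter values" function — here is a standalone C sketch. Every name, struct field, and threshold is hypothetical; this models the control flow Tom describes, not actual plancache.c code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a cache entry: a raw query plus its rewritten
 * form, with any generic plan built lazily rather than at creation. */
typedef struct CachedSource
{
    const char *raw_query;
    bool        has_generic_plan;   /* generic plan built on demand */
    int         num_custom_plans;   /* one-off plans built so far */
} CachedSource;

typedef enum { PLAN_GENERIC, PLAN_CUSTOM } PlanChoice;

/* "Get me a plan to use, and here are the parameter values": the
 * choice between a pre-cached generic plan and a one-off custom plan
 * lives entirely here.  The "try 5 custom plans first" policy is a
 * placeholder for whatever heuristic gets worked out later. */
PlanChoice
get_cached_plan(CachedSource *src, bool have_param_values)
{
    if (!have_param_values)
        return PLAN_GENERIC;        /* nothing to specialize on */

    if (src->num_custom_plans < 5)  /* experiment phase */
    {
        src->num_custom_plans++;
        return PLAN_CUSTOM;
    }

    src->has_generic_plan = true;   /* settle on the generic plan */
    return PLAN_GENERIC;
}
```

The point of the sketch is only that callers never see the choice: they hand over parameter values and get back whichever plan the policy picks.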
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The most straightforward way to reimplement things within spi.c would be to redefine SPI_prepare as just doing the parse-and-rewrite steps, with planning always postponed to SPI_execute. In the case where you just prepare and then execute a SPIPlan, this would come out the same or better, since we'd still just do one planning cycle, but the planner could be given the actual parameter values to use. However, if you SPI_prepare, SPI_saveplan, and then SPI_execute many times, you might come out behind. This is of course the same tradeoff we are going to impose at the SQL level anyway, but I wonder whether there needs to be a control knob available to C code to retain the old plan-once-and-always-use-that-plan approach.
>
> Anyone have an opinion about that?

I have a few places I've used SPI_saveplan where there is really only one sensible plan, so I'm pretty sure it would be a loss to use the new technique in those places. Now, whether that would be a loss big enough for anyone to notice (or even to reliably measure) is another question. It wouldn't surprise me if the difference was insignificant, but it would be reassuring to have an easy way to check....

-Kevin
On Tue, Aug 2, 2011 at 4:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I've been thinking about how to redesign the plancache infrastructure to better support use of transient (one-shot) plans, as we've talked about various times such as in this thread: http://archives.postgresql.org/pgsql-hackers/2010-02/msg00607.php (Note: that thread sorta went off into the weeds arguing about exactly what heuristics to use for when to re-plan. I don't want to re-open that issue today, since there's no way to experiment with policy until we have some mechanism in place.)
>
> I think that what we need to do is get rid of the assumption that a "cached plan" normally includes a plan per se. The initial creation of the cache entry should just supply a raw query plus its analyzed-and-rewritten form. (plancache.c can actually operate that way today, via its "not fully planned" flag, but it's a wart rather than the normal philosophy.) Then RevalidateCachedPlan should be replaced by something with the semantics of "get me a plan to use, and here's the parameter values I'm going to use it with". The choice between using a pre-cached generic plan and building a one-off plan would then be localized in this new function.

This seems like a good design. Now what would be really cool is if you could observe a stream of queries like this:

SELECT a, b FROM foo WHERE c = 123
SELECT a, b FROM foo WHERE c = 97
SELECT a, b FROM foo WHERE c = 236

...and say, hey, I could just make a generic plan and use it every time I see one of these. It's not too clear to me how you'd make recognition of such queries cheap enough to be practical, but maybe someone will think of a way...

> There are not that many places that call plancache.c directly, and so this change in API won't cause much code churn --- but one place that does depend on this is spi.c, and there is *lots* of both core and third-party code that calls SPI_prepare for example.
> So we need to tread carefully in redefining SPI's behavior.
>
> The most straightforward way to reimplement things within spi.c would be to redefine SPI_prepare as just doing the parse-and-rewrite steps, with planning always postponed to SPI_execute. In the case where you just prepare and then execute a SPIPlan, this would come out the same or better, since we'd still just do one planning cycle, but the planner could be given the actual parameter values to use. However, if you SPI_prepare, SPI_saveplan, and then SPI_execute many times, you might come out behind. This is of course the same tradeoff we are going to impose at the SQL level anyway, but I wonder whether there needs to be a control knob available to C code to retain the old plan-once-and-always-use-that-plan approach.
>
> Anyone have an opinion about that? And if we do need to expose some control, should the default (if you don't change your source code) be that you still get the old behavior, or that you get the new behavior? I'm inclined to think that if we believe this'll be a win at the SQL level, it should be a win at the SPI-caller level too, but maybe someone thinks otherwise.

I am not sure about this one.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Aug 2, 2011 at 9:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The most straightforward way to reimplement things within spi.c would be to redefine SPI_prepare as just doing the parse-and-rewrite steps, with planning always postponed to SPI_execute. In the case where you just prepare and then execute a SPIPlan, this would come out the same or better, since we'd still just do one planning cycle, but the planner could be given the actual parameter values to use. However, if you SPI_prepare, SPI_saveplan, and then SPI_execute many times, you might come out behind. This is of course the same tradeoff we are going to impose at the SQL level anyway, but I wonder whether there needs to be a control knob available to C code to retain the old plan-once-and-always-use-that-plan approach.

The problems only occur <1% of the time, so this penalises everyone to avoid real but rare problems. This will cause a massive loss of performance in most apps, though I understand the annoyance and why you make the suggestion.

See http://www.db2ude.com/?q=node/73 for some more background on how this is handled elsewhere.

Control knob == hint, so I've avoided suggesting such an approach myself. I think it's possible to tell automatically whether we need to replan always or not, based upon the path we take through selectivity functions.

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Anyone have an opinion about that?

I still have this application where PREPARE takes between 50ms and 300ms and EXECUTE 5ms to 10ms, and I can handle 1 PREPARE for 10000 EXECUTE quite easily. (Yes the database fits in RAM, and yes, when that's no longer the case we just upgrade the hardware.)

What does your proposal mean for such a use case?

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> Anyone have an opinion about that?

> I still have this application where PREPARE takes between 50ms and 300ms and EXECUTE 5ms to 10ms, and I can handle 1 PREPARE for 10000 EXECUTE quite easily. (Yes the database fits in RAM, and yes when that's no longer the case we just upgrade the hardware)

> What does your proposal mean for such a use case?

Well, the policy for when to replan or not remains to be worked out in detail, but what is likely to happen for such cases is that we'll waste a few planning cycles before determining that there's no benefit in a custom plan. So, using the worst-case ends of your ranges above and assuming that "a few" means "10", we'd go from 300 + 5 * 10000 = 50300 ms to execute the query 10000 times, to 10 * 300 + 5 * 10000 = 53000 ms.

So yes, it'd get a little worse for that use-case. But you have to weigh that against the likelihood that other use-cases will get better. If our requirement for a transient-plan mechanism is that no individual case can ever be worse than before, then we might as well abandon the entire project right now, because the only way to meet that requirement is to change nothing.

Of course we could address the worst cases by providing some mechanism to tell the plancache code "always use a generic plan for this query" or "always use a custom plan". I'm not entirely thrilled with that, because it's effectively a planner hint and has got the same problems as all planner hints, namely that users are likely to get it wrong. But it would be relatively painless to supply such a hint at the SPI level, which is why I asked whether we should. It'd be much harder to do something equivalent at higher levels, which is why I'm not that eager to do it for SPI.

regards, tom lane
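The arithmetic above can be checked with a trivial helper (a standalone sketch; the "10 custom planning cycles" figure is Tom's illustrative assumption, not a settled policy):

```c
#include <assert.h>

/* Total cost of n executions, in ms: some number of planning cycles
 * at plan_ms each, plus n executions at exec_ms each. */
static long
total_ms(long plan_cycles, long plan_ms, long n, long exec_ms)
{
    return plan_cycles * plan_ms + n * exec_ms;
}
```

With Dimitri's worst-case numbers (300 ms to plan, 5 ms to execute, 10000 executions), the old behavior costs total_ms(1, 300, 10000, 5) = 50300 ms and the sketched new behavior total_ms(10, 300, 10000, 5) = 53000 ms — about a 5% regression for this extreme case.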
On Wed, Aug 3, 2011 at 12:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes: >> Tom Lane <tgl@sss.pgh.pa.us> writes: >>> Anyone have an opinion about that? > >> I still have this application where PREPARE takes between 50ms and 300ms >> and EXECUTE 5ms to 10ms, and I can handle 1 PREPARE for 10000 EXECUTE >> quite easily. (Yes the database fits in RAM, and yes when that's no >> longer the case we just upgrade the hardware) > >> What does your proposal mean for such a use case? > > Well, the policy for when to replan or not remains to be worked out in > detail, but what is likely to happen for such cases is that we'll waste > a few planning cycles before determining that there's no benefit in a > custom plan. So, using the worst-case ends of your ranges above and > assuming that "a few" means "10", we'd go from 300 + 5 * 10000 = 50300 > ms to execute the query 10000 times, to 10 * 300 + 5 * 10000 = 53000 ms. A little OT here, but (as I think Simon said elsewhere) I think we really ought to be considering the table statistics when deciding whether or not to replan. It seems to me that the overwhelmingly common case where this is going to come up is when (some subset of) the MCVs require a different plan than run-of-the-mill values. It would be nice to somehow work that out. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Tom Lane <tgl@sss.pgh.pa.us> writes:
> So yes, it'd get a little worse for that use-case. But you have to weigh that against the likelihood that other use-cases will get better. If our requirement for a transient-plan mechanism is that no individual case can ever be worse than before, then we might as well abandon the entire project right now, because the only way to meet that requirement is to change nothing.

That is not where I wanted to drift. It's just that I don't have as much time as I would like these days, and so it helps me a lot to see a worked-out example rather than having to make sure I parse your proposal correctly. Thanks a lot for your answer; I have a very clear confirmation of how I read your previous email. I will have to do some testing, but it could well be that this application will benefit from locking reductions enough that it buys this effect back.

> Of course we could address the worst cases by providing some mechanism to tell the plancache code "always use a generic plan for this query" or "always use a custom plan". I'm not entirely thrilled with that, because it's effectively a planner hint and has got the same problems as all planner hints, namely that users are likely to get it wrong.

Yeah.

> But it would be relatively painless to supply such a hint at the SPI level, which is why I asked whether we should. It'd be much harder to do something equivalent at higher levels, which is why I'm not that eager to do it for SPI.

Given the SLA of those prepared queries in my case, I think I could accept having to switch from SQL statements to C-coded SRFs to guarantee the planning behavior. It will not make the upgrade cheaper, but I realize it's a very narrow and specific use case.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Robert Haas <robertmhaas@gmail.com> writes: > This seems like a good design. Now what would be really cool is if > you could observe a stream of queries like this: > SELECT a, b FROM foo WHERE c = 123 > SELECT a, b FROM foo WHERE c = 97 > SELECT a, b FROM foo WHERE c = 236 > ...and say, hey, I could just make a generic plan and use it every > time I see one of these. It's not too clear to me how you'd make > recognition of such queries cheap enough to be practical, but maybe > someone will think of a way... Hm, you mean reverse-engineering the parameterization of the query? Interesting thought, but I really don't see a way to make it practical. In any case, it would amount to making up for a bad decision on the application side, ie, not transmitting the query in the parameterized form that presumably exists somewhere in the application. I think we'd be better served all around by encouraging app developers to rely more heavily on parameterized queries ... but first we have to fix the performance risks there. regards, tom lane
On Wed, Aug 3, 2011 at 3:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> This seems like a good design. Now what would be really cool is if >> you could observe a stream of queries like this: > >> SELECT a, b FROM foo WHERE c = 123 >> SELECT a, b FROM foo WHERE c = 97 >> SELECT a, b FROM foo WHERE c = 236 > >> ...and say, hey, I could just make a generic plan and use it every >> time I see one of these. It's not too clear to me how you'd make >> recognition of such queries cheap enough to be practical, but maybe >> someone will think of a way... > > Hm, you mean reverse-engineering the parameterization of the query? > Interesting thought, but I really don't see a way to make it practical. > > In any case, it would amount to making up for a bad decision on the > application side, ie, not transmitting the query in the parameterized > form that presumably exists somewhere in the application. I think > we'd be better served all around by encouraging app developers to rely > more heavily on parameterized queries ... but first we have to fix the > performance risks there. Fair enough. I have to admit I'm afraid of them right now. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Simon Riggs <simon@2ndQuadrant.com> writes:
> I think its possible to tell automatically whether we need to replan always or not based upon the path we take through selectivity functions.

I don't really believe that, or at least I think it would only detect a few cases. Examples of parameter-value-sensitive decisions that are made nowhere near the selectivity functions are constraint exclusion and LIKE pattern to index-qual conversion. And in none of these cases do we really know at the bottom level whether a different parameter value will lead to a significant change in the finished plan. For instance, if there's no index for column foo, it is a waste of time to force replanning just because we have varying selectivity estimates for "WHERE foo > $1".

I think we'll be a lot better off with the framework discussed last year: build a generic plan, as well as custom plans for the first few sets of parameter values, and then observe whether there's a significant reduction in estimated costs for the custom plans.

But in any case, it's way premature to be debating this until we have the infrastructure in which we can experiment with different policies.

regards, tom lane
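The cost-comparison framework Tom sketches — plan the first few parameter sets individually and keep making custom plans only if they look significantly cheaper than the generic plan — might look roughly like this in standalone C. The function name, the sample size, and the 10% "significance" margin are all hypothetical placeholders:

```c
#include <assert.h>
#include <stdbool.h>

/* Decide whether custom plans are worth continuing to build: compare
 * the average estimated cost of the first n custom plans against the
 * generic plan's estimated cost.  The 10% margin stands in for
 * whatever "significant reduction" ends up meaning. */
bool
prefer_custom_plans(double generic_cost, const double custom_costs[], int n)
{
    double avg = 0.0;

    for (int i = 0; i < n; i++)
        avg += custom_costs[i];
    avg /= n;

    return avg < generic_cost * 0.9;
}
```

Note this addresses Tom's objection to bottom-up detection: the comparison looks at finished plan costs, so it doesn't care *why* a parameter value mattered (selectivity, constraint exclusion, LIKE conversion) — only whether it produced a meaningfully cheaper plan.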
On Wed, Aug 3, 2011 at 8:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> I think its possible to tell automatically whether we need to replan always or not based upon the path we take through selectivity functions.
>
> I don't really believe that, or at least I think it would only detect a few cases. Examples of parameter-value-sensitive decisions that are made nowhere near the selectivity functions are constraint exclusion and LIKE pattern to index-qual conversion. And in none of these cases do we really know at the bottom level whether a different parameter value will lead to a significant change in the finished plan. For instance, if there's no index for column foo, it is a waste of time to force replanning just because we have varying selectivity estimates for "WHERE foo > $1".
>
> I think we'll be a lot better off with the framework discussed last year: build a generic plan, as well as custom plans for the first few sets of parameter values, and then observe whether there's a significant reduction in estimated costs for the custom plans.

The problem there is which executions we build custom plans for. That turns the problem into a sampling issue, and you'll only fix the problems that occur with a frequency matching your sampling pattern and rate.

Examples of situations where it won't help:

* Plans that vary by table size will be about the same in the first 5 executions; after a large number of executions, things go bad.

* Text search where the parameter is provided by user input: sensible requests have low selectivities, but some users put in <space> or "e" and then we try to retrieve the whole table by index scan. It's almost impossible to prevent all potentially high-selectivity inputs from the user. We could add LIMIT, but frequently ORM-generated queries do not do that.
This isn't my-way-or-your-way - I think we need to look at some form of "safety barriers" so we generate a plan but also know when the plan has outlived its usefulness and force a re-plan. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2011-08-03 21:19, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> This seems like a good design. Now what would be really cool is if you could observe a stream of queries like this:
>> SELECT a, b FROM foo WHERE c = 123
>> SELECT a, b FROM foo WHERE c = 97
>> SELECT a, b FROM foo WHERE c = 236
>> ...and say, hey, I could just make a generic plan and use it every time I see one of these. It's not too clear to me how you'd make recognition of such queries cheap enough to be practical, but maybe someone will think of a way...
>
> Hm, you mean reverse-engineering the parameterization of the query? Interesting thought, but I really don't see a way to make it practical.

See also http://archives.postgresql.org/pgsql-hackers/2010-11/msg00617.php

I don't know if any implementation can be practical - maybe the parser could be coerced into emitting some kind of number that's based on everything in the query except constants (and whitespace), so it would be the same for all the queries Robert described. That could be low-cost enough to detect whether a cached plan exists for a query's id, and to do more work only in those cases.

--
Yeb Havinga
http://www.mgrid.net/
Mastering Medical Data
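A toy version of that "number based on everything except constants and whitespace" idea can be sketched in a few lines of standalone C: collapse numeric literals to a placeholder, skip whitespace, and hash what remains (FNV-1a here, purely for illustration). A real implementation would fingerprint the parse tree rather than the text, so that quoted strings, comments, identifier case, and so on are handled properly:

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>

/* Toy query fingerprint: ignore whitespace, collapse each run of
 * digits to '?', and FNV-1a-hash the resulting character stream.
 * Queries differing only in numeric constants hash identically. */
uint64_t
query_fingerprint(const char *q)
{
    uint64_t    h = 1469598103934665603ULL; /* FNV offset basis */

    for (const char *p = q; *p; p++)
    {
        char        c;

        if (isspace((unsigned char) *p))
            continue;               /* whitespace never contributes */
        if (isdigit((unsigned char) *p))
        {
            while (isdigit((unsigned char) p[1]))
                p++;                /* swallow the whole literal */
            c = '?';
        }
        else
            c = *p;
        h = (h ^ (uint64_t) (unsigned char) c) * 1099511628211ULL;
    }
    return h;
}
```

With this, Robert's three queries all produce the same fingerprint, so a lookup in a fingerprint-keyed cache is one cheap pass over the query text before any expensive work happens.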
On tis, 2011-08-02 at 16:47 -0400, Tom Lane wrote:
> The most straightforward way to reimplement things within spi.c would be to redefine SPI_prepare as just doing the parse-and-rewrite steps, with planning always postponed to SPI_execute. In the case where you just prepare and then execute a SPIPlan, this would come out the same or better, since we'd still just do one planning cycle, but the planner could be given the actual parameter values to use. However, if you SPI_prepare, SPI_saveplan, and then SPI_execute many times, you might come out behind. This is of course the same tradeoff we are going to impose at the SQL level anyway, but I wonder whether there needs to be a control knob available to C code to retain the old plan-once-and-always-use-that-plan approach.

How about a new function like SPI_parse that has the new semantics?

Note that the SPI functions are more or less directly exposed in PL/Perl and PL/Python, and there are a number of existing idioms there that make use of prepared plans. Changing the semantics of those functions might upset a lot of code.
Peter Eisentraut <peter_e@gmx.net> writes:
> How about a new function like SPI_parse that has the new semantics?

Yeah, I'd considered that idea (and even exactly that name for it). However, the disadvantage of inventing a separate entry point is that it isn't going to be nice for multi-level call chains, of which there are several inside the core code and probably plenty elsewhere. The bottom level would have to do something like

	if (new-behavior-wanted)
		SPI_parse(args...);
	else
		SPI_prepare(args...);

and then invent some way for its callers to signal new-behavior-wanted, and it won't be pretty if they all pick different ways to do that. Plus we've already got SPI_prepare_cursor and SPI_prepare_params, each of which would need a matching SPI_parse_foo entry point.

So if we want a knob here, I think that the sanest way to install it is to add a couple more flag bits to the existing "int cursorOptions" bitmask arguments of the latter two functions, perhaps

	CURSOR_OPT_USE_GENERIC_PLAN
	CURSOR_OPT_USE_CUSTOM_PLAN

to force generic-plan-always or custom-plan-always respectively. (The "cursor" naming of those flag bits is starting to look a bit unfortunate, but I'm not inclined to rename them now.)

If we set it up like that, then the default behavior with flags == 0 would be to use the heuristic plan-selection approach, and presumably that is what you would also get from SPI_prepare (which is both coded and documented as matching SPI_prepare_cursor with flags == 0). So the question is whether it's okay to change the default behavior...

> Note that the SPI functions are more or less directly exposed in PL/Perl and PL/Python, and there are a number of existing idioms there that make use of prepared plans. Changing the semantics of those functions might upset a lot of code.

Right, but by the same token, if we don't change the default behavior, there is going to be a heck of a lot of code requiring manual adjustment before it can make use of the (hoped-to-be) improvements.
To me it makes more sense to change the default and then provide ways for people to lock down the behavior if the heuristic doesn't work for them. regards, tom lane
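The flag-bit scheme Tom proposes is easy to visualize as standalone C. The two flag names come from his message; their numeric values and the helper function are hypothetical (the real CURSOR_OPT_* values live in the backend headers):

```c
#include <assert.h>

/* Hypothetical new bits for the existing cursorOptions bitmask.
 * The names are from the proposal; the values are illustrative. */
#define CURSOR_OPT_USE_GENERIC_PLAN  0x0100
#define CURSOR_OPT_USE_CUSTOM_PLAN   0x0200

typedef enum { CHOOSE_HEURISTIC, CHOOSE_GENERIC, CHOOSE_CUSTOM } PlanMode;

/* flags == 0 keeps the default heuristic plan selection, so existing
 * SPI_prepare callers see the new behavior unless they opt out. */
PlanMode
plan_mode_from_flags(int cursor_options)
{
    if (cursor_options & CURSOR_OPT_USE_GENERIC_PLAN)
        return CHOOSE_GENERIC;
    if (cursor_options & CURSOR_OPT_USE_CUSTOM_PLAN)
        return CHOOSE_CUSTOM;
    return CHOOSE_HEURISTIC;
}
```

This shows why the bitmask route avoids the SPI_parse_foo proliferation: the knob threads through every existing call chain that already passes cursorOptions, with zero new entry points.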
On Tue, 2011-08-02 at 16:47 -0400, Tom Lane wrote:
> The most straightforward way to reimplement things within spi.c would be to redefine SPI_prepare as just doing the parse-and-rewrite steps, with planning always postponed to SPI_execute. In the case where you just prepare and then execute a SPIPlan, this would come out the same or better, since we'd still just do one planning cycle, but the planner could be given the actual parameter values to use. However, if you SPI_prepare, SPI_saveplan, and then SPI_execute many times, you might come out behind. This is of course the same tradeoff we are going to impose at the SQL level anyway, but I wonder whether there needs to be a control knob available to C code to retain the old plan-once-and-always-use-that-plan approach.

Would there ultimately be a difference between the way SPI_prepare and PQprepare work? It seems like the needs would be about the same, so I think we should be consistent. Also, I assume that SPI_execute and PQexecParams would always force a custom plan, just like always, right?

A control knob sounds limited. For instance, what if the application knows that some parameters will be constant over the time that the plan is saved? It would be nice to be able to bind some parameters to come up with a generic (but less generic) plan, and then execute it many times. Right now that can only be done by inlining such constants in the SQL, which is what we want to avoid.

I'm a little bothered by "prepare" sometimes planning and sometimes not (and, by implication, "execute_plan" sometimes planning and sometimes not). It seems cleaner to just separate the steps into parse+rewrite, bind parameters, plan (with whatever parameters are present, giving a more generic plan when some aren't specified), and execute (which would require you to specify any parameters not bound yet).
Maybe we don't need to expose all of those steps (although maybe we do), but it would be nice if the API we do offer resembles those steps.

Regards,
Jeff Davis
On Wed, 2011-08-03 at 12:19 -0400, Tom Lane wrote:
> Of course we could address the worst cases by providing some mechanism to tell the plancache code "always use a generic plan for this query" or "always use a custom plan". I'm not entirely thrilled with that, because it's effectively a planner hint and has got the same problems as all planner hints, namely that users are likely to get it wrong.

I'm not entirely convinced by that. It's fairly challenging for a human to choose a good plan for a moderately complex SQL query, and it's much more likely that the plan will become a bad one over time. But, in many cases, a developer knows if they simply don't care about planning time, and are willing to always replan.

Also, we have a fairly reasonable model for planning SQL queries, but I'm not sure that the model for determining whether to replan a SQL query is quite as clear. Simon brought up some useful points along these lines.

Regards,
Jeff Davis
On Wed, 2011-08-03 at 13:07 -0400, Robert Haas wrote:
> A little OT here, but (as I think Simon said elsewhere) I think we really ought to be considering the table statistics when deciding whether or not to replan. It seems to me that the overwhelmingly common case where this is going to come up is when (some subset of) the MCVs require a different plan than run-of-the-mill values. It would be nice to somehow work that out.

That blurs the line a little bit. It sounds like this might be described as "incremental planning", and perhaps that's a good way to think about it.

Regards,
Jeff Davis
Tom Lane <tgl@sss.pgh.pa.us> writes:
> I think we'll be a lot better off with the framework discussed last year: build a generic plan, as well as custom plans for the first few sets of parameter values, and then observe whether there's a significant reduction in estimated costs for the custom plans.

Another way here would be to cache more than a single plan and to keep execution time samples or some other relevant runtime characteristics. Then what we need would be a way to switch from one plan to another at run time on some conditions, like realizing that the reason why the planner thought a nestloop would be perfect is obviously wrong, or maybe just based on runtime characteristics.

> But in any case, it's way premature to be debating this until we have the infrastructure in which we can experiment with different policies.

That too.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
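Dimitri's multi-plan idea — keep several candidate plans per query along with observed runtimes, and route future executions to the historically cheapest — can be modeled in a few lines of standalone C. The structure and the pick-the-lowest-average policy are hypothetical sketches of the idea, not anything in PostgreSQL:

```c
#include <assert.h>

/* Per-candidate-plan runtime bookkeeping: total observed execution
 * time and number of runs, enough to compute an average. */
typedef struct PlanStats
{
    double      total_ms;
    long        runs;
} PlanStats;

/* Return the index of the candidate with the lowest average runtime.
 * Assumes every candidate has run at least once (runs > 0). */
int
pick_best_plan(const PlanStats plans[], int n)
{
    int         best = 0;

    for (int i = 1; i < n; i++)
    {
        if (plans[i].total_ms / plans[i].runs <
            plans[best].total_ms / plans[best].runs)
            best = i;
    }
    return best;
}
```

A real mechanism would also need an exploration policy (when to try a non-best plan again) and invalidation when statistics change, which is exactly the kind of policy work Tom says is premature before the infrastructure exists.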
Jeff Davis <pgsql@j-davis.com> writes: > A control knob sounds limited. For instance, what if the application > knows that some parameters will be constant over the time that the plan > is saved? It would be nice to be able to bind some parameters to come up > with a generic (but less generic) plan, and then execute it many times. > Right now that can only be done by inlining such constants in the SQL, > which is what we want to avoid. +1 I was already thinking in those term at the application level for the example I've been using before in this thread, and only reading your mail I realize that maybe the backend should be able to do that itself. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Wed, 2011-08-03 at 15:19 -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> This seems like a good design. Now what would be really cool is if you could observe a stream of queries like this:
>> SELECT a, b FROM foo WHERE c = 123
>> SELECT a, b FROM foo WHERE c = 97
>> SELECT a, b FROM foo WHERE c = 236
>> ...and say, hey, I could just make a generic plan and use it every time I see one of these. It's not too clear to me how you'd make recognition of such queries cheap enough to be practical, but maybe someone will think of a way...
>
> Hm, you mean reverse-engineering the parameterization of the query?

Yes, basically re-generate the query after (or while) parsing, replacing constants and arguments with another set of generated arguments and printing the list of these arguments at the end. It may be easiest to do this in parallel with parsing.

> Interesting thought, but I really don't see a way to make it practical.

Another place where this could be really useful is logging & monitoring. If there were an option to log the above queries as

"SELECT a, b FROM foo WHERE c = $1", (123)
"SELECT a, b FROM foo WHERE c = $1", (97)
"SELECT a, b FROM foo WHERE c = $1", (236)

it would make all kinds of general performance monitoring tasks much easier, not to mention that this form would actually be something that can be cached internally.

For some users this might even make the feature worth using on its own, without it providing repeating-plan recognition.

> In any case, it would amount to making up for a bad decision on the application side, ie, not transmitting the query in the parameterized form that presumably exists somewhere in the application. I think we'd be better served all around by encouraging app developers to rely more heavily on parameterized queries ... but first we have to fix the performance risks there.
>
> regards, tom lane
On Sun, 2011-08-07 at 11:15 +0200, Hannu Krosing wrote:
> On Wed, 2011-08-03 at 15:19 -0400, Tom Lane wrote:
>> Hm, you mean reverse-engineering the parameterization of the query?
>
> Yes, basically re-generate the query after (or while) parsing, replacing constants and arguments with another set of generated arguments and printing the list of these arguments at the end. It may be easiest to do this in parallel with parsing.
>
>> Interesting thought, but I really don't see a way to make it practical.
>
> Another place where this could be really useful is logging & monitoring. If there were an option to log the above queries as
>
> "SELECT a, b FROM foo WHERE c = $1", (123)
> "SELECT a, b FROM foo WHERE c = $1", (97)
> "SELECT a, b FROM foo WHERE c = $1", (236)

The main monitoring use case would be pg_stat_statements, http://developer.postgresql.org/pgdocs/postgres/pgstatstatements.html which is currently pretty useless for queries without parameters.

> it would make all kinds of general performance monitoring tasks also much easier, not to mention that this form would actually be something that can be cached internally.
>
> For some users this might even make the feature worth using on its own, without it providing repeating-plan recognition.
On Sat, Aug 6, 2011 at 7:29 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Tom Lane <tgl@sss.pgh.pa.us> writes: >> I think we'll be a lot better off with the framework discussed last >> year: build a generic plan, as well as custom plans for the first few >> sets of parameter values, and then observe whether there's a significant >> reduction in estimated costs for the custom plans. > > Another way here would be to cache more than a single plan and to keep > execution time samples or some other relevant runtime characteristics. > Then what we need would be a way to switch from a plan to another at run > time on some conditions, like realizing that the reason why the planner > thought a nestloop would be perfect is obviously wrong, or maybe just > based on runtime characteristics. Tom and I discussed storing multiple sub-plans on a node back in '05 IIRC, and Tom later put in support for that. That wasn't followed up on. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
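For reference, the framework Tom describes in the quoted text (build a generic plan plus custom plans for the first few parameter sets, then keep custom planning only if it shows a significant estimated-cost win) might look roughly like this. The sketch is illustrative Python with invented constants, not the actual plancache code:

```python
# Illustrative sketch, not plancache.c: build custom plans for the first
# few calls, then compare their average estimated cost with the generic
# plan's cost.  Both thresholds are made up for the example.
CUSTOM_PLAN_TRIES = 5      # "custom plans for the first few sets of values"
KEEP_CUSTOM_BELOW = 0.9    # custom must be >10% cheaper to stay custom

class CachedPlanSource:
    def __init__(self, generic_cost, estimate_custom_cost):
        self.generic_cost = generic_cost
        self.estimate_custom_cost = estimate_custom_cost  # params -> cost
        self.num_custom = 0
        self.total_custom_cost = 0.0

    def choose_plan(self, params):
        if self.num_custom < CUSTOM_PLAN_TRIES:
            self.total_custom_cost += self.estimate_custom_cost(params)
            self.num_custom += 1
            return "custom"
        avg_custom = self.total_custom_cost / self.num_custom
        if avg_custom < self.generic_cost * KEEP_CUSTOM_BELOW:
            return "custom"  # custom planning keeps paying for itself
        return "generic"     # no significant win: reuse the generic plan
```

With generic_cost=100 and custom estimates around 50, every call stays "custom"; with estimates around 99, the sixth call and onward fall back to "generic".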
On 08/07/2011 12:25 PM, Hannu Krosing wrote: > On Sun, 2011-08-07 at 11:15 +0200, Hannu Krosing wrote: >> On Wed, 2011-08-03 at 15:19 -0400, Tom Lane wrote: >>> Hm, you mean reverse-engineering the parameterization of the query? >> Yes, basically: re-generate the query after (or while) parsing, replacing >> constants and arguments with another set of generated arguments and >> printing the list of these arguments at the end. It may be easiest to do >> this in parallel with parsing. >> >>> Interesting thought, but I really don't see a way to make it practical. >> Another place where this could be really useful is logging & monitoring. >> >> If there were an option to log the above queries as >> >> "SELECT a, b FROM foo WHERE c = $1", (123) >> "SELECT a, b FROM foo WHERE c = $1", (97) >> "SELECT a, b FROM foo WHERE c = $1", (236) > The main monitoring use case would be pg_stat_statements, > http://developer.postgresql.org/pgdocs/postgres/pgstatstatements.html > which is currently pretty useless for queries without parameters. I was trying to implement something similar for pgpool-II. The user could configure queries for which cached plans are wanted. The configuration would have been a file containing lines in the format "SELECT * FROM foo WHERE id = ?". I did not get anything implemented, as there were some problems: mainly DEALLOCATE ALL being called without pgpool-II knowing it, issues with search_path, and the amount of work needed to implement parse tree matching. It would be interesting if pg_stat_statements were globally available with queries using generic arguments. First, there would be an obvious heuristic for when to cache the plan: if the average runtime of the query is much larger than the average planning time, there is no point in caching the plan. This would also give one option for cache hit estimation, since the hit_percent is directly available. On the other hand, pg_stat_statements could easily become a choke-point.
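Anssi's heuristic (skip plan caching when planning time is negligible next to runtime) is simple to state precisely. A minimal sketch, with a made-up ratio threshold; note that the planning-time and runtime averages are assumed to be available from somewhere, since pg_stat_statements itself does not track planning time:

```python
# Sketch of the heuristic above; the 10% ratio is invented.
# If the query runs for much longer than it takes to plan, replanning
# every time is nearly free, so there is little point in caching.
def worth_caching(avg_plan_time, avg_run_time, min_ratio=0.1):
    if avg_run_time <= 0:
        return True  # degenerate case: planning trivially dominates
    return avg_plan_time / avg_run_time >= min_ratio

worth_caching(2.0, 500.0)  # False: planning is noise next to execution
worth_caching(5.0, 8.0)    # True: planning is a large share of the cost
```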
I would love to work on this, but I lack the needed skills. Maybe I could take another try at writing a proof-of-concept parse tree transformer and matcher, but I doubt I can produce anything useful. - Anssi
On Mon, 2011-08-08 at 11:39 +0300, Anssi Kääriäinen wrote: > On 08/07/2011 12:25 PM, Hannu Krosing wrote: > > On Sun, 2011-08-07 at 11:15 +0200, Hannu Krosing wrote: > >> On Wed, 2011-08-03 at 15:19 -0400, Tom Lane wrote: > >>> Hm, you mean reverse-engineering the parameterization of the query? > >> Yes, basically: re-generate the query after (or while) parsing, replacing > >> constants and arguments with another set of generated arguments and > >> printing the list of these arguments at the end. It may be easiest to do > >> this in parallel with parsing. > >> > >>> Interesting thought, but I really don't see a way to make it practical. > >> Another place where this could be really useful is logging & monitoring. > >> > >> If there were an option to log the above queries as > >> > >> "SELECT a, b FROM foo WHERE c = $1", (123) > >> "SELECT a, b FROM foo WHERE c = $1", (97) > >> "SELECT a, b FROM foo WHERE c = $1", (236) > > The main monitoring use case would be pg_stat_statements, > > http://developer.postgresql.org/pgdocs/postgres/pgstatstatements.html > > which is currently pretty useless for queries without parameters. > > I was trying to implement something similar for pgpool-II. The user > could configure queries for which cached plans are wanted. The > configuration would have been a file containing lines in the format "SELECT > * FROM foo WHERE id = ?". I did not get anything implemented, as there > were some problems. The problems were mainly with DEALLOCATE ALL called > without pgpool-II knowing it, issues with search_path and the amount of > work needed to implement parse tree matching. > > It would be interesting if pg_stat_statements were globally > available with queries using generic arguments. First, there would be an > obvious heuristic for when to cache the plan: if the average runtime of > the query is much larger than the average planning time, there is no > point in caching the plan. This would also give one option for cache hit > estimation.
> The hit_percent is directly available. On the other hand > pg_stat_statements could easily become a choke-point. > > I would love to work on this, but I lack the needed skills. Maybe I > could take another try at writing a proof-of-concept parse tree > transformer and matcher, but I doubt I can produce anything useful. That is why I think it is best done in the main parser - it has to parse and analyse the query anyway and likely knows which constants are "arguments" to the query. If doing it outside the main backend parser, it would be best to still use the PostgreSQL lex/bison files as much as possible for this. -- ------- Hannu Krosing PostgreSQL Unlimited Scalability and Performance Consultant 2ndQuadrant Nordic PG Admin Book: http://www.2ndQuadrant.com/books/
On 08/08/2011 01:07 PM, Hannu Krosing wrote: > That is why I think it is best done in the main parser - it has to parse > and analyse the query anyway and likely knows which constants are > "arguments" to the query As far as I understand the problem, the parsing must transform table references into schema-qualified references. The table_foobar in "SELECT * FROM table_foobar WHERE id = ?" is not enough to identify a table. Using (search_path, query_str) as a key is one possibility, but the search_path is likely to be different for each user, and this could result in the same query being cached multiple times. By the way, I checked current Git HEAD and pg_stat_statements seems not to handle search_path correctly. For pg_stat_statements this is not critical, but if the raw query string is used as the plan cache key things will obviously break... - Anssi
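To make Anssi's point about the cache key concrete: keying on the pair (search_path, query string) is correct but duplicates entries per search_path. A toy sketch (names invented for illustration):

```python
# Toy plan cache keyed on (search_path, query text).
plan_cache = {}

def cache_key(search_path, query):
    return (tuple(search_path), query)

q = "SELECT * FROM table_foobar WHERE id = $1"
plan_cache[cache_key(["app", "public"], q)] = "plan for app.table_foobar"
plan_cache[cache_key(["other", "public"], q)] = "plan for other.table_foobar"

# Same query text, two entries: correct, but duplicated per search_path.
# Resolving table names to schema-qualified form during parse analysis
# would let callers share one entry whenever the names resolve identically.
len(plan_cache)  # 2
```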
Hannu Krosing <hannu@krosing.net> writes: >> Hm, you mean reverse-engineering the parameterization of the query? > > Yes, basically: re-generate the query after (or while) parsing, replacing > constants and arguments with another set of generated arguments and > printing the list of these arguments at the end. It may be easiest to do > this in parallel with parsing. > >> Interesting thought, but I really don't see a way to make it practical. > > Another place where this could be really useful is logging & monitoring Another big use case is full support for materialized views: we could then optimize a query to automatically use a matview even when it is written against the “usual” schema. Matviews would then become just another kind of index. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Tom Lane wrote: > > Note that the SPI functions are more or less directly exposed in PL/Perl > > and PL/Python, and there are a number of existing idioms there that make > > use of prepared plans. Changing the semantics of those functions might > > upset a lot of code. > > Right, but by the same token, if we don't change the default behavior, > there is going to be a heck of a lot of code requiring manual adjustment > before it can make use of the (hoped-to-be) improvements. To me it > makes more sense to change the default and then provide ways for people > to lock down the behavior if the heuristic doesn't work for them. Agreed. I think the big sticking point is that without the logic for how the replanning will happen, users have to guess how much impact this new default behavior will have. I also agree that this will harm some use cases but improve things for a larger pool of users. Remember, the people on this email list are probably using this feature in a much more sophisticated way than the average user. Also, there is a TODO idea that the results found by executing the query (e.g. the number of rows returned at each stage) could be fed back to affect the replanning of queries. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
[ getting back to the planner finally ] Simon Riggs <simon@2ndQuadrant.com> writes: > On Wed, Aug 3, 2011 at 8:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Simon Riggs <simon@2ndQuadrant.com> writes: >>> I think it's possible to tell automatically whether we need to replan >>> always or not based upon the path we take through selectivity >>> functions. >> I don't really believe that, or at least I think it would only detect a >> few cases. > The problem there is which executions we build custom plans for. That > turns the problem into a sampling issue and you'll only fix the > problems that occur with a frequency to match your sampling pattern > and rate. Examples of situations where it won't help. Sure, this is not going to solve every problem we have with the planner. What it is intended to solve is cases where someone is trying to use the prepared-plan mechanisms but would be a lot better off with parameter-value-specific plans. In particular: > * plans that vary by table size will be about the same in the first 5 > executions. After a large number of executions, things go bad. This is a red herring. The plancache code already arranges to replan every time the relevant table stats are updated by autovacuum, which should certainly happen from time to time if the table's contents are changing materially. If you're thinking in terms of plans being "stale" then you're worrying about a long-since-solved problem. regards, tom lane
On Fri, Aug 19, 2011 at 6:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > [ getting back to the planner finally ] > > Simon Riggs <simon@2ndQuadrant.com> writes: >> On Wed, Aug 3, 2011 at 8:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Simon Riggs <simon@2ndQuadrant.com> writes: >>>> I think it's possible to tell automatically whether we need to replan >>>> always or not based upon the path we take through selectivity >>>> functions. > >>> I don't really believe that, or at least I think it would only detect a >>> few cases. > >> The problem there is which executions we build custom plans for. That >> turns the problem into a sampling issue and you'll only fix the >> problems that occur with a frequency to match your sampling pattern >> and rate. Examples of situations where it won't help. > > Sure, this is not going to solve every problem we have with the > planner. What it is intended to solve is cases where someone is trying > to use the prepared-plan mechanisms but would be a lot better off > with parameter-value-specific plans. In particular: I just realised this is exactly the same strategy as the no-longer-used JDBC parameter prepareThreshold. If we treat this in a similar way, prepare_threshold is currently effectively 0, and you are suggesting we move the value to 5. OK. Will this be an actual parameter? If so, it removes my objection because I can turn it off. What would be even better would be some other controls, like a plugin that allows us to control the mechanism or at least experiment with it. Maybe we can assemble enough evidence to remove it before release. I've been arguing that it won't solve all problems. It won't. But if it solves some, it's worth having. On another point, I'd still like a "one-shot plan" flag, so that we can act on that knowledge and have various pieces of code take decisions that override the plan cache, i.e. if the plan screws up during execution we can mark the plan as one-shot so it isn't reused.
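Simon's prepareThreshold analogy amounts to a per-statement execution counter. A minimal sketch (the parameter name prepare_threshold and its default are hypothetical here):

```python
# Sketch of a prepare_threshold in the JDBC prepareThreshold style.
class PreparedStatement:
    def __init__(self, prepare_threshold=5):
        self.prepare_threshold = prepare_threshold
        self.executions = 0

    def plan_mode(self):
        self.executions += 1
        if self.executions <= self.prepare_threshold:
            return "custom"   # plan with the actual parameter values
        return "generic"      # switch to the cached generic plan

stmt = PreparedStatement(prepare_threshold=5)
[stmt.plan_mode() for _ in range(7)]
# first five are "custom", the rest "generic"; a threshold of 0 would
# mean "always use the generic plan"
```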
>> * plans that vary by table size will be about the same in the first 5 >> executions. After a large number of executions, things go bad. > > This is a red herring. The plancache code already arranges to replan > every time the relevant table stats are updated by autovacuum, which > should certainly happen from time to time if the table's contents are > changing materially. If you're thinking in terms of plans being "stale" > then you're worrying about a long-since-solved problem. Fair enough. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services