Thread: Automatic function replanning
Hi, there's a topic that comes up from time to time on the lists: the problem that plpgsql functions get planned only once, and thereafter the same query plan is used until server shutdown or explicit recreation of the function. I'd like to implement a way of doing automatic function replanning. I can think of two possible approaches (plus a non-automatic one).

1. In a more general way: extend pg_proc by an interval column "ttl" or "replanAfter" and add a function declaration attribute to the parser: "... STRICT STABLE REPLAN AFTER '3 days'::interval"
+ general approach: every language that can pre-compute plans can use this feature, and the check can be done in one place for all languages
- in fact only plpgsql can do that at the moment (right?) and there is no other candidate for something similar at the moment
- catalog change that also requires interval to be specially treated while bootstrapping
- the catalog would grow: every function would carry the attribute even though it is applicable to only a very small number of functions, let alone the number of functions that would actually use it in a typical installation

2. Use the #option feature of plpgsql. Add the possibility to specify #option ttl '3 days' or #option replan-after '1 day 2 hours'
+ minor changes, local to plpgsql
- plpgsql-specific solution
- is #option documented at all? Should it stay "unofficial"? If so, why?

3. (not automatic) Add a statement that makes plpgsql forget the plan and compute a new one when the function gets called again. "The user should rather use cron for doing maintenance tasks."

What do you think? Any other ideas?

Joachim
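For concreteness, a sketch of how the two proposed declarations might read. All of this is hypothetical syntax invented to match the descriptions above; neither a REPLAN AFTER attribute nor a replan_after #option exists in PostgreSQL, and the function is a trivial stand-in:

-- Option 1: hypothetical per-function attribute, stored in pg_proc
CREATE FUNCTION pending_items() RETURNS bigint AS $$
BEGIN
    RETURN (SELECT count(*) FROM pg_class);
END;
$$ LANGUAGE plpgsql STABLE REPLAN AFTER '3 days'::interval;

-- Option 2: hypothetical plpgsql #option directive, local to one function
CREATE FUNCTION pending_items() RETURNS bigint AS $$
    #option replan_after '1 day 2 hours'
BEGIN
    RETURN (SELECT count(*) FROM pg_class);
END;
$$ LANGUAGE plpgsql STABLE;

Either way, the idea is that the function's saved plans would be thrown away and rebuilt once the given interval has elapsed since they were last planned.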
On Tue, 2005-12-13 at 22:32 +0100, Joachim Wieland wrote: > there's a topic that comes up from time to time on the lists, the problem > that pgsql functions get planned only once and thereafter the same query > plan is used until server shutdown or explicit recreation of the function. The problem really has nothing to do with functions, per se: whenever a plan is created and then stored for future use, the assumptions made by that plan may be invalidated by the time the plan is executed. This applies to PREPARE, pl/pgsql functions, perhaps the plan caching done by the RI triggers, and so forth. I also think that invalidating cached plans on a periodic basis is the wrong approach -- we can use sinval to invalidate plans as soon as a dependent database object changes and not before. This thread contains some ideas on how to do this: http://archives.postgresql.org/pgsql-hackers/2005-03/msg00426.php I got somewhat sidetracked by the complexities of the "central plan caching module" that Tom would like to see, but I'm still hoping to take a look at this for 8.2. -Neil
On Tue, Dec 13, 2005 at 04:49:10PM -0500, Neil Conway wrote: > On Tue, 2005-12-13 at 22:32 +0100, Joachim Wieland wrote: > > there's a topic that comes up from time to time on the lists, the problem > > that pgsql functions get planned only once and thereafter the same query > > plan is used until server shutdown or explicit recreation of the function. > > The problem really has nothing to do with functions, per se: whenever a > plan is created and then stored for future use, the assumptions made by > that plan may be invalidated by the time the plan is executed. This > applies to PREPARE, pl/pgsql functions, perhaps the plan caching done by > the RI triggers, and so forth. > > I also think that invalidating cached plans on a periodic basis is the > wrong approach -- we can use sinval to invalidate plans as soon as a > dependent database object changes and not before. This thread contains > some ideas on how to do this: > > http://archives.postgresql.org/pgsql-hackers/2005-03/msg00426.php > > I got somewhat sidetracked by the complexities of the "central plan > caching module" that Tom would like to see, but I'm still hoping to take > a look at this for 8.2. As for predicate-driven plan changes (ie: query is planned the first time with a predicate that has high cardinality, but there are also low cardinality values that will be queried on), it would make more sense to track the amount of work (probably tuples fetched) normally required to execute a prepared statement. Any time that prepared statement is executed with a set of predicates that substantially changes the amount of work required it should be remembered and considered for re-planning the next time the query is executed with those predicates. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Good idea, TODO updated:

* Flush cached query plans when the dependent objects change or
  when the cardinality of parameters changes dramatically

---------------------------------------------------------------------------

Jim C. Nasby wrote:
> As for predicate-driven plan changes (ie: query is planned the first time with a predicate that has high cardinality, but there are also low cardinality values that will be queried on), it would make more sense to track the amount of work (probably tuples fetched) normally required to execute a prepared statement. Any time that prepared statement is executed with a set of predicates that substantially changes the amount of work required it should be remembered and considered for re-planning the next time the query is executed with those predicates.

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Is cardinality the only thing we'd need to worry about? My idea was actually to track the amount of work normally required by a stored query plan, and if a query uses that plan but requires a very different amount of work it's a good indication that we either need to replan or store multiple plans for that query. Though if we're certain that cardinality is the only thing that could make a cached plan go bad it would certainly simplify things greatly. On Fri, Dec 16, 2005 at 11:10:43PM -0500, Bruce Momjian wrote: > > Good idea, TODO updated: > > * Flush cached query plans when the dependent objects change or > when the cardinality of parameters changes dramatically > > > --------------------------------------------------------------------------- > > Jim C. Nasby wrote: > > On Tue, Dec 13, 2005 at 04:49:10PM -0500, Neil Conway wrote: > > > On Tue, 2005-12-13 at 22:32 +0100, Joachim Wieland wrote: > > > > there's a topic that comes up from time to time on the lists, the problem > > > > that pgsql functions get planned only once and thereafter the same query > > > > plan is used until server shutdown or explicit recreation of the function. > > > > > > The problem really has nothing to do with functions, per se: whenever a > > > plan is created and then stored for future use, the assumptions made by > > > that plan may be invalidated by the time the plan is executed. This > > > applies to PREPARE, pl/pgsql functions, perhaps the plan caching done by > > > the RI triggers, and so forth. > > > > > > I also think that invalidating cached plans on a periodic basis is the > > > wrong approach -- we can use sinval to invalidate plans as soon as a > > > dependent database object changes and not before. This thread contains > > > some ideas on how to do this: > > > > > > http://archives.postgresql.org/pgsql-hackers/2005-03/msg00426.php > > > > > > I got somewhat sidetracked by the complexities of the "central plan > > > caching module" that Tom would like to see, but I'm still hoping to take > > > a look at this for 8.2. > > > > As for predicate-driven plan changes (ie: query is planned the first > > time with a predicate that has high cardinality, but there are also low > > cardinality values that will be queried on), it would make more sense to > > track the amount of work (probably tuples fetched) normally required to > > execute a prepared statement. Any time that prepared statement is > > executed with a set of predicates that substantially changes the amount > > of work required it should be remembered and considered for re-planning > > the next time the query is executed with those predicates. > > -- > > Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com > > Pervasive Software http://pervasive.com work: 512-231-6117 > > vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 4: Have you searched our list archives? > > > > http://archives.postgresql.org > > > > -- > Bruce Momjian | http://candle.pha.pa.us > pgman@candle.pha.pa.us | (610) 359-1001 > + If your life is a hard drive, | 13 Roberts Road > + Christ can be your backup. | Newtown Square, Pennsylvania 19073 > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings > -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Jim C. Nasby wrote:
> Is cardinality the only thing we'd need to worry about? My idea was actually to track the amount of work normally required by a stored query plan, and if a query uses that plan but requires a very different amount of work it's a good indication that we either need to replan or store multiple plans for that query. Though if we're certain that cardinality is the only thing that could make a cached plan go bad it would certainly simplify things greatly.

This gets into another area of re-optimizing when the executor finds that the actual tables don't match the optimizer estimates. I think we decided that was too hard/risky, but invalidating the plan might help, though I am thinking re-planning might just generate the same plan as before. I think something would need to have happened since the last planning, like ANALYZE or something.

Updated TODO:

* Flush cached query plans when the dependent objects change,
  when the cardinality of parameters changes dramatically, or
  when new ANALYZE statistics are available

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian wrote:
> * Flush cached query plans when the dependent objects change,
>   when the cardinality of parameters changes dramatically, or
>   when new ANALYZE statistics are available

Wouldn't it also make sense to flush a cached query plan when, after execution, it is determined that one or more of the assumptions the cached query plan was based on were found to be off? Like the query plan was based on the assumption that a particular table would only return a handful of rows, but in reality it returned a few thousand.

regards,
Lukas
Lukas Smith <mls@pooteeweet.org> writes: > Bruce Momjian wrote: > >> * Flush cached query plans when the dependent objects change, >> when the cardinality of parameters changes dramatically, or >> when new ANALYZE statistics are available > > Wouldn't it also make sense to flush a cached query plan when after > execution it is determined that one or more assumptions that the > cached query plan was based on was found to be off? Like the query > plan was based on the assumption that a particular table would only > return a hand full of rows, but in reality it returned a few > thousand. There is some merit to that. I could also see it being sensible to flush a cached plan any time the query took more than some [arbitrary/GUC-chosen] interval. Supposing it took 20s to execute the query, it would surely seem surprising for re-evaluating the plan to be expected to make up a material proportion of the cost of the *next* invocation. If we flush every plan that took >10s to evaluate, that offers the possibility for it to be done better next time... -- output = reverse("gro.mca" "@" "enworbbc") http://www.ntlug.org/~cbbrowne/internet.html Points are awarded for getting the last word in. Drawing the conversation out so long that the original message disappears due to being indented off the right hand edge of the screen is one way to do this. Another is to imply that anyone replying further is a hopeless cretin and is wasting everyone's valuable time. -- from the Symbolics Guidelines for Sending Mail
Chris Browne wrote:
> There is some merit to that.
>
> I could also see it being sensible to flush a cached plan any time the query took more than some [arbitrary/GUC-chosen] interval.
>
> Supposing it took 20s to execute the query, it would surely seem surprising for re-evaluating the plan to be expected to make up a material proportion of the cost of the *next* invocation.
>
> If we flush every plan that took >10s to evaluate, that offers the possibility for it to be done better next time...

Ah, but how do you pass that information back to the optimizer so you don't end up getting the same plan again?

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Lukas Smith <mls@pooteeweet.org> writes: > Bruce Momjian wrote: >> * Flush cached query plans when the dependent objects change, >> when the cardinality of parameters changes dramatically, or >> when new ANALYZE statistics are available > Wouldn't it also make sense to flush a cached query plan when after > execution it is determined that one or more assumptions that the cached > query plan was based on was found to be off? Not unless you do something that would cause the planner to make different choices next time. (Such as changing the ANALYZE statistics, perhaps.) The TODO item is OK as stated, it's just talking about mechanism and not the things that might trigger the mechanism. regards, tom lane
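To make Tom's point concrete: today, fresh statistics only reach an already-cached plan if the plan is thrown away and built again by hand. A minimal sketch with PREPARE (the orders table and its placed_on column are placeholders invented for the example):

PREPARE recent_orders AS
    SELECT * FROM orders WHERE placed_on > now() - interval '1 day';

ANALYZE orders;             -- new statistics arrive ...
DEALLOCATE recent_orders;   -- ... but the cached plan only changes if we discard it
PREPARE recent_orders AS
    SELECT * FROM orders WHERE placed_on > now() - interval '1 day';

A plpgsql function is in the same position: re-creating the function (or starting a new session) is what currently forces its saved plans to be rebuilt.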
> Chris Browne wrote: >> Lukas Smith <mls@pooteeweet.org> writes: >> > Bruce Momjian wrote: >> > >> >> * Flush cached query plans when the dependent objects change, >> >> when the cardinality of parameters changes dramatically, or >> >> when new ANALYZE statistics are available >> > >> > Wouldn't it also make sense to flush a cached query plan when after >> > execution it is determined that one or more assumptions that the >> > cached query plan was based on was found to be off? Like the query >> > plan was based on the assumption that a particular table would only >> > return a hand full of rows, but in reality it returned a few >> > thousand. >> >> There is some merit to that. >> >> I could also see it being sensible to flush a cached plan any time the >> query took more than some [arbitrary/GUC-chosen] interval. >> >> Supposing it took 20s to execute the query, it would surely seem >> surprising for re-evaluating the plan to be expected to make up a >> material proportion of the cost of the *next* invocation. >> >> If we flush every plan that took >10s to evaluate, that offers the >> possibility for it to be done better next time... > > Ah, but how do you pass that information back to the optimizer so you > don't end up getting the same plan again? We can't, in any direct fashion, of course. Even if flushing the plan doesn't lead to a better one, now, it still leaves you ready for when an ANALYZE will come along and change the stats and possibly improve things. One possibility to do something indirect would be for this "plan invalidation" to also feed some stats to pg_autovacuum, such that every "bad query plan" (as evaluated by the notion that the actual number of tuples for a given table wildly varied from expectations) would bump up the stats for the offending table. That would give some feedback to encourage pg_autovacuum to analyze trouble tables again sooner. -- (reverse (concatenate 'string "moc.liamg" "@" "enworbbc")) http://cbbrowne.com/info/slony.html ``Lisp has jokingly been called "the most intelligent way to misuse a computer". I think that description is a great compliment because it transmits the full flavor of liberation: it has assisted a number of our most gifted fellow humans in thinking previously impossible thoughts.'' -- "The Humble Programmer", E. Dijkstra, CACM, vol. 15, n. 10, 1972
> Lukas Smith <mls@pooteeweet.org> writes: >> Bruce Momjian wrote: >>> * Flush cached query plans when the dependent objects change, >>> when the cardinality of parameters changes dramatically, or >>> when new ANALYZE statistics are available > >> Wouldn't it also make sense to flush a cached query plan when after >> execution it is determined that one or more assumptions that the cached >> query plan was based on was found to be off? > > Not unless you do something that would cause the planner to make > different choices next time. (Such as changing the ANALYZE statistics, > perhaps.) The TODO item is OK as stated, it's just talking about > mechanism and not the things that might trigger the mechanism. A mechanism might be to bump up the stats stored for pg_autovacuum, which would encourage a table to be re-ANALYZEd. That may not immediately change ANALYZE statistics, but it's something... Even if there is NO feedback mechanism on statistics, if we know the plan was pretty bad, it is surely at least *a* feedback to invalidate the plan. -- (reverse (concatenate 'string "moc.liamg" "@" "enworbbc")) http://cbbrowne.com/info/slony.html ``Lisp has jokingly been called "the most intelligent way to misuse a computer". I think that description is a great compliment because it transmits the full flavor of liberation: it has assisted a number of our most gifted fellow humans in thinking previously impossible thoughts.'' -- "The Humble Programmer", E. Dijkstra, CACM, vol. 15, n. 10, 1972
Christopher Browne wrote:
> > Not unless you do something that would cause the planner to make different choices next time. (Such as changing the ANALYZE statistics, perhaps.) The TODO item is OK as stated, it's just talking about mechanism and not the things that might trigger the mechanism.
>
> A mechanism might be to bump up the stats stored for pg_autovacuum, which would encourage a table to be re-ANALYZEd.
>
> That may not immediately change ANALYZE statistics, but it's something...
>
> Even if there is NO feedback mechanism on statistics, if we know the plan was pretty bad, it is surely at least *a* feedback to invalidate the plan.

Something has to cause the replan to be better, and that is one possibility.

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Saturday 2005-12-17 16:28, Lukas Smith wrote:
> Bruce Momjian wrote:
> > * Flush cached query plans when the dependent objects change,
> >   when the cardinality of parameters changes dramatically, or
> >   when new ANALYZE statistics are available
>
> Wouldn't it also make sense to flush a cached query plan when after execution it is determined that one or more assumptions that the cached query plan was based on were found to be off? Like the query plan was based on the assumption that a particular table would only return a handful of rows, but in reality it returned a few thousand.
>
> regards,
> Lukas

Proposed rewrite:

* Mark query plan for flush (opportunistic replan) when:
  ** dependent objects change,
  ** cardinality of parameters changes sufficiently (per planner parameter),
  ** new ANALYZE statistics are available and, per planner parameter, differ sufficiently from prior statistics.

* Mark plan as "tried" when parameters of the returned set are out of statistical control; create an alternate plan, hill-climbing toward statistical control.
  ** Too many/too few rows relative to plan expectations
     *** Auto-sample for better statistics?
  ** History of plan shows throughput time for the result set varies excessively (need more execution stability, possibly at the expense of median optimality).
On Sat, Dec 17, 2005 at 01:07:10PM -0500, Bruce Momjian wrote:
> Jim C. Nasby wrote:
> > Is cardinality the only thing we'd need to worry about? My idea was actually to track the amount of work normally required by a stored query plan, and if a query uses that plan but requires a very different amount of work it's a good indication that we either need to replan or store multiple plans for that query. Though if we're certain that cardinality is the only thing that could make a cached plan go bad it would certainly simplify things greatly.
>
> This gets into another area of re-optimizing when the executor finds that the actual tables don't match the optimizer estimates. I think we decided that was too hard/risky, but invalidating the plan might help, though I am thinking re-planning might just generate the same plan as before. I think something would need to have happened since the last planning, like ANALYZE or something.

Well, in the stored plan case, presumably what's changed is one of the bound parameters. And if we want to be sophisticated about it, we won't just throw out the old plan; rather we'll try and figure out what parameter it is that's wanting a different plan.

> Updated TODO:
>
> * Flush cached query plans when the dependent objects change,
>   when the cardinality of parameters changes dramatically, or
>   when new ANALYZE statistics are available

Probably worth pointing to this thread in the TODO...

--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
It seems to me like there are two classes of problems here:

1) Simply invalidating plans made with out-of-date statistics.
2) Using run-time collected data to update the plan to something more intelligent.

It also seems like #1 would be fairly straightforward and simple whereas #2 would be much more complex. #1 would do me a world of good and probably other people as well. Postgres's query planning has always been fine for me, or at least I have always been able to optimize my queries when I've got a representative data set to work with. Query plan caching only gets me when the query plan is created before the statistics are present to create a good plan.

Just one user's 2 cents.

- Rick Gigger

On Dec 19, 2005, at 12:00 PM, Jim C. Nasby wrote:
> Well, in the stored plan case, presumably what's changed is one of the bound parameters. And if we want to be sophisticated about it, we won't just throw out the old plan; rather we'll try and figure out what parameter it is that's wanting a different plan.
>
> Probably worth pointing to this thread in the TODO...
Trent Shipley wrote:
> Proposed rewrite:
>
> * Mark query plan for flush (opportunistic replan) when:
>   ** dependent objects change,
>   ** cardinality of parameters changes sufficiently (per planner parameter),
>   ** new ANALYZE statistics are available and, per planner parameter, differ sufficiently from prior statistics.

Frankly, I think any new ANALYZE statistics should just invalidate the plan. I don't think it is worth trying to determine if they changed sufficiently or not --- you might as well just replan.

> * Mark plan as "tried" when parameters of the returned set are out of statistical control; create an alternate plan, hill-climbing toward statistical control.
>   ** Too many/too few rows relative to plan expectations
>      *** Auto-sample for better statistics?
>   ** History of plan shows throughput time for the result set varies excessively (need more execution stability, possibly at the expense of median optimality).

This is a new idea, that you are remembering bad plans. I am unsure how we would track that information. It gets into the area of having the optimizer change its behavior based on previous runs, and I am not sure we have ever agreed to get into that kind of behavior.

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Rick Gigger wrote:
> It seems to me like there are two classes of problems here:
>
> 1) Simply invalidating plans made with out-of-date statistics.
> 2) Using run-time collected data to update the plan to something more intelligent.
>
> It also seems like #1 would be fairly straightforward and simple whereas #2 would be much more complex. #1 would do me a world of good and probably other people as well. Postgres's query planning has always been fine for me, or at least I have always been able to optimize my queries when I've got a representative data set to work with. Query plan caching only gets me when the query plan is created before the statistics are present to create a good plan.
>
> Just one user's 2 cents.

Agreed. I just can't add #2 unless we get more agreement from the group, because it has been a disputed issue in the past.

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Wed, Dec 21, 2005 at 05:43:38PM -0500, Bruce Momjian wrote:
> Agreed. I just can't add #2 unless we get more agreement from the group, because it has been a disputed issue in the past.

Well, how about this, since it's a prerequisite for #2 and would be generally useful anyway:

Track normal resource consumption (ie: tuples read) for planned queries and record parameter values that result in drastically different resource consumption.

This would at least make it easy for admins to identify prepared queries that have a highly variable execution cost.

--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Jim C. Nasby wrote:
> Well, how about this, since it's a prerequisite for #2 and would be generally useful anyway:
>
> Track normal resource consumption (ie: tuples read) for planned queries and record parameter values that result in drastically different resource consumption.
>
> This would at least make it easy for admins to identify prepared queries that have a highly variable execution cost.

We have that TODO already:

* Log statements where the optimizer row estimates were dramatically
  different from the number of rows actually found?

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
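In the meantime the mismatch that TODO item describes is already visible by hand, since EXPLAIN ANALYZE prints estimated and actual row counts side by side (placeholder table again; the output is shortened and the numbers invented purely for illustration):

EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
--  Index Scan using orders_customer_idx on orders
--      (cost=0.00..8.27 rows=1 width=36) (actual time=0.041..212.480 rows=58000 loops=1)
--  estimated 1 row, actual 58000: the sort of statement the TODO would have the server log automatically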
Jim C. Nasby wrote:
> Well, how about this, since it's a prerequisite for #2 and would be generally useful anyway:
>
> Track normal resource consumption (ie: tuples read) for planned queries and record parameter values that result in drastically different resource consumption.
>
> This would at least make it easy for admins to identify prepared queries that have a highly variable execution cost.

Yeah, it seems such a log would be very helpful in its own right for DBAs and also as a feedback loop to find possible issues in the query planner. And maybe one day this feedback loop can even be used directly by the server itself.

regards,
Lukas Smith
On Wed, Dec 21, 2005 at 11:00:31PM -0500, Bruce Momjian wrote: > > Track normal resource consumption (ie: tuples read) for planned queries > > and record parameter values that result in drastically different > > resource consumption. > > > > This would at least make it easy for admins to identify prepared queries > > that have a highly variable execution cost. > > We have that TODO already: > > * Log statements where the optimizer row estimates were dramatically > different from the number of rows actually found? Does the stored plan also save how many rows were expected? Otherwise I'm not sure how that TODO covers it... If it does then please ignore my ramblings below. :) My idea has nothing to do with row estimates. It has to do with the amount of work actually done to perform a query. Consider this example: CREATE TABLE queue (status char NOT NULL, queue_item text NOT NULL); CREATE INDEX queue__status ON queue (status); Obviously, to process this you'll need a query like: SELECT * FROM queue WHERE status='N' -- N for New; Say you also occasionally need to see a list of items that have been processed: SELECT * FROM queue WHERE status='D' -- D for Done; And let's say you need to keep done items around for 30 days. Now, if both of these are done using a prepared statement, it's going to look like: SELECT * FROM queue WHERE status='?'; If the first one to run is the queue processing one, the planner will probably choose the index. This means that when we're searching on 'N', there will be a fairly small number of tuples read to execute the query, but when searching for 'D' a very large number of tuples will be read. What I'm proposing is to keep track of the 'normal' number of tuples read when executing a prepared query, and logging any queries that are substantially different. So, if you normally have to read 50 tuples to find all 'N' records, when the query looking for 'D' records comes along and has to read 5000 tuples instead, we want to log that. Probably the easiest way to accomplish this is to store a moving average of tuples read with each prepared statement entry. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
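Jim's scenario can be reproduced directly. A sketch (the data volumes are invented, and which plan PostgreSQL caches will depend on the statistics; the point is that both EXECUTEs reuse the one plan that was built without knowing the parameter value):

CREATE TABLE queue (status char NOT NULL, queue_item text NOT NULL);
CREATE INDEX queue__status ON queue (status);

-- mostly finished items, a handful of new ones
INSERT INTO queue SELECT 'D', 'done item ' || i FROM generate_series(1, 100000) i;
INSERT INTO queue SELECT 'N', 'new item ' || i FROM generate_series(1, 50) i;
ANALYZE queue;

PREPARE queue_items(char) AS SELECT * FROM queue WHERE status = $1;

EXPLAIN ANALYZE EXECUTE queue_items('N');  -- touches ~50 rows
EXPLAIN ANALYZE EXECUTE queue_items('D');  -- same cached plan, touches ~100,000 rows

Whichever plan gets cached, one of the two parameter values ends up doing orders of magnitude more work than the plan was costed for, which is exactly the variation the proposed tracking would record.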
Oh, OK, so you are logging prepared queries where the plan generates a significantly different number of rows from previous runs. I am not sure why that is better, or easier, than just invalidating the cached plan if the cardinality changes.

---------------------------------------------------------------------------

Jim C. Nasby wrote:
> What I'm proposing is to keep track of the 'normal' number of tuples read when executing a prepared query, and logging any queries that are substantially different. So, if you normally have to read 50 tuples to find all 'N' records, when the query looking for 'D' records comes along and has to read 5000 tuples instead, we want to log that. Probably the easiest way to accomplish this is to store a moving average of tuples read with each prepared statement entry.

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Jim C. Nasby wrote:
> Now, if both of these are done using a prepared statement, it's going to look like:
>
> SELECT * FROM queue WHERE status='?';
>
> If the first one to run is the queue processing one, the planner will probably choose the index. This means that when we're searching on 'N', there will be a fairly small number of tuples read to execute the query, but when searching for 'D' a very large number of tuples will be read.

I do not know exactly how pg handles this internally; however, while skimming the Oracle tuning pocket guide I picked up for 2 euros, I noticed that it mentioned that since Oracle 9i bound parameter values are evaluated before the execution plan is determined.

Maybe I am mixing up separate concepts (are bound variables and prepared statements different concepts?) here. I also do not really understand whether that means that Oracle does not store a query plan for a prepared query, or whether it just does some special handling when it knows that a prepared statement column has highly varying selectivity per value.

regards,
Lukas
Lukas Smith wrote:
> I do not know exactly how pg handles this internally; however, while skimming the Oracle tuning pocket guide I picked up for 2 euros, I noticed that it mentioned that since Oracle 9i bound parameter values are evaluated before the execution plan is determined.
>
> Maybe I am mixing up separate concepts (are bound variables and prepared statements different concepts?) here. I also do not really understand whether that means that Oracle does not store a query plan for a prepared query, or whether it just does some special handling when it knows that a prepared statement column has highly varying selectivity per value.

What the Oracle manual means, I think, is that the planning of the query is delayed until the _first_ EXECUTE, so it has some values to use in the optimizer. The problem is that later queries might use constants of greatly different cardinality.

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
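The trade-off Bruce describes is easy to see by letting the planner look at each constant directly (continuing the queue sketch above; the plans in the comments are what one would typically expect, not guaranteed output):

EXPLAIN SELECT * FROM queue WHERE status = 'N';
-- rare value: an index scan on queue__status is likely to win

EXPLAIN SELECT * FROM queue WHERE status = 'D';
-- common value: a sequential scan is likely cheaper

Planning at the first EXECUTE, as described for Oracle 9i above, locks in whichever of these two plan shapes happens to match the first parameter value it sees.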
Bruce Momjian wrote:
>> Maybe I am mixing up separate concepts (are bound variables and prepared statements different concepts?) here. I also do not really understand whether that means that Oracle does not store a query plan for a prepared query, or whether it just does some special handling when it knows that a prepared statement column has highly varying selectivity per value.
>
> What the Oracle manual means, I think, is that the planning of the query is delayed until the _first_ EXECUTE, so it has some values to use in the optimizer. The problem is that later queries might use constants of greatly different cardinality.

Ok .. which just goes to say that one should not use prepared statements for a column with highly varying selectivity ..?

Or is there a realistic shot at fixing this use case?

regards,
Lukas
On Thu, Dec 22, 2005 at 09:55:14PM +0100, Lukas Smith wrote:
> Ok .. which just goes to say that one should not use prepared statements for a column with highly varying selectivity ..?
>
> Or is there a realistic shot at fixing this use case?

FWIW, I believe that 10g has some brains in this regard, where it can detect whether it should store multiple plans for one prepared statement. This is critical for them, because their parser/planner is much harder on the system than ours is.

--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Lukas Smith wrote:
> Ok .. which just goes to say that one should not use prepared statements for a column with highly varying selectivity ..?
>
> Or is there a realistic shot at fixing this use case?

It is an issue for all databases. We have a TODO about it:

* Flush cached query plans when the dependent objects change,
  when the cardinality of parameters changes dramatically, or
  when new ANALYZE statistics are available

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian wrote:
> It is an issue for all databases. We have a TODO about it:
>
> * Flush cached query plans when the dependent objects change,
>   when the cardinality of parameters changes dramatically, or
>   when new ANALYZE statistics are available

Ok, just so I understand this correctly. In the mentioned case the cardinality does not really change in regard to the table stats; it's just that I happen to use a value that has a different selectivity and therefore may need a different plan. So I do not really see how this use case is handled by the above TODO, nor do I really see how it's handled by what Jim suggested earlier. The fact of the matter is that for this use case you need to use different query plans for the same prepared statement.

regards,
Lukas
Well, not just rows; total tuples, both base heap and index. ISTM that would be a better metric than just plain rows read out of base or rows returned.

Depending on how far down this road we want to go, this would allow for detecting what parameter values require different query plans, and then using different query plans for different sets of values. Simply invalidating the cached plan means you could potentially end up needing to re-plan very frequently. But given the current speed of our optimizer, it's probably not worth going to this extent.

Another concern I have is: is cardinality the only metric we need to look at when deciding to re-plan, or are there others?

In either case, my guess is that tracking the info needed to make this idea happen is probably much easier than doing automatic plan invalidation based on cardinality, so it would be a useful interim step. But if we could actually get cardinality invalidation into 8.2, I'd say put the effort into that...

On Thu, Dec 22, 2005 at 03:14:09PM -0500, Bruce Momjian wrote:
> Oh, OK, so you are logging prepared queries where the plan generates a significantly different number of rows from previous runs. I am not sure why that is better, or easier, than just invalidating the cached plan if the cardinality changes.

--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Thu, Dec 22, 2005 at 10:14:15PM +0100, Lukas Smith wrote: > Ok, just so I understand this correctly. In the mentioned case the > cardinality does not really change with regard to the table stats; it's > just that I happen to use a value that has a different selectivity and > therefore I may need a different plan. So I do not really see how this > use case is handled with the above todo, nor do I really see how it's > handled with what Jim suggested earlier. The fact of the matter is that > for this use case you need to use different query plans for the same > prepared statements. What I mentioned would allow for identifying the query plans where this is happening. Doing something about it would be the next step after that. Can anyone think of something other than selectivity that would make a stored query plan go bad based solely on the parameters being fed into it? (In other words, ignore the obvious cases of bad statistics or a DDL change.) -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
We need invalidation anyway, so I don't see why an intermediate step makes sense. --------------------------------------------------------------------------- Jim C. Nasby wrote: > Well, not just rows; total tuples, both base heap and index. ISTM that > would be a better metric than just plain rows read out of base or rows > returned. > > Depending on how far down this road we want to go, this would allow for > detecting what parameter values require different query plans, and then > using different query plans for different sets of values. Simply > invalidating the cached plan means you could potentially end up needing > to re-plan very frequently. But given the current speed of our > optimizer, it's probably not worth going to this extent. > > Another concern I have is: is cardinality the only metric we need to > look at when deciding to re-plan or are there others? > > In either case, my guess is that tracking the info needed to make this > idea happen is probably much easier than doing automatic plan > invalidation based on cardinality, so it would be a useful interum step. > But if we could actually get cardinality invalidation into 8.2, I'd say > put the effort into that... > > On Thu, Dec 22, 2005 at 03:14:09PM -0500, Bruce Momjian wrote: > > > > Oh, OK, so you are logging prepared queries where the plan generates a > > significantly different number of rows from previous runs. I am not > > sure why that is better, or easier, than just invalidating the cached > > plan if the cardinality changes. > > > > --------------------------------------------------------------------------- > > > > Jim C. Nasby wrote: > > > On Wed, Dec 21, 2005 at 11:00:31PM -0500, Bruce Momjian wrote: > > > > > Track normal resource consumption (ie: tuples read) for planned queries > > > > > and record parameter values that result in drastically different > > > > > resource consumption. > > > > > > > > > > This would at least make it easy for admins to identify prepared queries > > > > > that have a highly variable execution cost. > > > > > > > > We have that TODO already: > > > > > > > > * Log statements where the optimizer row estimates were dramatically > > > > different from the number of rows actually found? > > > > > > Does the stored plan also save how many rows were expected? Otherwise > > > I'm not sure how that TODO covers it... If it does then please ignore my > > > ramblings below. :) > > > > > > My idea has nothing to do with row estimates. It has to do with the > > > amount of work actually done to perform a query. Consider this example: > > > > > > CREATE TABLE queue (status char NOT NULL, queue_item text NOT NULL); > > > CREATE INDEX queue__status ON queue (status); > > > > > > Obviously, to process this you'll need a query like: > > > SELECT * FROM queue WHERE status='N' -- N for New; > > > > > > Say you also occasionally need to see a list of items that have been > > > processed: > > > SELECT * FROM queue WHERE status='D' -- D for Done; > > > > > > And let's say you need to keep done items around for 30 days. > > > > > > Now, if both of these are done using a prepared statement, it's going to > > > look like: > > > > > > SELECT * FROM queue WHERE status='?'; > > > > > > If the first one to run is the queue processing one, the planner will > > > probably choose the index. This means that when we're searching on 'N', > > > there will be a fairly small number of tuples read to execute the query, > > > but when searching for 'D' a very large number of tuples will be read. 
> > > > > > What I'm proposing is to keep track of the 'normal' number of tuples > > > read when executing a prepared query, and logging any queries that are > > > substantially different. So, if you normally have to read 50 tuples to > > > find all 'N' records, when the query looking for 'D' records comes along > > > and has to read 5000 tuples instead, we want to log that. Probably the > > > easiest way to accomplish this is to store a moving average of tuples > > > read with each prepared statement entry. > > > -- > > > Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com > > > Pervasive Software http://pervasive.com work: 512-231-6117 > > > vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 > > > > > > ---------------------------(end of broadcast)--------------------------- > > > TIP 6: explain analyze is your friend > > > > > > > -- > > Bruce Momjian | http://candle.pha.pa.us > > pgman@candle.pha.pa.us | (610) 359-1001 > > + If your life is a hard drive, | 13 Roberts Road > > + Christ can be your backup. | Newtown Square, Pennsylvania 19073 > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 3: Have you checked our extensive FAQ? > > > > http://www.postgresql.org/docs/faq > > > > -- > Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com > Pervasive Software http://pervasive.com work: 512-231-6117 > vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 > > ---------------------------(end of broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian wrote: > Right, if the cardinality changes, you realize this before execution and > optimize/save the plan again. A further optimization would be to save > _multiple_ plans for a single prepared statement based on constants and > choose one or the other, but that is beyond what we are willing to > consider at this stage, I think. OK .. so you store the cardinality that was used when generating the original plan. On the next execution you look up the cardinality again and compare it; if it's off by too much, you replan. However, this could in extreme cases mean that you replan on every execution, thereby killing off the entire advantage of storing the plan. But that's the absolute worst-case scenario. regards, Lukas PS: Bruce's original email was only sent to me directly ..
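A minimal C sketch of the check Lukas outlines, assuming the row estimate used at plan time is kept with the cached plan and re-derived for the current parameter values at EXECUTE time; the function name and threshold are invented for illustration and this is not PostgreSQL source.

#include <stdbool.h>
#include <stdio.h>

#define REPLAN_RATIO 10.0   /* replan when estimates differ by about an order of magnitude */

/* Compare the row estimate the plan was built for against the estimate
 * for the parameter values now being executed. */
static bool
needs_replan(double rows_at_plan_time, double rows_estimated_now)
{
    double ratio;

    if (rows_at_plan_time <= 0.0 || rows_estimated_now <= 0.0)
        return false;            /* nothing sensible to compare */

    ratio = rows_estimated_now / rows_at_plan_time;
    return (ratio > REPLAN_RATIO || ratio < 1.0 / REPLAN_RATIO);
}

int
main(void)
{
    /* plan was built while looking for 'N' (about 50 rows) */
    printf("execute with 'D' (about 5000 rows): replan? %s\n",
           needs_replan(50.0, 5000.0) ? "yes" : "no");
    printf("execute with 'N' again (about 60 rows): replan? %s\n",
           needs_replan(50.0, 60.0) ? "yes" : "no");
    return 0;
}

The worst case Lukas mentions falls out directly: if successive EXECUTEs alternate between values on opposite sides of the threshold, the check fires every time and the statement is effectively planned on each execution.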
On Thursday 2005-12-22 14:28, Lukas Kahwe Smith wrote: > Bruce Momjian wrote: > > Right, if the cardinality changes, you realize this before execution and > > optimize/save the plan again. A further optimization would be to save > > _multiple_ plans for a single prepared statement based on constants and > > choose one or the other, but that is beyond what we are willing to > > consider at this stage, I think. > > OK .. so you store the cardinality that was used when generating the > original plan. On the next execution you look up the cardinality again > and compare it; if it's off by too much, you replan. However, this could in > extreme cases mean that you replan on every execution, thereby > killing off the entire advantage of storing the plan. But that's the > absolute worst-case scenario. > > regards, > Lukas > > PS: Bruce's original email was only sent to me directly .. So you have a parameterized query (one parameter for simplicity of argument); as the parameter changes, cardinality changes dramatically. It seems to me that in this case, rather than replanning, it is better to build a data structure that associates different parameter values with appropriate plans. The plans can be reused until, as would be the case with a no-parameter query, a parameter-specific plan needs to be flushed (or the entire family of plans can be flushed).
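Here is a rough C sketch of the data structure Trent suggests, keeping one cached plan per parameter value rather than a single plan per prepared statement. The types and names are invented for illustration (the plan pointer merely stands in for whatever the backend would actually store); it is a sketch of the idea, not a proposal for the real code.

#include <stdio.h>
#include <string.h>

#define MAX_VARIANTS 8

typedef struct PlanVariant
{
    char  param_value[16];   /* parameter value this plan was built for */
    void *plan;              /* stand-in for the real planned-statement structure */
} PlanVariant;

typedef struct PlanFamily
{
    PlanVariant variants[MAX_VARIANTS];
    int         nvariants;
} PlanFamily;

/* Return the cached plan for this parameter value, or NULL if a new
 * variant still has to be planned. */
static void *
lookup_plan(PlanFamily *family, const char *param_value)
{
    int i;

    for (i = 0; i < family->nvariants; i++)
        if (strcmp(family->variants[i].param_value, param_value) == 0)
            return family->variants[i].plan;
    return NULL;
}

static void
remember_plan(PlanFamily *family, const char *param_value, void *plan)
{
    if (family->nvariants < MAX_VARIANTS)
    {
        PlanVariant *v = &family->variants[family->nvariants++];

        snprintf(v->param_value, sizeof(v->param_value), "%s", param_value);
        v->plan = plan;
    }
    /* flushing the whole family would simply reset nvariants to zero */
}

int
main(void)
{
    PlanFamily family;
    int        plan_for_new, plan_for_done;   /* dummies standing in for real plans */

    memset(&family, 0, sizeof(family));

    if (lookup_plan(&family, "N") == NULL)
        remember_plan(&family, "N", &plan_for_new);    /* planned on first 'N' */
    if (lookup_plan(&family, "D") == NULL)
        remember_plan(&family, "D", &plan_for_done);   /* separate plan for 'D' */

    printf("plan cached for 'N': %s\n", lookup_plan(&family, "N") ? "yes" : "no");
    printf("plan cached for 'D': %s\n", lookup_plan(&family, "D") ? "yes" : "no");
    return 0;
}

In practice the lookup would more plausibly bucket by estimated selectivity rather than by literal value, which is roughly what the follow-up below calls choosing the appropriate plan based on the EXECUTE values.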
Trent Shipley wrote: > On Thursday 2005-12-22 14:28, Lukas Kahwe Smith wrote: > > Bruce Momjian wrote: > > > Right, if the cardinality changes, you realize this before execution and > > > optimize/save the plan again. A further optimization would be to save > > > _multiple_ plans for a single prepared statement based on constants and > > > choose one or the other, but that is beyond what we are willing to > > > consider at this stage, I think. > > > > OK .. so you store the cardinality that was used when generating the > > original plan. On the next execution you look up the cardinality again > > and compare it; if it's off by too much, you replan. However, this could in > > extreme cases mean that you replan on every execution, thereby > > killing off the entire advantage of storing the plan. But that's the > > absolute worst-case scenario. > > > > regards, > > Lukas > > > > PS: Bruce's original email was only sent to me directly .. > > So you have a parameterized query (one parameter for simplicity of argument); > as the parameter changes, cardinality changes dramatically. > > It seems to me that in this case, rather than replanning, it is better to build a data > structure that associates different parameter values with appropriate plans. > The plans can be reused until, as would be the case with a no-parameter > query, a parameter-specific plan needs to be flushed (or the entire family of > plans can be flushed). TODO updated: * Flush cached query plans when the dependent objects change, when the cardinality of parameters changes dramatically, or when new ANALYZE statistics are available. A more complex solution would be to save multiple plans for different cardinalities and use the appropriate plan based on the EXECUTE values. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073