Thread: Automatic function replanning

Automatic function replanning

From
Joachim Wieland
Date:
Hi,

there's a topic that comes up from time to time on the lists, the problem
that pgsql functions get planned only once and thereafter the same query
plan is used until server shutdown or explicit recreation of the function.

I'd like to implement a way of automatic function replanning. I can think of
a few possible approaches.

1. in a more general way: extend pg_proc by an interval column "ttl" or
   "replanAfter" and add a function declaration attribute to the parser:
   "... STRICT STABLE REPLAN AFTER '3 days'::interval"
   (see the sketch after this list)
 
 + general approach: every language that can pre-compute plans can use this
   feature, and the check can be done in one place for all languages
 
 - in fact only plpgsql can do that at the moment (right?) and there is no
   other candidate for something similar at the moment
 
 - catalog change that also requires interval to be specially treated while
   bootstrapping
 - catalog would grow, every function would have the attribute though it is
   only applicable for a very low number of functions, let alone the number
   of functions that would actually use it in a typical installation
 

2. use the #option feature of plpgsql. Add the possibility to specify
   "#option ttl '3 days'" or "#option replan-after '1 day 2 hours'"
   (see the sketch after this list)
 
  + Minor changes, changes only local to plpgsql
  - plpgsql specific solution
  - is #option documented at all? Should it stay "unofficial"? If so, why?

3. (not automatic) add a statement that makes pgsql forget the plan and
   compute a new one when the function gets called again.  "The user should
   rather use cron for doing maintenance tasks."
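
For illustration, a rough sketch of what the first two variants might look
like (proposed syntax only, nothing of this exists today; the queue table
is just an example):

-- option 1: proposed REPLAN AFTER declaration attribute
CREATE FUNCTION queue_summary() RETURNS bigint AS $$
BEGIN
    RETURN (SELECT count(*) FROM queue WHERE status = 'N');
END;
$$ LANGUAGE plpgsql STRICT STABLE REPLAN AFTER '3 days'::interval;

-- option 2: proposed plpgsql-local #option
CREATE FUNCTION queue_summary2() RETURNS bigint AS $$
#option ttl '3 days'
BEGIN
    RETURN (SELECT count(*) FROM queue WHERE status = 'N');
END;
$$ LANGUAGE plpgsql;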
 



What do you think? Any other ideas?


Joachim



Re: Automatic function replanning

From
Neil Conway
Date:
On Tue, 2005-12-13 at 22:32 +0100, Joachim Wieland wrote:
> there's a topic that comes up from time to time on the lists, the problem
> that pgsql functions get planned only once and thereafter the same query
> plan is used until server shutdown or explicit recreation of the function.

The problem really has nothing to do with functions, per se: whenever a
plan is created and then stored for future use, the assumptions made by
that plan may be invalidated by the time the plan is executed. This
applies to PREPARE, pl/pgsql functions, perhaps the plan caching done by
the RI triggers, and so forth.

I also think that invalidating cached plans on a periodic basis is the
wrong approach -- we can use sinval to invalidate plans as soon as a
dependent database object changes and not before. This thread contains
some ideas on how to do this:
   http://archives.postgresql.org/pgsql-hackers/2005-03/msg00426.php
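
As an illustration of what sinval-based invalidation would catch (table
and statement names invented for the example; this is the current,
pre-invalidation behavior):

CREATE TABLE accounts (id int PRIMARY KEY, balance numeric);
PREPARE get_account(int) AS SELECT * FROM accounts WHERE id = $1;
-- a dependent object changes after the plan was cached:
DROP TABLE accounts;
CREATE TABLE accounts (id int PRIMARY KEY, balance numeric);
EXECUTE get_account(1);  -- the cached plan still references the old
                         -- table, so this errors out instead of replanning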

I got somewhat sidetracked by the complexities of the "central plan
caching module" that Tom would like to see, but I'm still hoping to take
a look at this for 8.2.

-Neil




Re: Automatic function replanning

From
"Jim C. Nasby"
Date:
On Tue, Dec 13, 2005 at 04:49:10PM -0500, Neil Conway wrote:
> On Tue, 2005-12-13 at 22:32 +0100, Joachim Wieland wrote:
> > there's a topic that comes up from time to time on the lists, the problem
> > that pgsql functions get planned only once and thereafter the same query
> > plan is used until server shutdown or explicit recreation of the function.
> 
> The problem really has nothing to do with functions, per se: whenever a
> plan is created and then stored for future use, the assumptions made by
> that plan may be invalidated by the time the plan is executed. This
> applies to PREPARE, pl/pgsql functions, perhaps the plan caching done by
> the RI triggers, and so forth.
> 
> I also think that invalidating cached plans on a periodic basis is the
> wrong approach -- we can use sinval to invalidate plans as soon as a
> dependent database object changes and not before. This thread contains
> some ideas on how to do this:
> 
>     http://archives.postgresql.org/pgsql-hackers/2005-03/msg00426.php
> 
> I got somewhat sidetracked by the complexities of the "central plan
> caching module" that Tom would like to see, but I'm still hoping to take
> a look at this for 8.2.

As for predicate-driven plan changes (ie: query is planned the first
time with a predicate that has high cardinality, but there are also low
cardinality values that will be queried on), it would make more sense to
track the amount of work (probably tuples fetched) normally required to
execute a prepared statement. Any time that prepared statement is
executed with a set of predicates that substantially changes the amount
of work required it should be remembered and considered for re-planning
the next time the query is executed with those predicates.
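
A minimal illustration of that skew (hypothetical table, where status 'N'
is rare and 'D' is common):

PREPARE fetch_queue(char) AS SELECT * FROM queue WHERE status = $1;
EXECUTE fetch_queue('N');  -- few rows: the cached plan's cost estimate holds
EXECUTE fetch_queue('D');  -- thousands of rows: same plan, but far more
                           -- tuples fetched, so a candidate for re-planning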
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: Automatic function replanning

From
Bruce Momjian
Date:
Good idea, TODO updated:

    * Flush cached query plans when the dependent objects change or
      when the cardinality of parameters changes dramatically


---------------------------------------------------------------------------

Jim C. Nasby wrote:
> On Tue, Dec 13, 2005 at 04:49:10PM -0500, Neil Conway wrote:
> > On Tue, 2005-12-13 at 22:32 +0100, Joachim Wieland wrote:
> > > there's a topic that comes up from time to time on the lists, the problem
> > > that pgsql functions get planned only once and thereafter the same query
> > > plan is used until server shutdown or explicit recreation of the function.
> > 
> > The problem really has nothing to do with functions, per se: whenever a
> > plan is created and then stored for future use, the assumptions made by
> > that plan may be invalidated by the time the plan is executed. This
> > applies to PREPARE, pl/pgsql functions, perhaps the plan caching done by
> > the RI triggers, and so forth.
> > 
> > I also think that invalidating cached plans on a periodic basis is the
> > wrong approach -- we can use sinval to invalidate plans as soon as a
> > dependent database object changes and not before. This thread contains
> > some ideas on how to do this:
> > 
> >     http://archives.postgresql.org/pgsql-hackers/2005-03/msg00426.php
> > 
> > I got somewhat sidetracked by the complexities of the "central plan
> > caching module" that Tom would like to see, but I'm still hoping to take
> > a look at this for 8.2.
> 
> As for predicate-driven plan changes (ie: query is planned the first
> time with a predicate that has high cardinality, but there are also low
> cardinality values that will be queried on), it would make more sense to
> track the amount of work (probably tuples fetched) normally required to
> execute a prepared statement. Any time that prepared statement is
> executed with a set of predicates that substantially changes the amount
> of work required it should be remembered and considered for re-planning
> the next time the query is executed with those predicates.
> -- 
> Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
> Pervasive Software      http://pervasive.com    work: 512-231-6117
> vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
> 

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: Automatic function replanning

From
"Jim C. Nasby"
Date:
Is cardinality the only thing we'd need to worry about? My idea was
actually to track the amount of work normally required by a stored query
plan, and if a query uses that plan but requires a very different amount
of work it's a good indication that we either need to replan or store
multiple plans for that query. Though if we're certain that cardinality
is the only thing that could make a cached plan go bad it would
certainly simplify things greatly.

On Fri, Dec 16, 2005 at 11:10:43PM -0500, Bruce Momjian wrote:
> 
> Good idea, TODO updated:
> 
>     * Flush cached query plans when the dependent objects change or
>       when the cardinality of parameters changes dramatically
> 
> 
> ---------------------------------------------------------------------------
> 
> Jim C. Nasby wrote:
> > On Tue, Dec 13, 2005 at 04:49:10PM -0500, Neil Conway wrote:
> > > On Tue, 2005-12-13 at 22:32 +0100, Joachim Wieland wrote:
> > > > there's a topic that comes up from time to time on the lists, the problem
> > > > that pgsql functions get planned only once and thereafter the same query
> > > > plan is used until server shutdown or explicit recreation of the function.
> > > 
> > > The problem really has nothing to do with functions, per se: whenever a
> > > plan is created and then stored for future use, the assumptions made by
> > > that plan may be invalidated by the time the plan is executed. This
> > > applies to PREPARE, pl/pgsql functions, perhaps the plan caching done by
> > > the RI triggers, and so forth.
> > > 
> > > I also think that invalidating cached plans on a periodic basis is the
> > > wrong approach -- we can use sinval to invalidate plans as soon as a
> > > dependent database object changes and not before. This thread contains
> > > some ideas on how to do this:
> > > 
> > >     http://archives.postgresql.org/pgsql-hackers/2005-03/msg00426.php
> > > 
> > > I got somewhat sidetracked by the complexities of the "central plan
> > > caching module" that Tom would like to see, but I'm still hoping to take
> > > a look at this for 8.2.
> > 
> > As for predicate-driven plan changes (ie: query is planned the first
> > time with a predicate that has high cardinality, but there are also low
> > cardinality values that will be queried on), it would make more sense to
> > track the amount of work (probably tuples fetched) normally required to
> > execute a prepared statement. Any time that prepared statement is
> > executed with a set of predicates that substantially changes the amount
> > of work required it should be remembered and considered for re-planning
> > the next time the query is executed with those predicates.
> > -- 
> > Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
> > Pervasive Software      http://pervasive.com    work: 512-231-6117
> > vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
> > 
> 
> -- 
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610) 359-1001
>   +  If your life is a hard drive,     |  13 Roberts Road
>   +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
> 

-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: Automatic function replanning

From
Bruce Momjian
Date:
Jim C. Nasby wrote:
> Is cardinality the only thing we'd need to worry about? My idea was
> actually to track the amount of work normally required by a stored query
> plan, and if a query uses that plan but requires a very different amount
> of work it's a good indication that we either need to replan or store
> multiple plans for that query. Though if we're certain that cardinality
> is the only thing that could make a cached plan go bad it would
> certainly simplify things greatly.

This gets into another area of re-optimizing when the executor finds
that the actual tables don't match the optimizer estimates.  I think we
decided that was too hard/risky, but invalidating the plan might help,
though I am thinking re-planning might just generate the same plan as
before.  I think something would need to have happened since the last
planning, like ANALYZE or something.

Updated TODO:

    * Flush cached query plans when the dependent objects change,
      when the cardinality of parameters changes dramatically, or
      when new ANALYZE statistics are available

---------------------------------------------------------------------------


> 
> On Fri, Dec 16, 2005 at 11:10:43PM -0500, Bruce Momjian wrote:
> > 
> > Good idea, TODO updated:
> > 
> >     * Flush cached query plans when the dependent objects change or
> >       when the cardinality of parameters changes dramatically
> > 
> > 
> > ---------------------------------------------------------------------------
> > 
> > Jim C. Nasby wrote:
> > > On Tue, Dec 13, 2005 at 04:49:10PM -0500, Neil Conway wrote:
> > > > On Tue, 2005-12-13 at 22:32 +0100, Joachim Wieland wrote:
> > > > > there's a topic that comes up from time to time on the lists, the problem
> > > > > that pgsql functions get planned only once and thereafter the same query
> > > > > plan is used until server shutdown or explicit recreation of the function.
> > > > 
> > > > The problem really has nothing to do with functions, per se: whenever a
> > > > plan is created and then stored for future use, the assumptions made by
> > > > that plan may be invalidated by the time the plan is executed. This
> > > > applies to PREPARE, pl/pgsql functions, perhaps the plan caching done by
> > > > the RI triggers, and so forth.
> > > > 
> > > > I also think that invalidating cached plans on a periodic basis is the
> > > > wrong approach -- we can use sinval to invalidate plans as soon as a
> > > > dependent database object changes and not before. This thread contains
> > > > some ideas on how to do this:
> > > > 
> > > >     http://archives.postgresql.org/pgsql-hackers/2005-03/msg00426.php
> > > > 
> > > > I got somewhat sidetracked by the complexities of the "central plan
> > > > caching module" that Tom would like to see, but I'm still hoping to take
> > > > a look at this for 8.2.
> > > 
> > > As for predicate-driven plan changes (ie: query is planned the first
> > > time with a predicate that has high cardinality, but there are also low
> > > cardinality values that will be queried on), it would make more sense to
> > > track the amount of work (probably tuples fetched) normally required to
> > > execute a prepared statement. Any time that prepared statement is
> > > executed with a set of predicates that substantially changes the amount
> > > of work required it should be remembered and considered for re-planning
> > > the next time the query is executed with those predicates.
> > > -- 
> > > Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
> > > Pervasive Software      http://pervasive.com    work: 512-231-6117
> > > vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
> > > 
> > 
> > -- 
> >   Bruce Momjian                        |  http://candle.pha.pa.us
> >   pgman@candle.pha.pa.us               |  (610) 359-1001
> >   +  If your life is a hard drive,     |  13 Roberts Road
> >   +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
> > 
> 
> -- 
> Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
> Pervasive Software      http://pervasive.com    work: 512-231-6117
> vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
> 

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: Automatic function replanning

From
Lukas Smith
Date:
Bruce Momjian wrote:

>     * Flush cached query plans when the dependent objects change,
>       when the cardinality of parameters changes dramatically, or
>       when new ANALYZE statistics are available

Wouldn't it also make sense to flush a cached query plan when after 
execution it is determined that one or more assumptions that the cached 
query plan was based on were found to be off? Like the query plan was 
based on the assumption that a particular table would only return a 
handful of rows, but in reality it returned a few thousand.

regards,
Lukas


Re: Automatic function replanning

From
Chris Browne
Date:
Lukas Smith <mls@pooteeweet.org> writes:
> Bruce Momjian wrote:
>
>>     * Flush cached query plans when the dependent objects change,
>>       when the cardinality of parameters changes dramatically, or
>>       when new ANALYZE statistics are available
>
> Wouldn't it also make sense to flush a cached query plan when after
> execution it is determined that one or more assumptions that the
> cached query plan was based on were found to be off? Like the query
> plan was based on the assumption that a particular table would only
> return a handful of rows, but in reality it returned a few
> thousand.

There is some merit to that.

I could also see it being sensible to flush a cached plan any time the
query took more than some [arbitrary/GUC-chosen] interval.

Supposing it took 20s to execute the query, it would surely seem
surprising for re-evaluating the plan to be expected to make up a
material proportion of the cost of the *next* invocation.

If we flush every plan that took >10s to evaluate, that offers the
possibility for it to be done better next time...
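
If that were exposed as a setting, it might look something like this (the
GUC name is invented purely for illustration; nothing like it exists):

-- hypothetical GUC: flush any cached plan whose last execution ran longer
SET cached_plan_flush_threshold = '10s';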
-- 
output = reverse("gro.mca" "@" "enworbbc")
http://www.ntlug.org/~cbbrowne/internet.html
Points  are awarded for   getting   the last   word in.   Drawing  the
conversation out so long  that the original  message disappears due to
being indented off the right hand edge of the screen is  one way to do
this.  Another is to imply that  anyone replying further is a hopeless
cretin and is wasting everyone's valuable time.
-- from the Symbolics Guidelines for Sending Mail


Re: Automatic function replanning

From
Bruce Momjian
Date:
Chris Browne wrote:
> Lukas Smith <mls@pooteeweet.org> writes:
> > Bruce Momjian wrote:
> >
> >>     * Flush cached query plans when the dependent objects change,
> >>       when the cardinality of parameters changes dramatically, or
> >>       when new ANALYZE statistics are available
> >
> > Wouldn't it also make sense to flush a cached query plan when after
> > execution it is determined that one or more assumptions that the
> > cached query plan was based on were found to be off? Like the query
> > plan was based on the assumption that a particular table would only
> > return a handful of rows, but in reality it returned a few
> > thousand.
> 
> There is some merit to that.
> 
> I could also see it being sensible to flush a cached plan any time the
> query took more than some [arbitrary/GUC-chosen] interval.
> 
> Supposing it took 20s to execute the query, it would surely seem
> surprising for re-evaluating the plan to be expected to make up a
> material proportion of the cost of the *next* invocation.
> 
> If we flush every plan that took >10s to evaluate, that offers the
> possibility for it to be done better next time...

Ah, but how do you pass that information back to the optimizer so you
don't end up getting the same plan again?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: Automatic function replanning

From
Tom Lane
Date:
Lukas Smith <mls@pooteeweet.org> writes:
> Bruce Momjian wrote:
>> * Flush cached query plans when the dependent objects change,
>> when the cardinality of parameters changes dramatically, or
>> when new ANALYZE statistics are available

> Wouldn't it also make sense to flush a cached query plan when after 
> execution it is determined that one or more assumptions that the cached 
> query plan was based on were found to be off?

Not unless you do something that would cause the planner to make
different choices next time.  (Such as changing the ANALYZE statistics,
perhaps.)  The TODO item is OK as stated, it's just talking about
mechanism and not the things that might trigger the mechanism.
        regards, tom lane


Re: Automatic function replanning

From
Christopher Browne
Date:
> Chris Browne wrote:
>> Lukas Smith <mls@pooteeweet.org> writes:
>> > Bruce Momjian wrote:
>> >
>> >>     * Flush cached query plans when the dependent objects change,
>> >>       when the cardinality of parameters changes dramatically, or
>> >>       when new ANALYZE statistics are available
>> >
>> > Wouldn't it also make sense to flush a cached query plan when after
>> > execution it is determined that one or more assumptions that the
>> > cached query plan was based on were found to be off? Like the query
>> > plan was based on the assumption that a particular table would only
>> > return a handful of rows, but in reality it returned a few
>> > thousand.
>> 
>> There is some merit to that.
>> 
>> I could also see it being sensible to flush a cached plan any time the
>> query took more than some [arbitrary/GUC-chosen] interval.
>> 
>> Supposing it took 20s to execute the query, it would surely seem
>> surprising for re-evaluating the plan to be expected to make up a
>> material proportion of the cost of the *next* invocation.
>> 
>> If we flush every plan that took >10s to evaluate, that offers the
>> possibility for it to be done better next time...
>
> Ah, but how do you pass that information back to the optimizer so you
> don't end up getting the same plan again?

We can't, in any direct fashion, of course.

Even if flushing the plan doesn't lead to a better one now, it
still leaves you ready for when an ANALYZE will come along and change
the stats and possibly improve things.

One possibility to do something indirect would be for this "plan
invalidation" to also feed some stats to pg_autovacuum, such that
every "bad query plan" (as evaluated by the notion that the actual
number of tuples for a given table wildly varied from expectations)
would bump up the stats for the offending table.

That would give some feedback to encourage pg_autovacuum to analyze
trouble tables again sooner.
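
Today the closest manual analog is for an admin to re-ANALYZE the trouble
table and then discard the stale plan by hand, e.g. (reusing the
hypothetical queue example):

ANALYZE queue;            -- refresh the statistics for the offending table
DEALLOCATE fetch_queue;   -- throw away the stale prepared plan
PREPARE fetch_queue(char) AS SELECT * FROM queue WHERE status = $1;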
-- 
(reverse (concatenate 'string "moc.liamg" "@" "enworbbc"))
http://cbbrowne.com/info/slony.html
``Lisp has jokingly been called "the most intelligent way to misuse a 
computer". I think that description is a great compliment because it 
transmits the full flavor of liberation: it has assisted a number of our 
most gifted fellow humans in thinking previously impossible thoughts.'' 
-- "The Humble Programmer", E. Dijkstra, CACM, vol. 15, n. 10, 1972


Re: Automatic function replanning

From
Christopher Browne
Date:
> Lukas Smith <mls@pooteeweet.org> writes:
>> Bruce Momjian wrote:
>>> * Flush cached query plans when the dependent objects change,
>>> when the cardinality of parameters changes dramatically, or
>>> when new ANALYZE statistics are available
>
>> Wouldn't it also make sense to flush a cached query plan when after 
>> execution it is determined that one or more assumptions that the cached 
>> query plan was based on were found to be off?
>
> Not unless you do something that would cause the planner to make
> different choices next time.  (Such as changing the ANALYZE statistics,
> perhaps.)  The TODO item is OK as stated, it's just talking about
> mechanism and not the things that might trigger the mechanism.

A mechanism might be to bump up the stats stored for pg_autovacuum,
which would encourage a table to be re-ANALYZEd.

That may not immediately change ANALYZE statistics, but it's
something...

Even if there is NO feedback mechanism on statistics, if we know the
plan was pretty bad, it is surely at least *a* feedback to invalidate
the plan.
-- 
(reverse (concatenate 'string "moc.liamg" "@" "enworbbc"))
http://cbbrowne.com/info/slony.html
``Lisp has jokingly been called "the most intelligent way to misuse a 
computer". I think that description is a great compliment because it 
transmits the full flavor of liberation: it has assisted a number of our 
most gifted fellow humans in thinking previously impossible thoughts.'' 
-- "The Humble Programmer", E. Dijkstra, CACM, vol. 15, n. 10, 1972


Re: Automatic function replanning

From
Bruce Momjian
Date:
Christopher Browne wrote:
> > Not unless you do something that would cause the planner to make
> > different choices next time.  (Such as changing the ANALYZE statistics,
> > perhaps.)  The TODO item is OK as stated, it's just talking about
> > mechanism and not the things that might trigger the mechanism.
> 
> A mechanism might be to bump up the stats stored for pg_autovacuum,
> which would encourage a table to be re-ANALYZEd.
> 
> That may not immediately change ANALYZE statistics, but it's
> something...
> 
> Even if there is NO feedback mechanism on statistics, if we know the
> plan was pretty bad, it is surely at least *a* feedback to invalidate
> the plan.

Something has to cause the replan to be better, and that is one
possibility.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: Automatic function replanning

From
Trent Shipley
Date:
On Saturday 2005-12-17 16:28, Lukas Smith wrote:
> Bruce Momjian wrote:
> >     * Flush cached query plans when the dependent objects change,
> >       when the cardinality of parameters changes dramatically, or
> >       when new ANALYZE statistics are available
>
> Wouldn't it also make sense to flush a cached query plan when after
> execution it is determined that one or more assumptions that the cached
> query plan was based on were found to be off? Like the query plan was
> based on the assumption that a particular table would only return a
> handful of rows, but in reality it returned a few thousand.
>
> regards,
> Lukas
>

Proposed rewrite

* Mark query plan for flush (opportunistic replan) when:
    ** dependent objects change,
    ** cardinality of parameters changes sufficiently (per planner parameter)
    ** when new ANALYZE statistics are available and per planner parameter
       differ sufficiently from prior statistics.

* Mark plan as "tried" when parameters of returned set are out of statistical
  control; create alternate plan, hill-climbing to statistical control.
    ** Too many/too few rows relative to plan expectations
        *** Auto-sample for better statistics?
    ** History of plan shows throughput time for result set varies excessively
       (need more execution stability, possibly at expense of median optimality).




Re: Automatic function replanning

From
"Jim C. Nasby"
Date:
On Sat, Dec 17, 2005 at 01:07:10PM -0500, Bruce Momjian wrote:
> Jim C. Nasby wrote:
> > Is cardinality the only thing we'd need to worry about? My idea was
> > actually to track the amount of work normally required by a stored query
> > plan, and if a query uses that plan but requires a very different amount
> > of work it's a good indication that we either need to replan or store
> > multiple plans for that query. Though if we're certain that cardinality
> > is the only thing that could make a cached plan go bad it would
> > certainly simplify things greatly.
> 
> This gets into another area of re-optimizing when the executor finds
> that the actual tables don't match the optimizer estimates.  I think we
> decided that was too hard/risky, but invalidating the plan might help,
> though I am thinking re-planning might just generate the same plan as
> before.  I think something would need to have happened since the last
> planning, like ANALYZE or something.

Well, in the stored plan case, presumably what's changed is one of the
bound parameters. And if we want to be sophisticated about it, we won't
just throw out the old plan; rather we'll try and figure out what
parameter it is that's wanting a different plan.

> Updated TODO:
> 
>     * Flush cached query plans when the dependent objects change,
>       when the cardinality of parameters changes dramatically, or
>       when new ANALYZE statistics are available

Probably worth pointing to this thread in the TODO...
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: Automatic function replanning

From
Rick Gigger
Date:
It seems to me like there are two classes of problems here:

1) Simply invalidating plans made with out of date statistics.
2) Using run-time collected data to update the plan to something more  
intelligent.

It also seems like #1 would be fairly straightforward and simple  
whereas #2 would be much more complex.  #1 would do me a world of  
good and probably other people as well.  Postgres's query planning  
has always been fine for me, or at least I have always been able to  
optimize my queries when I've got a representative data set to work  
with.  Query plan caching only gets me when the query plan is created  
before the statistics are present to create a good plan.
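
A sketch of that failure mode (table and function invented for
illustration; pl/pgsql caches the plan for the session on first call):

CREATE TABLE items (id int, payload text);
CREATE INDEX items_id ON items (id);

CREATE FUNCTION big_items() RETURNS bigint AS $$
BEGIN
    RETURN (SELECT count(*) FROM items WHERE id > 1000);
END;
$$ LANGUAGE plpgsql;

SELECT big_items();  -- first call: plan is built while items is still empty
-- ... bulk-load millions of rows ...
ANALYZE items;
SELECT big_items();  -- the session keeps using the plan made from the
                     -- empty-table statistics until the function is
                     -- recreated or the session reconnects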

Just one user's 2 cents.

- Rick Gigger


On Dec 19, 2005, at 12:00 PM, Jim C. Nasby wrote:

> On Sat, Dec 17, 2005 at 01:07:10PM -0500, Bruce Momjian wrote:
>> Jim C. Nasby wrote:
>>> Is cardinality the only thing we'd need to worry about? My idea was
>>> actually to track the amount of work normally required by a  
>>> stored query
>>> plan, and if a query uses that plan but requires a very different  
>>> amount
>>> of work it's a good indication that we either need to replan or  
>>> store
>>> multiple plans for that query. Though if we're certain that  
>>> cardinality
>>> is the only thing that could make a cached plan go bad it would
>>> certainly simplify things greatly.
>>
>> This gets into another area of re-optimizing when the executor finds
>> that the actual tables don't match the optimizer estimates.  I  
>> think we
>> decided that was too hard/risky, but invalidating the plan might  
>> help,
>> though I am thinking re-planning might just generate the same plan as
>> before.  I think something would need to have happened since the last
>> planning, like ANALYZE or something.
>
> Well, in the stored plan case, presumably what's changed is one of the
> bound parameters. And if we want to be sophisticated about it, we  
> won't
> just throw out the old plan; rather we'll try and figure out what
> parameter it is that's wanting a different plan.
>
>> Updated TODO:
>>
>>     * Flush cached query plans when the dependent objects change,
>>       when the cardinality of parameters changes dramatically, or
>>       when new ANALYZE statistics are available
>
> Probably worth pointing to this thread in the TODO...
> -- 
> Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
> Pervasive Software      http://pervasive.com    work: 512-231-6117
> vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
>



Re: Automatic function replanning

From
Bruce Momjian
Date:
Trent Shipley wrote:
> On Saturday 2005-12-17 16:28, Lukas Smith wrote:
> > Bruce Momjian wrote:
> > >     * Flush cached query plans when the dependent objects change,
> > >       when the cardinality of parameters changes dramatically, or
> > >       when new ANALYZE statistics are available
> >
> > Wouldn't it also make sense to flush a cached query plan when after
> > execution it is determined that one or more assumptions that the cached
> > query plan was based on were found to be off? Like the query plan was
> > based on the assumption that a particular table would only return a
> > handful of rows, but in reality it returned a few thousand.
> >
> > regards,
> > Lukas
> >
> 
> Proposed rewrite
> 
> * Mark query plan for flush (opportunistic replan) when:
>     ** dependent objects change,
>     ** cardinality of parameters changes sufficiently (per planner parameter)
>     ** when new ANALYZE statistics are available and per planner parameter differ 
> sufficiently from prior statistics.

Frankly, I think any new ANALYZE statistics should just invalidate the
plan.  I don't think it is worth trying to determine if they changed
sufficiently or not --- you might as well just replan.

> * Mark plan as "tried" when parameters of returned set out of statistical 
> control, create alternate plan hill-climbing to statistical control.
>     ** Too many/too few rows relative to plan expectations
>         *** Auto-sample for better statistics?
>     ** History of plan shows throughput time for result set varies excessively 
> (need more execution stability, possibly at expense of median optimality).

This is a new idea, that you are remembering bad plans.  I am unsure how
we would track that information.  It gets into the area of having the
optimizer change its behavior based on previous runs, and I am not sure
we have ever agreed to get into that kind of behavior.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: Automatic function replanning

From
Bruce Momjian
Date:
Rick Gigger wrote:
> It seems to me like there are two classes of problems here:
> 
> 1) Simply invalidating plans made with out of date statistics.
> 2) Using run-time collected data to update the plan to something more  
> intelligent.
> 
> It also seems like #1 would be fairly straightforward and simple  
> whereas #2 would be much more complex.  #1 would do me a world of  
> good and probably other people as well.  Postgres's query planning  
> has always been fine for me, or at least I have always been able to  
> optimize my queries when I've got a representative data set to work  
> with.  Query plan caching only gets me when the query plan is created  
> before the statistics are present to create a good plan.
> 
> Just one user's 2 cents.

Agreed.  I just can't add #2 unless we get more agreement from the
group, because it has been a disputed issue in the past.

---------------------------------------------------------------------------


> 
> - Rick Gigger
> 
> 
> On Dec 19, 2005, at 12:00 PM, Jim C. Nasby wrote:
> 
> > On Sat, Dec 17, 2005 at 01:07:10PM -0500, Bruce Momjian wrote:
> >> Jim C. Nasby wrote:
> >>> Is cardinality the only thing we'd need to worry about? My idea was
> >>> actually to track the amount of work normally required by a  
> >>> stored query
> >>> plan, and if a query uses that plan but requires a very different  
> >>> amount
> >>> of work it's a good indication that we either need to replan or  
> >>> store
> >>> multiple plans for that query. Though if we're certain that  
> >>> cardinality
> >>> is the only thing that could make a cached plan go bad it would
> >>> certainly simplify things greatly.
> >>
> >> This gets into another area of re-optimizing when the executor finds
> >> that the actual tables don't match the optimizer estimates.  I  
> >> think we
> >> decided that was too hard/risky, but invalidating the plan might  
> >> help,
> >> though I am thinking re-planning might just generate the same plan as
> >> before.  I think something would need to have happened since the last
> >> planning, like ANALYZE or something.
> >
> > Well, in the stored plan case, presumably what's changed is one of the
> > bound parameters. And if we want to be sophisticated about it, we  
> > won't
> > just throw out the old plan; rather we'll try and figure out what
> > parameter it is that's wanting a different plan.
> >
> >> Updated TODO:
> >>
> >>     * Flush cached query plans when the dependent objects change,
> >>       when the cardinality of parameters changes dramatically, or
> >>       when new ANALYZE statistics are available
> >
> > Probably worth pointing to this thread in the TODO...
> > -- 
> > Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
> > Pervasive Software      http://pervasive.com    work: 512-231-6117
> > vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
> >
> 
> 

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: Automatic function replanning

From
"Jim C. Nasby"
Date:
On Wed, Dec 21, 2005 at 05:43:38PM -0500, Bruce Momjian wrote:
> Rick Gigger wrote:
> > It seems to me like there are two classes of problems here:
> > 
> > 1) Simply invalidating plans made with out of date statistics.
> > 2) Using run-time collected data to update the plan to something more  
> > intelligent.
> > 
> > It also seems like #1 would be fairly straightforward and simple  
> > whereas #2 would be much more complex.  #1 would do me a world of  
> > good and probably other people as well.  Postgres's query planning  
> > has always been fine for me, or at least I have always been able to  
> > optimize my queries when I've got a representative data set to work  
> > with.  Query plan caching only gets me when the query plan is created  
> > before the statistics are present to create a good plan.
> > 
> > Just one user's 2 cents.
> 
> Agreed.  I just can't add #2 unless we get more agreement from the
> group, because it has been a disputed issue in the past.

Well, how about this, since it's a prerequisite for #2 and would be
generally useful anyway:

Track normal resource consumption (ie: tuples read) for planned queries
and record parameter values that result in drastically different
resource consumption.

This would at least make it easy for admins to identify prepared queries
that have a highly variable execution cost.
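
Until something like that exists, an admin can approximate it by running
EXPLAIN ANALYZE over representative parameter values and comparing actual
row counts (statement name reused from the hypothetical queue example):

EXPLAIN ANALYZE EXECUTE fetch_queue('N');  -- note the actual rows figures
EXPLAIN ANALYZE EXECUTE fetch_queue('D');  -- a large difference between the
                                           -- two marks a highly variable
                                           -- statement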
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: Automatic function replanning

From
Bruce Momjian
Date:
Jim C. Nasby wrote:
> On Wed, Dec 21, 2005 at 05:43:38PM -0500, Bruce Momjian wrote:
> > Rick Gigger wrote:
> > > It seems to me like there are two classes of problems here:
> > > 
> > > 1) Simply invalidating plans made with out of date statistics.
> > > 2) Using run-time collected data to update the plan to something more  
> > > intelligent.
> > > 
> > > It also seems like #1 would be fairly straightforward and simple  
> > > whereas #2 would be much more complex.  #1 would do me a world of  
> > > good and probably other people as well.  Postgres's query planning  
> > > has always been fine for me, or at least I have always been able to  
> > > optimize my queries when I've got a representative data set to work  
> > > with.  Query plan caching only gets me when the query plan is created  
> > > before the statistics are present to create a good plan.
> > > 
> > > Just one user's 2 cents.
> > 
> > Agreed.  I just can't add #2 unless we get more agreement from the
> > group, because it has been a disputed issue in the past.
> 
> Well, how about this, since it's a prerequisite for #2 and would be
> generally useful anyway:
> 
> Track normal resource consumption (ie: tuples read) for planned queries
> and record parameter values that result in drastically different
> resource consumption.
> 
> This would at least make it easy for admins to identify prepared queries
> that have a highly variable execution cost.

We have that TODO already:

    * Log statements where the optimizer row estimates were dramatically
      different from the number of rows actually found?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: Automatic function replanning

From
Lukas Smith
Date:
Jim C. Nasby wrote:
> On Wed, Dec 21, 2005 at 05:43:38PM -0500, Bruce Momjian wrote:
> 
>>Rick Gigger wrote:
>>
>>>It seems to me like there are two classes of problems here:
>>>
>>>1) Simply invalidating plans made with out of date statistics.
>>>2) Using run-time collected data to update the plan to something more  
>>>intelligent.
>>>
>>>It also seems like #1 would be fairly straightforward and simple  
>>>whereas #2 would be much more complex.  #1 would do me a world of  
>>>good and probably other people as well.  Postgres's query planning  
>>>has always been fine for me, or at least I have always been able to  
>>>optimize my queries when I've got a representative data set to work  
>>>with.  Query plan caching only gets me when the query plan is created  
>>>before the statistics are present to create a good plan.
>>>
>>>Just one user's 2 cents.
>>
>>Agreed.  I just can't add #2 unless we get more agreement from the
>>group, because it has been a disputed issue in the past.
> 
> 
> Well, how about this, since it's a prerequisite for #2 and would be
> generally useful anyway:
> 
> Track normal resource consumption (ie: tuples read) for planned queries
> and record parameter values that result in drastically different
> resource consumption.
> 
> This would at least make it easy for admins to identify prepared queries
> that have a highly variable execution cost.

Yeah, it seems such a log would be very helpful in its own right for 
DBA's and also as a feedback loop to find possible issues in the query 
planner. And maybe one day this feedback loop can be even directly used 
by the server itself.

regards,
Lukas Smith


Re: Automatic function replanning

From
"Jim C. Nasby"
Date:
On Wed, Dec 21, 2005 at 11:00:31PM -0500, Bruce Momjian wrote:
> > Track normal resource consumption (ie: tuples read) for planned queries
> > and record parameter values that result in drastically different
> > resource consumption.
> > 
> > This would at least make it easy for admins to identify prepared queries
> > that have a highly variable execution cost.
> 
> We have that TODO already:
> 
>     * Log statements where the optimizer row estimates were dramatically
>       different from the number of rows actually found?

Does the stored plan also save how many rows were expected? Otherwise
I'm not sure how that TODO covers it... If it does then please ignore my
ramblings below. :)

My idea has nothing to do with row estimates. It has to do with the
amount of work actually done to perform a query. Consider this example:

CREATE TABLE queue (status char NOT NULL, queue_item text NOT NULL);
CREATE INDEX queue__status ON queue (status);

Obviously, to process this you'll need a query like:
SELECT * FROM queue WHERE status='N';  -- N for New

Say you also occasionally need to see a list of items that have been
processed:
SELECT * FROM queue WHERE status='D';  -- D for Done

And let's say you need to keep done items around for 30 days.

Now, if both of these are done using a prepared statement, it's going to
look like:

SELECT * FROM queue WHERE status='?';

If the first one to run is the queue processing one, the planner will
probably choose the index. This means that when we're searching on 'N',
there will be a fairly small number of tuples read to execute the query,
but when searching for 'D' a very large number of tuples will be read.

What I'm proposing is to keep track of the 'normal' number of tuples
read when executing a prepared query, and logging any queries that are
substantially different. So, if you normally have to read 50 tuples to
find all 'N' records, when the query looking for 'D' records comes along
and has to read 5000 tuples instead, we want to log that. Probably the
easiest way to accomplish this is to store a moving average of tuples
read with each prepared statement entry.
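
The counters would live in the backend's prepared-statement cache, but in
SQL terms the bookkeeping might look like this (illustrative only, all
names invented):

CREATE TABLE plan_work_stats (
    statement_name  text PRIMARY KEY,
    avg_tuples_read numeric NOT NULL,
    executions      bigint NOT NULL
);

-- after an execution that read 5000 tuples, fold it into a moving average
-- (the 0.2 smoothing factor is arbitrary); log the statement if the new
-- reading deviates from the average by, say, an order of magnitude
UPDATE plan_work_stats
   SET avg_tuples_read = 0.8 * avg_tuples_read + 0.2 * 5000,
       executions      = executions + 1
 WHERE statement_name = 'fetch_queue';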
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: Automatic function replanning

From
Bruce Momjian
Date:
Oh, OK, so you are logging prepared queries where the plan generates a
significantly different number of rows from previous runs.  I am not
sure why that is better, or easier, than just invalidating the cached
plan if the cardinality changes.

---------------------------------------------------------------------------

Jim C. Nasby wrote:
> On Wed, Dec 21, 2005 at 11:00:31PM -0500, Bruce Momjian wrote:
> > > Track normal resource consumption (ie: tuples read) for planned queries
> > > and record parameter values that result in drastically different
> > > resource consumption.
> > > 
> > > This would at least make it easy for admins to identify prepared queries
> > > that have a highly variable execution cost.
> > 
> > We have that TODO already:
> > 
> >     * Log statements where the optimizer row estimates were dramatically
> >       different from the number of rows actually found?
> 
> Does the stored plan also save how many rows were expected? Otherwise
> I'm not sure how that TODO covers it... If it does then please ignore my
> ramblings below. :)
> 
> My idea has nothing to do with row estimates. It has to do with the
> amount of work actually done to perform a query. Consider this example:
> 
> CREATE TABLE queue (status char NOT NULL, queue_item text NOT NULL);
> CREATE INDEX queue__status ON queue (status);
> 
> Obviously, to process this you'll need a query like:
> SELECT * FROM queue WHERE status='N' -- N for New;
> 
> Say you also occasionally need to see a list of items that have been
> processed:
> SELECT * FROM queue WHERE status='D' -- D for Done;
> 
> And let's say you need to keep done items around for 30 days.
> 
> Now, if both of these are done using a prepared statement, it's going to
> look like:
> 
> SELECT * FROM queue WHERE status='?';
> 
> If the first one to run is the queue processing one, the planner will
> probably choose the index. This means that when we're searching on 'N',
> there will be a fairly small number of tuples read to execute the query,
> but when searching for 'D' a very large number of tuples will be read.
> 
> What I'm proposing is to keep track of the 'normal' number of tuples
> read when executing a prepared query, and logging any queries that are
> substantially different. So, if you normally have to read 50 tuples to
> find all 'N' records, when the query looking for 'D' records comes along
> and has to read 5000 tuples instead, we want to log that. Probably the
> easiest way to accomplish this is to store a moving average of tuples
> read with each prepared statement entry.
> -- 
> Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
> Pervasive Software      http://pervasive.com    work: 512-231-6117
> vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
> 

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: Automatic function replanning

From
Lukas Smith
Date:
Jim C. Nasby wrote:

> Now, if both of these are done using a prepared statement, it's going to
> look like:
> 
> SELECT * FROM queue WHERE status='?';
> 
> If the first one to run is the queue processing one, the planner will
> probably choose the index. This means that when we're searching on 'N',
> there will be a fairly small number of tuples read to execute the query,
> but when searching for 'D' a very large number of tuples will be read.

I do not know exactly how pg handles this internally, however while 
skimming the oracle tuning pocket guide I picked up for 2 euros I 
noticed that it mentioned that since oracle 9i bound parameter values 
are evaluated before the execution plan is determined.

Maybe I am mixing up separate concepts (are bound variables and prepared 
statements different concepts?) here. I also do not really understand if 
that means that oracle does not store a query plan for a prepared query 
or if it just does some special handling in case it knows that a 
prepared statement column is known to have a highly varying selectivity 
per value.

regards,
Lukas


Re: Automatic function replanning

From
Bruce Momjian
Date:
Lukas Smith wrote:
> Jim C. Nasby wrote:
> 
> > Now, if both of these are done using a prepared statement, it's going to
> > look like:
> > 
> > SELECT * FROM queue WHERE status='?';
> > 
> > If the first one to run is the queue processing one, the planner will
> > probably choose the index. This means that when we're searching on 'N',
> > there will be a fairly small number of tuples read to execute the query,
> > but when searching for 'D' a very large number of tuples will be read.
> 
> I do not know exactly how pg handles this internally, however while 
> skimming the oracle tuning pocket guide I picked up for 2 euros I 
> noticed that it mentioned that since oracle 9i bound parameter values 
> are evaluated before the execution plan is determined.
> 
> Maybe I am mixing up separate concepts (are bound variables and prepared 
> statements different concepts?) here. I also do not really understand if 
> that means that oracle does not store a query plan for a prepared query 
> or if it just does some special handling in case it knows that a 
> prepared statement column is known to have a highly varying selectivity 
> per value.

What the Oracle manual means I think is that the plan of the query is
delayed until the _first_ EXECUTE, so it has some values to use in the
optimizer.  The problem is that later queries might use constants of
greatly different cardinality.
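
In PostgreSQL terms the hazard would look like this (hypothetical, reusing
the queue example from earlier in the thread):

PREPARE q(char) AS SELECT * FROM queue WHERE status = $1;
EXECUTE q('N');  -- Oracle-style bind peeking would plan here, using the
                 -- statistics for the rare value 'N'
EXECUTE q('D');  -- ...and would reuse that plan even though 'D' matches
                 -- most of the table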

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: Automatic function replanning

From
Lukas Smith
Date:
Bruce Momjian wrote:

>> Maybe I am mixing up separate concepts (are bound variables and prepared 
>> statements different concepts?) here. I also do not really understand if 
>> that means that oracle does not store a query plan for a prepared query 
>> or if it just does some special handling in case it knows that a 
>> prepared statement column is known to have a highly varying selectivity 
>> per value.
> 
> What the Oracle manual means I think is that the plan of the query is
> delayed until the _first_ EXECUTE, so it has some values to use in the
> optimizer.  The problem is that later queries might use constants of
> greatly different cardinality.

ok .. which just goes to tell to not use prepared statements for a 
column with highly varying selectivity ..?

or is there a realistic shot at fixing this use case?

regards,
Lukas



Re: Automatic function replanning

From
"Jim C. Nasby"
Date:
On Thu, Dec 22, 2005 at 09:55:14PM +0100, Lukas Smith wrote:
> Bruce Momjian wrote:
> 
> >>Maybe I am mixing up separate concepts (are bound variables and prepared 
> >>statements different concepts?) here. I also do not really understand if 
> >>that means that oracle does not store a query plan for a prepared query 
> >>or if it just does some special handling in case it knows that a 
> >>prepared statement column is known to have a highly varying selectivity 
> >>per value.
> >
> >What the Oracle manual means I think is that the plan of the query is
> >delayed until the _first_ EXECUTE, so it has some values to use in the
> >optimizer.  The problem is that later queries might use constants of
> >greatly different cardinality.
> 
> ok .. which just goes to tell to not use prepared statements for a 
> column with highly varying selectivity ..?
> 
> or is there a realistic shot at fixing this use case?

FWIW, I believe that 10g has some brains in this regard, where it can
detect if it should store multiple plans for one prepared statement.
This is critical for them, because their parser/planner is much harder
on the system than ours is.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: Automatic function replanning

From
Bruce Momjian
Date:
Lukas Smith wrote:
> Bruce Momjian wrote:
> 
> >> Maybe I am mixing up separate concepts (are bound variables and prepared 
> >> statements different concepts?) here. I also do not really understand if 
> >> that means that oracle does not store a query plan for a prepared query 
> >> or if it just does some special handling in case it knows that a 
> >> prepared statement column is known to have a highly varying selectivity 
> >> per value.
> > 
> > What the Oracle manual means I think is that the plan of the query is
> > delayed until the _first_ EXECUTE, so it has some values to use in the
> > optimizer.  The problem is that later queries might use constants of
> > greatly different cardinality.
> 
> ok .. which just goes to tell to not use prepared statements for a 
> column with highly varying selectivity ..?
> 
> or is there a realistic shot at fixing this use case?

It is an issue for all databases.  We have a TODO about it:

    * Flush cached query plans when the dependent objects change,
      when the cardinality of parameters changes dramatically, or
      when new ANALYZE statistics are available


-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: Automatic function replanning

From
Lukas Smith
Date:
Bruce Momjian wrote:

> It is an issue for all databases.  We gave a TODO about it:
> 
>     * Flush cached query plans when the dependent objects change,
>       when the cardinality of parameters changes dramatically, or
>       when new ANALYZE statistics are available

Ok, just so I understand this correctly. In the mentioned case the 
cardinality does not really change in regards to the table stats, it's 
just that I happen to use a value that has a different selectivity and 
therefore I may need a different plan. So I do not really see how this 
use case is handled with the above todo, nor do I really see how it's 
handled with what Jim suggested earlier. The fact of the matter is that 
for this use case you need to use different query plans for the same 
prepared statements.

regards,
Lukas


Re: Automatic function replanning

From
"Jim C. Nasby"
Date:
Well, not just rows; total tuples, both base heap and index. ISTM that
would be a better metric than just plain rows read out of base or rows
returned.

Depending on how far down this road we want to go, this would allow for
detecting what parameter values require different query plans, and then
using different query plans for different sets of values. Simply
invalidating the cached plan means you could potentially end up needing
to re-plan very frequently. But given the current speed of our
optimizer, it's probably not worth going to this extent.

Another concern I have is: is cardinality the only metric we need to
look at when deciding to re-plan or are there others?

In either case, my guess is that tracking the info needed to make this
idea happen is probably much easier than doing automatic plan
invalidation based on cardinality, so it would be a useful interim step.
But if we could actually get cardinality invalidation into 8.2, I'd say
put the effort into that...
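
To make the bookkeeping concrete, here is a rough sketch in plain SQL
rather than backend code; the table and function names are invented,
and a real implementation would live inside the plan cache:

    CREATE TABLE stmt_stats (
        stmt_name   text PRIMARY KEY,
        avg_tuples  float8 NOT NULL,  -- moving average of tuples read
        n_execs     bigint NOT NULL
    );

    -- Fold one execution's tuple count into the moving average and
    -- report whether it was far enough from "normal" to deserve a
    -- log entry (10x the average is an arbitrary threshold).
    CREATE FUNCTION note_execution(p_stmt text, p_tuples bigint)
    RETURNS boolean AS $$
    DECLARE
        cur      stmt_stats%ROWTYPE;
        outlier  boolean;
    BEGIN
        SELECT * INTO cur FROM stmt_stats WHERE stmt_name = p_stmt;
        IF NOT FOUND THEN
            INSERT INTO stmt_stats VALUES (p_stmt, p_tuples, 1);
            RETURN false;
        END IF;
        outlier := cur.avg_tuples > 0 AND p_tuples > 10 * cur.avg_tuples;
        UPDATE stmt_stats
           SET avg_tuples = 0.9 * avg_tuples + 0.1 * p_tuples,
               n_execs    = n_execs + 1
         WHERE stmt_name = p_stmt;
        RETURN outlier;
    END;
    $$ LANGUAGE plpgsql;

With the queue example quoted below, executions looking for 'N' settle
the average around 50 tuples; the first 'D' execution reads 5000, trips
the threshold, and gets logged.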

On Thu, Dec 22, 2005 at 03:14:09PM -0500, Bruce Momjian wrote:
> 
> Oh, OK, so you are logging prepared queries where the plan generates a
> significantly different number of rows from previous runs.  I am not
> sure why that is better, or easier, than just invalidating the  cached
> plan if the cardinality changes.
> 
> ---------------------------------------------------------------------------
> 
> Jim C. Nasby wrote:
> > On Wed, Dec 21, 2005 at 11:00:31PM -0500, Bruce Momjian wrote:
> > > > Track normal resource consumption (ie: tuples read) for planned queries
> > > > and record parameter values that result in drastically different
> > > > resource consumption.
> > > > 
> > > > This would at least make it easy for admins to identify prepared queries
> > > > that have a highly variable execution cost.
> > > 
> > > We have that TODO already:
> > > 
> > >     * Log statements where the optimizer row estimates were dramatically
> > >       different from the number of rows actually found?
> > 
> > Does the stored plan also save how many rows were expected? Otherwise
> > I'm not sure how that TODO covers it... If it does then please ignore my
> > ramblings below. :)
> > 
> > My idea has nothing to do with row estimates. It has to do with the
> > amount of work actually done to perform a query. Consider this example:
> > 
> > CREATE TABLE queue (status char NOT NULL, queue_item text NOT NULL);
> > CREATE INDEX queue__status ON queue (status);
> > 
> > Obviously, to process this you'll need a query like:
> > SELECT * FROM queue WHERE status='N';  -- N for New
> > 
> > Say you also occasionally need to see a list of items that have been
> > processed:
> > SELECT * FROM queue WHERE status='D';  -- D for Done
> > 
> > And let's say you need to keep done items around for 30 days.
> > 
> > Now, if both of these are done using a prepared statement, it's going to
> > look like:
> > 
> > SELECT * FROM queue WHERE status=?;
> > 
> > If the first one to run is the queue processing one, the planner will
> > probably choose the index. This means that when we're searching on 'N',
> > there will be a fairly small number of tuples read to execute the query,
> > but when searching for 'D' a very large number of tuples will be read.
> > 
> > What I'm proposing is to keep track of the 'normal' number of tuples
> > read when executing a prepared query, and logging any queries that are
> > substantially different. So, if you normally have to read 50 tuples to
> > find all 'N' records, when the query looking for 'D' records comes along
> > and has to read 5000 tuples instead, we want to log that. Probably the
> > easiest way to accomplish this is to store a moving average of tuples
> > read with each prepared statement entry.
> > -- 
> > Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
> > Pervasive Software      http://pervasive.com    work: 512-231-6117
> > vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
> > 
> > ---------------------------(end of broadcast)---------------------------
> > TIP 6: explain analyze is your friend
> > 
> 
> -- 
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610) 359-1001
>   +  If your life is a hard drive,     |  13 Roberts Road
>   +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
> 
>                http://www.postgresql.org/docs/faq
> 

-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: Automatic function replanning

From
"Jim C. Nasby"
Date:
On Thu, Dec 22, 2005 at 10:14:15PM +0100, Lukas Smith wrote:
> Ok, just so I understand this correctly. In the mentioned case the
> cardinality does not really change with regard to the table stats; it's
> just that I happen to use a value that has a different selectivity and
> therefore I may need a different plan. So I do not really see how this
> use case is handled by the above TODO, nor do I really see how it's
> handled by what Jim suggested earlier. The fact of the matter is that
> for this use case you need to use different query plans for the same
> prepared statement.

What I mentioned would allow for identifying query plans that this is
happening on. Doing something about it would be the next step after
that.

Can anyone think of something other than selectivity that would make a
stored query plan go bad based solely on the parameters being fed into
it? (In other words, ignore the obvious cases of bad statistics or a
DDL change.)
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: Automatic function replanning

From
Bruce Momjian
Date:
We need invalidation anyway, so I don't see why an intermediate step
makes sense.

---------------------------------------------------------------------------

Jim C. Nasby wrote:
> Well, not just rows: total tuples read, from both the base heap and
> indexes. ISTM that would be a better metric than just rows read from
> the base table or rows returned.
> 
> Depending on how far down this road we want to go, this would allow for
> detecting what parameter values require different query plans, and then
> using different query plans for different sets of values. Simply
> invalidating the cached plan means you could potentially end up needing
> to re-plan very frequently. But given the current speed of our
> optimizer, it's probably not worth going to this extent.
> 
> Another concern I have is: is cardinality the only metric we need to
> look at when deciding to re-plan or are there others?
> 
> In either case, my guess is that tracking the info needed to make this
> idea happen is probably much easier than doing automatic plan
> invalidation based on cardinality, so it would be a useful interim step.
> But if we could actually get cardinality invalidation into 8.2, I'd say
> put the effort into that...
> 
> On Thu, Dec 22, 2005 at 03:14:09PM -0500, Bruce Momjian wrote:
> > 
> > Oh, OK, so you are logging prepared queries where the plan generates a
> > significantly different number of rows from previous runs.  I am not
> > sure why that is better, or easier, than just invalidating the  cached
> > plan if the cardinality changes.
> > 
> > ---------------------------------------------------------------------------
> > 
> > Jim C. Nasby wrote:
> > > On Wed, Dec 21, 2005 at 11:00:31PM -0500, Bruce Momjian wrote:
> > > > > Track normal resource consumption (ie: tuples read) for planned queries
> > > > > and record parameter values that result in drastically different
> > > > > resource consumption.
> > > > > 
> > > > > This would at least make it easy for admins to identify prepared queries
> > > > > that have a highly variable execution cost.
> > > > 
> > > > We have that TODO already:
> > > > 
> > > >     * Log statements where the optimizer row estimates were dramatically
> > > >       different from the number of rows actually found?
> > > 
> > > Does the stored plan also save how many rows were expected? Otherwise
> > > I'm not sure how that TODO covers it... If it does then please ignore my
> > > ramblings below. :)
> > > 
> > > My idea has nothing to do with row estimates. It has to do with the
> > > amount of work actually done to perform a query. Consider this example:
> > > 
> > > CREATE TABLE queue (status char NOT NULL, queue_item text NOT NULL);
> > > CREATE INDEX queue__status ON queue (status);
> > > 
> > > Obviously, to process this you'll need a query like:
> > > SELECT * FROM queue WHERE status='N';  -- N for New
> > > 
> > > Say you also occasionally need to see a list of items that have been
> > > processed:
> > > SELECT * FROM queue WHERE status='D';  -- D for Done
> > > 
> > > And let's say you need to keep done items around for 30 days.
> > > 
> > > Now, if both of these are done using a prepared statement, it's going to
> > > look like:
> > > 
> > > SELECT * FROM queue WHERE status=?;
> > > 
> > > If the first one to run is the queue processing one, the planner will
> > > probably choose the index. This means that when we're searching on 'N',
> > > there will be a fairly small number of tuples read to execute the query,
> > > but when searching for 'D' a very large number of tuples will be read.
> > > 
> > > What I'm proposing is to keep track of the 'normal' number of tuples
> > > read when executing a prepared query, and logging any queries that are
> > > substantially different. So, if you normally have to read 50 tuples to
> > > find all 'N' records, when the query looking for 'D' records comes along
> > > and has to read 5000 tuples instead, we want to log that. Probably the
> > > easiest way to accomplish this is to store a moving average of tuples
> > > read with each prepared statement entry.
> > > -- 
> > > Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
> > > Pervasive Software      http://pervasive.com    work: 512-231-6117
> > > vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
> > > 
> > > ---------------------------(end of broadcast)---------------------------
> > > TIP 6: explain analyze is your friend
> > > 
> > 
> > -- 
> >   Bruce Momjian                        |  http://candle.pha.pa.us
> >   pgman@candle.pha.pa.us               |  (610) 359-1001
> >   +  If your life is a hard drive,     |  13 Roberts Road
> >   +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
> > 
> > ---------------------------(end of broadcast)---------------------------
> > TIP 3: Have you checked our extensive FAQ?
> > 
> >                http://www.postgresql.org/docs/faq
> > 
> 
> -- 
> Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
> Pervasive Software      http://pervasive.com    work: 512-231-6117
> vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly
> 

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 


Re: Automatic function replanning

From
Lukas Kahwe Smith
Date:
Bruce Momjian wrote:

> Right, if the cardinality changes, you realize this before execution and
> optimize/save the plan again.  A further optimization would be to save
> _multiple_ plans for a single prepared statement based on constants and
> choose one or the other, but that is beyond what we are willing to
> consider at this stage, I think.

ok .. so you store the cardinality that was used when generating the
original plan. on the next execution you look up the cardinality again
and compare it; if it's off by too much, you replan. however, this could
in extreme cases mean that you replan on every execution, thereby
killing off the entire advantage of storing the plan. but that's the
absolute worst-case scenario.
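
in pseudo-SQL the decision rule could be as small as the sketch below
(the function and the 10x threshold are made up for illustration):

    -- Compare the row estimate the plan was built with against the
    -- estimate for the value now being executed; replan when they
    -- differ by more than 10x in either direction.
    CREATE FUNCTION should_replan(planned_rows float8, current_rows float8)
    RETURNS boolean AS $$
    BEGIN
        IF planned_rows <= 0 OR current_rows <= 0 THEN
            RETURN true;  -- no usable estimate, so play it safe
        END IF;
        RETURN greatest(planned_rows, current_rows)
             / least(planned_rows, current_rows) > 10;
    END;
    $$ LANGUAGE plpgsql IMMUTABLE;

and note that with any fixed threshold, alternating between two values
on opposite sides of it replans on every execution -- exactly the worst
case above.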

regards,
Lukas

PS: Bruce's original email was only sent to me directly ..






Re: Automatic function replanning

From
Trent Shipley
Date:
On Thursday 2005-12-22 14:28, Lukas Kahwe Smith wrote:
> Bruce Momjian wrote:
> > Right, if the cardinality changes, you realize this before execution and
> > optimize/save the plan again.  A further optimization would be to save
> > _multiple_ plans for a single prepared statement based on constants and
> > choose one or the other, but that is beyond what we are willing to
> > consider at this stage, I think.
>
> ok .. so you store the cardinality that was used when generating the
> original plan. on the next execution you look up the cardinality again
> and compare it; if it's off by too much, you replan. however, this could
> in extreme cases mean that you replan on every execution, thereby
> killing off the entire advantage of storing the plan. but that's the
> absolute worst-case scenario.
>
> regards,
> Lukas
>
> PS: Bruce's original email was only sent to me directly ..

So you have a parameterized query (one parameter for simplicity of argument);
as the parameter changes, cardinality changes dramatically.

It seems to me that in this case, better than replanning would be to build
a data structure that associates different parameter values with appropriate
plans.  The plans can be reused until, just as with a no-parameter query, a
parameter-specific plan has to be flushed (or the entire family of plans can
be flushed).
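
A toy model of such a structure, written as SQL only for illustration
(the names and the bucketing rule are invented):

    -- One cached plan per (statement, selectivity bucket).  At EXECUTE
    -- time, estimate the row count for the incoming parameter value,
    -- bucket it (say, by order of magnitude), and reuse that bucket's
    -- plan if present; otherwise plan and remember it.  Flushing one
    -- plan or the whole family is then just a DELETE.
    CREATE TABLE plan_family (
        stmt_name   text  NOT NULL,
        sel_bucket  int   NOT NULL,  -- e.g. floor(log(est_rows))
        plan        bytea NOT NULL,  -- stand-in for a plan handle
        PRIMARY KEY (stmt_name, sel_bucket)
    );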


Re: Automatic function replanning

From
Bruce Momjian
Date:
Trent Shipley wrote:
> On Thursday 2005-12-22 14:28, Lukas Kahwe Smith wrote:
> > Bruce Momjian wrote:
> > > Right, if the cardinality changes, you realize this before execution and
> > > optimize/save the plan again.  A further optimization would be to save
> > > _multiple_ plans for a single prepared statement based on constants and
> > > choose one or the other, but that is beyond what we are willing to
> > > consider at this stage, I think.
> >
> > ok .. so you store the cardinality that was used when generating the
> > original plan. on the next execution you look up the cardinality again
> > and compare it; if it's off by too much, you replan. however, this could
> > in extreme cases mean that you replan on every execution, thereby
> > killing off the entire advantage of storing the plan. but that's the
> > absolute worst-case scenario.
> >
> > regards,
> > Lukas
> >
> > PS: Bruce's original email was only sent to me directly ..
> 
> So you have a parameterized query (one parameter for simplicity of argument);
> as the parameter changes, cardinality changes dramatically.
> 
> It seems to me that in this case, better than replanning would be to build
> a data structure that associates different parameter values with appropriate
> plans.  The plans can be reused until, just as with a no-parameter query, a
> parameter-specific plan has to be flushed (or the entire family of plans can
> be flushed).

TODO updated:

    * Flush cached query plans when the dependent objects change,
      when the cardinality of parameters changes dramatically, or
      when new ANALYZE statistics are available

      A more complex solution would be to save multiple plans for
      different cardinality and use the appropriate plan based on
      the EXECUTE values.
 

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073