Thread: invalidating cached plans
PostgreSQL should invalidate a cached query plan when one of the objects the plan depends upon is modified. This is the common case of a more general problem: a query plan depends on various parts of the environment at plan-creation time. That environment includes the definitions of database objects, but also GUC variables (most importantly search_path, but also optimizer-tuning variables, for example), the state of database statistics, and so on. I'll leave resolution of the more general problem to someone else -- I think if we can manage to invalidate plans automatically when dependent objects change, that's better than nothing.

Implementation sketch:

- when creating a plan, allow the caller to specify whether dependencies should be tracked or not; we want to track dependencies for long-lived plans like cached plans created by PL/PgSQL, named PREPARE plans (both protocol-level and via SQL), and so forth. We needn't track dependencies for exec_simple_query(), and so on.

- to install dependencies for a plan, walk the plan's tree and remember the OIDs of any system objects it references. Both cached plans and their dependencies are backend-local.

- if we receive a shared invalidation message for a relation referenced by a plan, mark the plan as invalid (a new boolean flag associated with a prepared Plan). If the sinval queue overflows, mark all plans as invalid (well, all the plans we're tracking dependencies for, anyway). I haven't looked too closely at whether the existing sinval message types will be sufficient for invalidating cached plans; some modifications might be needed.

- it is the responsibility of the call site managing the prepared plan to check whether a previously prepared plan is invalid or not -- and to take the necessary steps to replan it when needed.

Comments welcome.

-Neil
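[Editor's note: a standalone toy of the sinval bullet above, entirely outside the backend. Oid is faked and both function names are invented for illustration; real sinval processing is considerably more involved.]

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int Oid;

#define MAX_DEPS 8

/* One tracked plan: the OIDs it depends on, plus the proposed
 * boolean validity flag. */
typedef struct CachedPlan
{
    Oid  deps[MAX_DEPS];
    int  ndeps;
    bool is_valid;
} CachedPlan;

/* Shared-invalidation message for one relation: mark every
 * tracked plan that references it as invalid. */
void plan_sinval(CachedPlan **plans, int nplans, Oid relid)
{
    for (int i = 0; i < nplans; i++)
        for (int j = 0; j < plans[i]->ndeps; j++)
            if (plans[i]->deps[j] == relid)
                plans[i]->is_valid = false;
}

/* sinval queue overflow: conservatively invalidate everything
 * we are tracking. */
void plan_sinval_reset(CachedPlan **plans, int nplans)
{
    for (int i = 0; i < nplans; i++)
        plans[i]->is_valid = false;
}
```

The call site would then test is_valid before each use and replan on demand, which is the last bullet of the sketch.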
Neil Conway <neilc@samurai.com> writes:
> PostgreSQL should invalidate a cached query plan when one of the objects
> the plan depends upon is modified.

Agreed.

> Implementation sketch:

I would like to see this folded together with creation of a centralized plan caching module. We currently have ad-hoc plan caches in ri_triggers.c, plpgsql, prepare.c, and probably other places. There is no good reason to keep reinventing that wheel, especially not given that plan invalidation raises the complexity of the wheel by a considerable amount.

> - when creating a plan, allow the caller to specify whether dependencies
> should be tracked or not;

I would prefer not to tie this behavior to plan creation per se, but to plan caching. And in a cached plan there is no "don't track" option. HOWEVER, see next comment ...

> - to install dependencies for a plan, walk the plan's tree and remember
> the OIDs of any system objects it references.

The difficulty with this after-the-fact approach is that the finished plan tree may contain no reference to some objects that it in fact depends on. SQL functions that have been inlined are the classic example, but consider also the idea that a plan may have been made on the strength of a table constraint (see nearby thread about partitioning) and needs to be invalidated if that constraint goes away.

One possible approach is to do the invalidation on a sufficiently coarse grain that we don't care. For example, I would be inclined to make any change in a table's schema invalidate all plans that use that table at all; that would then subsume the constraint problem for instance. This doesn't solve the inlined function problem however. For inlined functions, the only answer I see is for the planner to somehow decorate the plan tree with a list of things it consulted even though they might not be directly referenced in the finished plan.

> Both cached plans and their dependencies are backend-local.

Agreed.
> - it is the responsibility of the call site managing the prepared plan
> to check whether a previously prepared plan is invalid or not -- and to
> take the necessary steps to replan it when needed.

Again, I'd rather see that folded into a central plan cache mechanism.

regards, tom lane
Tom Lane wrote:
> I would like to see this folded together with creation of a centralized
> plan caching module.

Interesting. Can you elaborate on how you'd envision call sites making use of this module?

> The difficulty with this after-the-fact approach is that the finished
> plan tree may contain no reference to some objects that it in fact
> depends on. SQL functions that have been inlined are the classic
> example, but consider also the idea that a plan may have been made on
> the strength of a table constraint (see nearby thread about partitioning)
> and needs to be invalidated if that constraint goes away.

Hmm, good point. I'm happy to blow away all cached plans when a table constraint changes, so that resolves that, but I agree we'll need to handle inlined functions specially. But perhaps it is best to not rely on after-the-fact Plan analysis at all, and build the capability to record plan dependencies directly into the planner.

-Neil
Neil Conway <neilc@samurai.com> writes:
> Tom Lane wrote:
>> I would like to see this folded together with creation of a centralized
>> plan caching module.

> Interesting. Can you elaborate on how you'd envision call sites making
> use of this module?

I hadn't really gotten as far as working out a reasonable API for the module. The $64 question seems to be what is the input: a textual query string, a raw parse analysis tree, or what? And what sort of key does the caller want to use to re-find a previously cached plan?

Probably the first thing to do is look around at the plausible users of this thing and see what they'd find most convenient.

regards, tom lane
On Mon, Mar 14, 2005 at 02:53:36AM -0500, Tom Lane wrote:
> Probably the first thing to do is look around at the plausible users of
> this thing and see what they'd find most convenient.

This may be totally irrelevant:

Our current load distributors, like pgpool, have no way of knowing the side effects of backend functions. It would be interesting if the client could send each potential query to the master saying, "execute this query if there are side effects, otherwise do no operation and let me execute this read-only query on a replicated copy."

-Mike
Michael Adler <adler@pobox.com> writes:
> Our current load distributors, like pgpool, have no way of knowing the
> side effects of backend functions. It would be interesting if the
> client could send each potential query to the master saying, "execute
> this query if there are side effects, otherwise do no operation and
> let me execute this read-only query on a replicated copy."

Wouldn't you want to handle that the other way around? I mean, there's not much point in distributing the load if it still requires passing everything through a single point of contention anyway.

So I think the feature you really want is a kind of read-only mode: "execute this, but if it tries to have any side effects, abort and give me an error."

That seems like a reasonably useful thing in other circumstances as well: DBAs sanity-checking a database who don't want to make any modifications, low-privilege users like cron jobs that aren't supposed to be making modifications, etc. In an ideal world it would combine well with having tablespaces on read-only media.

I had the impression Postgres wants to make modifications to data even for purely read-only operations, though. It might be hard to detect "side effects" that the user would care about as distinct from invisible internal operations.

-- greg
In article <6028.1110785150@sss.pgh.pa.us>,
Tom Lane <tgl@sss.pgh.pa.us> writes:

> One possible approach is to do the invalidation on a sufficiently coarse
> grain that we don't care. For example, I would be inclined to make any
> change in a table's schema invalidate all plans that use that table at
> all; that would then subsume the constraint problem for instance. This
> doesn't solve the inlined function problem however.

How about using an even coarser grain? Whenever something in the database in question changes, blindly throw away all cached plans for this DB.
Harald Fuchs wrote:
> In article <6028.1110785150@sss.pgh.pa.us>,
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>
> > One possible approach is to do the invalidation on a sufficiently coarse
> > grain that we don't care. For example, I would be inclined to make any
> > change in a table's schema invalidate all plans that use that table at
> > all; that would then subsume the constraint problem for instance. This
> > doesn't solve the inlined function problem however.
>
> How about using an even coarser grain? Whenever something in the
> database in question changes, blindly throw away all cached plans for
> this DB.

We could, but the creation of a single temp table would invalidate all caches, and temp table creation might be pretty frequent.

One idea would be to record if the function uses non-temp tables, temp tables, or both, and invalidate based on the type of table being invalidated, rather than the table name itself. I can imagine this hurting temp table caching, but at least functions using regular tables would not be affected, and functions using temp tables would work reliably.

--
Bruce Momjian                     |  http://candle.pha.pa.us
pgman@candle.pha.pa.us            |  (610) 359-1001
+  If your life is a hard drive,  |  13 Roberts Road
+  Christ can be your backup.     |  Newtown Square, Pennsylvania 19073
Bruce Momjian wrote:
> One idea would be to record if the function uses non-temp tables, temp
> tables, or both, and invalidate based on the type of table being
> invalidated, rather than the table name itself. I can imagine this
> hurting temp table caching, but at least functions using regular tables
> would not be affected, and functions using temp tables would work
> reliably.

It seems to me it's not _that_ difficult to invalidate plans at a more granular level (based on the individual database objects they depend upon). Inlined functions need to be handled, but that is doable -- it just needs some work. The efficiency win of not needlessly throwing away cached plans is worth investing some effort, I think.

-Neil
"Harald Fuchs" <use_reply_to@protecting.net> writes
> How about using an even coarser grain? Whenever something in the
> database in question changes, blindly throw away all cached plans for
> this DB.

If we clearly define what "something in the database in question" is, we have to trace all the objects the query will touch. There are two difficulties:

First, even if we can trace all the changes to the objects we will touch, it is still difficult to differentiate which changes do not invalidate the plan and which do. For instance, the random() function might change its behavior in two ways: (1) change its returned precision, in which case there is no problem for our plan; (2) change its distribution, in which case it might be a problem for our plan. A fast solution to this problem is to discard all the plans once a referenced object changes (no matter what the change is).

Second (as Tom says), some changes can hardly be traced. For example, we only use function A. But function A calls function B, and function B calls function C. When C changes, how do we know that we should worry about our plan? Maybe we not only need a caller-graph, we also need a callee-graph. But I am afraid this will be a big cost. A fast solution is to forbid some kinds of queries from being cached.

Regards,
Qingqing
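[Editor's note: the callee-graph cost Qingqing worries about can be made concrete with a toy closure computation. Nothing here is backend code; it only illustrates that a change to one function means walking the call graph transitively back through every caller.]

```c
#include <assert.h>
#include <stdbool.h>

#define NFUNCS 4

/* callee[i][j] is true when function i calls function j. */
static bool callee[NFUNCS][NFUNCS];

/* Mark 'affected' for the changed function and everything that
 * (transitively) calls it -- the walk a callee-graph scheme
 * would have to pay for on every change. */
void propagate(int changed, bool affected[NFUNCS])
{
    affected[changed] = true;
    for (int caller = 0; caller < NFUNCS; caller++)
        if (!affected[caller] && callee[caller][changed])
            propagate(caller, affected);
}
```

With A -> B -> C, a change to C reaches A; as Neil argues below in the thread, per-level invalidation avoids paying for this closure.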
> This may be totally irrelevant:
>
> Our current load distributors, like pgpool, have no way of knowing the
> side effects of backend functions. It would be interesting if the
> client could send each potential query to the master saying, "execute
> this query if there are side effects, otherwise do no operation and
> let me execute this read-only query on a replicated copy."

You can go 'SET TRANSACTION READ ONLY;' or something...

Chris
On Mon, 2005-03-14 at 20:06 -0500, Bruce Momjian wrote:
> Harald Fuchs wrote:
> > In article <6028.1110785150@sss.pgh.pa.us>,
> > Tom Lane <tgl@sss.pgh.pa.us> writes:
> >
> > > One possible approach is to do the invalidation on a sufficiently coarse
> > > grain that we don't care. For example, I would be inclined to make any
> > > change in a table's schema invalidate all plans that use that table at
> > > all; that would then subsume the constraint problem for instance. This
> > > doesn't solve the inlined function problem however.
> >
> > How about using an even coarser grain? Whenever something in the
> > database in question changes, blindly throw away all cached plans for
> > this DB.
>
> We could, but the creation of a single temp table would invalidate all
> caches, and temp table creation might be pretty frequent.
>
> One idea would be to record if the function uses non-temp tables, temp
> tables, or both, and invalidate based on the type of table being
> invalidated, rather than the table name itself. I can imagine this
> hurting temp table caching, but at least functions using regular tables
> would not be affected, and functions using temp tables would work
> reliably.

Too coarse, I think, especially with schemas being considered user workspaces where users are free to add or modify their structures as they like (for maintenance, reports, temporary storage, etc.), but there are a significant number of prepared statements in the controlled segments of the database. This would still cause the system to hiccup fairly regularly when a couple hundred connections are forced to replan their queries.

--
Qingqing Zhou wrote:
> Second (as Tom says), some changes can hardly be traced. For example, we
> only use function A. But function A calls function B, and function B calls
> function C. When C changes, how do we know that we should worry about our
> plan?

I don't see that this is a major problem. If a plan A invokes a function B, then changes to B will need to invalidate A; that should be pretty easy to arrange. If B is a PL/PgSQL function that invokes a function C, it will probably cache a plan involving C. But when C changes, we need only flush B's cached plan, _not_ A -- as far as A is concerned, the operation of B is a black box. The only exception is when B is a SQL function that is inlined, but we can handle that separately.

Regarding performance, the important point is that a DDL command "pushes" changes out to backends to invalidate cached plans -- a plan doesn't need to poll to see if there have been any changes to objects it depends upon. And on a production system, DDL should usually be infrequent (the primary exception is temp table creation/destruction, but we can potentially optimize for that since it is backend-local).

Or am I missing your point?

-Neil
Neil Conway wrote:
> - it is the responsibility of the call site managing the prepared plan
> to check whether a previously prepared plan is invalid or not -- and to
> take the necessary steps to replan it when needed.

Does this mean that clients that use PREPARE/Parse need to handle "plan invalidated" as a possible response to EXECUTE/Bind, or will the backend keep the query string / parse tree around and replan on next execution?

-O
Oliver Jowett wrote:
> Does this mean that clients that use PREPARE/Parse need to handle "plan
> invalidated" as a possible response to EXECUTE/Bind, or will the backend
> keep the query string / parse tree around and replan on next execution?

The latter -- the client won't be aware that replanning took place. (If your prepared queries take minutes of planning time, perhaps this is something you *would* like to be made aware of, however...)

-Neil
"Neil Conway" <neilc@samurai.com> writes
> I don't see that this is a major problem. If a plan A invokes a function
> B, then changes to B will need to invalidate A; that should be pretty
> easy to arrange. If B is a PL/PgSQL function that invokes a function C,
> it will probably cache a plan involving C. But when C changes, we need
> only flush B's cached plan, _not_ A -- as far as A is concerned, the
> operation of B is a black box.

This is the key point (say this is point_1) -- we must determine how deep we have to go to check validity. So as long as the plan of A does not rely on any result information of B, say the returned/affected row count of B, then it is OK.

> The only exception is when B is a SQL
> function that is inlined, but we can handle that separately.

I don't quite understand the difference between a SQL function and a PL/PgSQL function here -- since there is overlapping functionality that we could implement either by a SQL function or by a PL/PgSQL function.

> Regarding performance, the important point is that a DDL command
> "pushes" changes out to backends to invalidate cached plans -- a plan
> doesn't need to poll to see if there have been any changes to objects it
> depends upon. And on a production system, DDL should usually be
> infrequent (the primary exception is temp table creation/destruction,
> but we can potentially optimize for that since it is backend-local).

Yes, it is DDL's responsibility to do invalidation, and the query should never worry about the cached plan it will use. So when a DDL command comes, it has to know all the objects it affects directly (no need to go deeper, based on point_1); then for each plan in the cache we check if it is directly (based on point_1) related to these changed objects.

Regards,
Qingqing
Tom Lane wrote:
> I hadn't really gotten as far as working out a reasonable API for the
> module. The $64 question seems to be what is the input: a textual query
> string, a raw parse analysis tree, or what?

It should be easy enough to accept either, and then convert from the query string into a raw parse tree. The storage of the parse tree should probably be owned by the cache module, so that might introduce some slight complications -- like exposing a MemoryContext for callers to allocate inside -- but it should be doable anyway.

> And what sort of key does the caller want to use to re-find a
> previously cached plan?

Do we want to share plans between call sites? If so, an API like this comes to mind:

    struct CachedPlan
    {
        List *query_list;
        List *plan_list;
        char *query_str;
        int   nargs;
        Oid  *argtypes;
        int   refcnt;
        /* various other info -- perhaps memory context? */
    };

    struct CachedPlan *cache_get_plan(const char *query_str,
                                      int nargs, Oid *argtypes);
    void cache_destroy_plan(struct CachedPlan *plan);

cache_get_plan() would look up the query string in a hash table (mapping strings => CachedPlans). If found, it would check if the plan had been invalidated, replan it if necessary, then bump its reference count and return it. If not found, it would create a new CachedPlan; parse, rewrite and plan the query string; and return it. This would mean that within a backend we could share planning for queries that happened to be byte-for-byte identical.

- it would be nice to do the hash lookup on the result of raw_parser() rather than the query string itself, since we would be able to share more plans that way. Not sure if that's worth doing, though.

- how do we manage storage? The reference counting above is off-the-cuff. Perhaps there's a better way to do this...
(Of course, if a plan has refcnt > 0, we can still remove it from memory if needed, since any call site should provide sufficient information to reconstruct it.)

This would also make it somewhat more plausible to share the query cache among backends, but I'm not interested in pursuing that right now.

(BTW, another thing to consider is how the rewriter will affect a plan's dependencies: I think we should probably invalidate a plan when a modification is made to a view or rule that affected the plan. This should also be doable, though: we could either modify the rewriter to report these dependencies, or trawl the system catalogs looking for rules that apply to any of the relations in the query. The latter method would result in spurious invalidations in the case of rules with a WHERE clause.)

-Neil
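[Editor's note: a backend-free toy of the lookup/replan/refcount cycle proposed above. Only cache_get_plan() comes from the proposal; every other name is invented, the hash table is reduced to a linked list, and the planning step is replaced by a counter.]

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct CachedPlan
{
    char *query_str;
    int   refcnt;
    int   valid;                 /* cleared by invalidation messages */
    struct CachedPlan *next;     /* bucket chain; a real cache would hash */
} CachedPlan;

static CachedPlan *cache_head;
static int nplanned;             /* counts (re)planning work, for the demo */

/* Look the query up; (re)plan only if missing or invalidated. */
CachedPlan *cache_get_plan(const char *query_str)
{
    CachedPlan *p;

    for (p = cache_head; p != NULL; p = p->next)
        if (strcmp(p->query_str, query_str) == 0)
            break;

    if (p == NULL)
    {
        p = calloc(1, sizeof(CachedPlan));
        p->query_str = malloc(strlen(query_str) + 1);
        strcpy(p->query_str, query_str);
        p->next = cache_head;
        cache_head = p;
    }

    if (!p->valid)
    {
        /* the real thing would parse, rewrite and plan here */
        nplanned++;
        p->valid = 1;
    }

    p->refcnt++;
    return p;
}

/* Drop a reference; refcnt == 0 makes the plan a removal candidate. */
void cache_release_plan(CachedPlan *p)
{
    p->refcnt--;
}
```

Two call sites issuing byte-for-byte identical queries share one entry and one round of planning, which is the sharing the proposal describes.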
Neil Conway <neilc@samurai.com> writes:
> (BTW, another thing to consider is how the rewriter will affect a plan's
> dependencies: I think we should probably invalidate a plan when a
> modification is made to a view or rule that affected the plan.

This issue goes away as long as you follow the rule that any change to a table's schema invalidates all plans that mention the table. Views and tables that have rules will still be mentioned in the rangetable of the resulting plan, even if they aren't part of the active plan. (We use that for access rights checking.)

Too tired to consider the other details at the moment...

regards, tom lane
Neil Conway <neilc@samurai.com> writes:
> Oliver Jowett wrote:
>> Does this mean that clients that use PREPARE/Parse need to handle "plan
>> invalidated" as a possible response to EXECUTE/Bind, or will the backend
>> keep the query string / parse tree around and replan on next execution?

> The latter -- the client won't be aware that replanning took place.

It seems possible that replanning would fail for some reason, in which case the EXECUTE would get an error of a kind you maybe weren't expecting during EXECUTE. Other than that, it seems it should be transparent.

regards, tom lane
Qingqing Zhou wrote:
> I don't quite understand the difference between a SQL function and a
> PL/PgSQL function here -- since there is overlapping functionality that we
> could implement either by a SQL function or by a PL/PgSQL function.

The difference is between an inlined function (which is integrated directly into the plan of the query that invokes it) and a function that is not inlined. Only SQL functions can be inlined, and only some SQL functions at that. With an out-of-line function, we just invoke the function via the fmgr infrastructure -- if it chooses to create any plans (e.g. via SPI), that is its own business, and they would be treated as distinct plans by the cache module.

-Neil
"Qingqing Zhou" <zhouqq@cs.toronto.edu> writes:
> This is the key point (say this is point_1) -- we must determine how deep
> we have to go to check validity.

The recorded plan has to somehow mention all the inlined functions that were expanded out of existence. There might be several levels of such things, but as long as we remember them all for invalidation purposes, I don't see a problem.

A related example: an operator might point to an inline-able function. Neither the operator nor the function will appear explicitly in the finished plan tree, but they'd better both be listed in the side list of invalidation dependencies.

> I don't quite understand the difference between a SQL function and a
> PL/PgSQL function here --

The planner doesn't know anything about inlining plpgsql functions. So while the function might have its own invalidation issues to deal with internally, a plan that calls it cannot need invalidation because of that.

Obviously these issues depend a lot on the internal behavior of the planner, so we are going to have to fix the planner to record the identity of every object that it looks at without explicitly mentioning it in the final plan. No other part of the system can be expected to track all that.

regards, tom lane
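[Editor's note: Tom's "side list of invalidation dependencies" might look roughly like this sketch. PlannerInfo here is a bare stand-in and record_plan_dependency is an invented name; the point is only that objects inlined out of the plan tree still get remembered.]

```c
#include <assert.h>

typedef unsigned int Oid;

#define MAX_INVAL 16

/* Stand-in for planner state: a side list that accumulates every
 * object the planner consults, whether or not it survives into
 * the finished plan tree. */
typedef struct PlannerInfo
{
    Oid inval_deps[MAX_INVAL];
    int n_inval_deps;
} PlannerInfo;

void record_plan_dependency(PlannerInfo *root, Oid objid)
{
    for (int i = 0; i < root->n_inval_deps; i++)
        if (root->inval_deps[i] == objid)
            return;              /* already recorded */
    root->inval_deps[root->n_inval_deps++] = objid;
}

/* Inlining an operator's underlying SQL function: both the
 * operator and the function vanish from the plan tree, but both
 * must still be listed for invalidation -- Tom's "related
 * example" above. */
void inline_operator(PlannerInfo *root, Oid opno, Oid funcid)
{
    record_plan_dependency(root, opno);
    record_plan_dependency(root, funcid);
    /* ... substitute the function body into the expression ... */
}
```

At plan-cache time the side list would simply be copied into the cached plan's dependency set, alongside the OIDs found by walking the tree.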
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Neil Conway <neilc@samurai.com> writes:
>> (BTW, another thing to consider is how the rewriter will affect a plan's
>> dependencies: I think we should probably invalidate a plan when a
>> modification is made to a view or rule that affected the plan.
>
> This issue goes away as long as you follow the rule that any change to a
> table's schema invalidates all plans that mention the table. Views and
> tables that have rules will still be mentioned in the rangetable of the
> resulting plan, even if they aren't part of the active plan. (We use
> that for access rights checking.)

That makes me wonder. What happens if I prepare a query, then use SET SESSION AUTHORIZATION to change my user, then try to execute the query? Should it recheck all the permissions? Or are all my prepared queries credentials that I'm acquiring and can use any time?

-- greg
Greg Stark <gsstark@mit.edu> writes:
> That makes me wonder. What happens if I prepare a query, then use SET
> SESSION AUTHORIZATION to change my user. Then try to execute the query?

Permissions checks are applied at executor startup, not by the planner, so it should Work Correctly in my view of the world.

There is one exception: a potentially inlinable SQL function will be inlined if it is allowably executable when the planner wants to do it --- subsequent revocation of call privileges on the function won't cause a failure of the plan. You could argue it both ways about whether this is a problem, but it seems to me it's not a big issue. We don't inline functions that could, say, give you access to a table you couldn't have read otherwise.

regards, tom lane
Neil Conway wrote:
> Do we want to share plans between call sites?

After thinking about this a little more, I think the answer is "no" -- it doesn't really buy us much, and introduces some extra complications (e.g. resource management).

BTW, it's quite annoying that the planner scribbles on its input. I've got half a mind to fix this before doing the rest of the cache invalidation work. That might be quite a large project, however...

-Neil
Neil Conway <neilc@samurai.com> writes:
> BTW, it's quite annoying that the planner scribbles on its input.

Yeah ... it would be good to fix that ...

regards, tom lane
On Thu, 2005-03-17 at 16:11 +1100, Neil Conway wrote:
> Neil Conway wrote:
> > Do we want to share plans between call sites?
>
> After thinking about this a little more, I think the answer is "no" --
> it doesn't really buy us much, and introduces some extra complications
> (e.g. resource management).

It was already implemented as an experiment, and I think it's better to keep plans separate.

karel

--
Karel Zak <zakkr@zf.jcu.cz>
In article <6028.1110785150@sss.pgh.pa.us>,
Tom Lane <tgl@sss.pgh.pa.us> writes:

> One possible approach is to do the invalidation on a sufficiently coarse
> grain that we don't care. For example, I would be inclined to make any
> change in a table's schema invalidate all plans that use that table at
> all; that would then subsume the constraint problem for instance. This
> doesn't solve the inlined function problem however.

How about making this even more coarse-grained? Blindly throw all cached plans away when something in the database DDL changes.
Harald Fuchs <use_reply_to@protecting.net> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> One possible approach is to do the invalidation on a sufficiently coarse
>> grain that we don't care. For example, I would be inclined to make any
>> change in a table's schema invalidate all plans that use that table at
>> all; that would then subsume the constraint problem for instance. This
>> doesn't solve the inlined function problem however.

> How about making this even more coarse-grained? Blindly throw all
> cached plans away when something in the database DDL changes.

Well, the problem is not so much that we can't tell what depends on which, as that we have no mechanism to make plan invalidation happen in the first place. I don't think that "throw everything away" will make it very much simpler.

regards, tom lane
"Neil Conway" <neilc@samurai.com> writes
> PostgreSQL should invalidate a cached query plan when one of the objects
> the plan depends upon is modified.

It just came to my mind that the current cache invalidation implementation may need to consider a future query result cache. The question comes up like this: for a simple query (i.e., a query without any function calls, on a single relation, etc.), a result cache is not very difficult to do, in my understanding. The sketch is here:

(1) a result set is only valid for queries within a serializable transaction;
(2) a result set is reusable if the whole where-condition is matched -- for simplicity;
(3) discard the cached result if the target relation is updated in the same transaction.

We cache ctids or the real tuples, and we develop a new scan method, say T_ResultSetScan, on the result set.

A problem arises if the where-condition includes a function. Two queries may look the same, but the functions they call might be different at this time ... the cached-plan invalidation mechanism could help to detect this.

Regards,
Qingqing
"Qingqing Zhou" <zhouqq@cs.toronto.edu> writes:
> "Neil Conway" <neilc@samurai.com> writes
>> PostgreSQL should invalidate a cached query plan when one of the objects
>> the plan depends upon is modified.

> It just comes into my mind that current cache invalidation implementation
> may need to consider the future query result cache.

There isn't likely ever to be a query result cache. The idea has been proposed before and rejected before: too much complexity and overhead for too little prospective gain. If you need such a thing, it makes more sense to do it on the application side and save a network round trip.

regards, tom lane