Re: Caching of Queries - Mailing list pgsql-performance
From | Aaron Werman |
---|---|
Subject | Re: Caching of Queries |
Date | |
Msg-id | BAY18-DAV6m5UxP9Fek0002fd87@hotmail.com |
In response to | Caching of Queries (Scott Kirkwood <scottakirkwood@gmail.com>) |
Responses | Re: Caching of Queries |
List | pgsql-performance |
From: "Scott Kirkwood" <scottakirkwood@gmail.com>

> On Mon, 27 Sep 2004 15:03:01 +1000, Neil Conway <neilc@samurai.com> wrote:
> > I think the conclusion of past discussions about this feature is that
> > it's a bad idea. Last I checked, MySQL has to clear the *entire* query
> > cache when a single DML statement modifying the table in question is
> > issued. Not to mention that the feature is broken for non-deterministic
> > queries (like now(), ORDER BY random(), or nextval('some_seq'), and so
> > on). That makes the feature close to useless for a lot of situations,
> > albeit not every situation.

Only the cache entries for changed tables are cleared, and MySQL sanely
doesn't cache statements with unstable results. The vast majority of
statements are stable. The feature is likely to dramatically improve the
performance of most applications; ones with lots of repeated queries are the
obvious case, but even data warehouses have lots of (expensive) repetitious
queries against static data.

> I think it's important to demark three levels of possible caching:
> 1) Caching of the parsed query tree
> 2) Caching of the query execute plan
> 3) Caching of the query results
>
> I think caching the query results (3) is pretty dangerous and
> difficult to do correctly.

I think it's very hard to cache results on the client side without guidance,
because it is expensive to notify the client of change events. A changing
table couldn't be cached on the client side without a synchronous check
against the db - defeating the purpose. Guidance should work, though: an
optional client configuration table specifying static tables, plus a cheap
sparse XOR hash of statements to find candidate matches, would cost next to
nothing. The list of tables would be a contract that they won't change. The
fact is that there often are a lot of completely static tables in high
volume transaction systems, and SQUID-style proxying could be an enormous
performance gain (saving effort, network overhead, latency, DB server
context switching, ...), especially in web farm and multi-tiered
applications (where middleware doing caching invests so many cycles to do
so).

Caching results on the server would also dramatically improve the
performance of high transaction rate applications, though less than caching
at the client. The algorithm of only caching small result sets for tables
that haven't changed recently is trivial, and the cost of first-pass
filtering of candidate statements through sparse XOR hashes is low. The
statement/results cache would need to be invalidated when any referenced
table is changed (see the sketch below). This option seems like a big win.

> Caching of the execute plan (2) is not dangerous but may actually
> execute more slowly by caching a bad plan (i.e. a plan not suited to
> the current data)

This concern could be resolved by aging plans out of the cache. It relates
to an idiosyncrasy of pg: vacuum has such a profound effect on plans. Anyone
who has designed very high transaction rate systems appreciates DB2 static
binding, where a plan is determined and stored in the database and
precompiled code uses those plans - both stable and free of planning cost.
The fact is that at a high transaction rate, we often see query parse and
optimization as the most expensive activity. The planner design has to be
"dumbed down" to reduce overhead (and even forced into the geqo choice).
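For concreteness, here is a minimal sketch of such a server-side
statement/results cache - illustrative only, not PostgreSQL internals, with
SHA-1 standing in for the sparse XOR hash and an assumed row-count
threshold:

```python
import hashlib
from collections import defaultdict

MAX_CACHED_ROWS = 100  # assumed threshold: only small result sets are cached


class ResultCache:
    def __init__(self):
        self._results = {}                 # statement hash -> cached rows
        self._by_table = defaultdict(set)  # table name -> statement hashes

    @staticmethod
    def _key(sql):
        # First-pass filter: hash the whitespace-normalized statement text
        # (a cheap stand-in for the sparse XOR hash described above).
        return hashlib.sha1(" ".join(sql.split()).encode()).hexdigest()

    def lookup(self, sql):
        # Returns cached rows, or None on a cache miss.
        return self._results.get(self._key(sql))

    def store(self, sql, tables, rows):
        if len(rows) > MAX_CACHED_ROWS:
            return  # result set too big to be worth caching
        key = self._key(sql)
        self._results[key] = rows
        for table in tables:
            self._by_table[table].add(key)

    def invalidate(self, table):
        # Called on any DML against `table`: evict only the entries that
        # reference it, not the whole cache.
        for key in self._by_table.pop(table, set()):
            self._results.pop(key, None)
```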
The common development philosophy in pg is to expect explicit prepares and
executes against bind variables (relatively rare, but useful in high volume
situations), and otherwise (commonly) to use explicit literals in
statements. The problem here is that prepare/execute only works within
monolithic applications, and the chance of reuse of SQL statements
containing literals is much lower.

(On a blue sky note, I would love to see a planner that dynamically changed
the search depth of execution paths, so it could exhaustively build the best
plans at low-usage times and be less sophisticated when the load was
higher... or, better yet, try alternative plans for very high frequency
transactions until it found the best one in practice! The identified correct
plan would be used subsequently.)

> Caching of the query tree (1) to me has very little downsides (except
> extra coding). But may not have a lot of win either, depending how
> much time/resources are required to parse the SQL and lookup the
> objects in the system tables (something I've never gotten a
> satisfactory answer about). Also, some of the query cache would have
> to be cleared when DDL statements are performed.

A parse cache is obviously easy - just store the parse tree with a hash and
the SQL string. But it would only help some very specific types of
transaction mixes. The issue is: why go through all this trouble without
caching the plan? The same issues exist for both - the cost of matching, and
the need to invalidate if object definitions change - but the win would be
so much less.

> -Scott
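For illustration, the explicit prepare/execute pattern mentioned above looks
like this from a client - a minimal sketch using PostgreSQL's real PREPARE
and EXECUTE statements through psycopg2, where the connection string, table,
and statement name are placeholders:

```python
import psycopg2

conn = psycopg2.connect("dbname=test")  # placeholder connection string
cur = conn.cursor()

# The statement is parsed (and, historically, planned) once at PREPARE
# time, against a bind variable ($1)...
cur.execute("PREPARE get_order (int) AS "
            "SELECT * FROM orders WHERE order_id = $1")

# ...then each EXECUTE reuses the stored statement, skipping the parse
# and optimization cost that would otherwise be paid on every call.
for order_id in (1, 2, 3):
    cur.execute("EXECUTE get_order (%s)", (order_id,))
    rows = cur.fetchall()

cur.execute("DEALLOCATE get_order")
conn.close()
```

Note that the prepared statement lives only in this session, which is the
monolithic-application limitation described above.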