Re: eWeek Poll: Which database is most critical to your - Mailing list pgsql-hackers
From | Hannu Krosing |
---|---|
Subject | Re: eWeek Poll: Which database is most critical to your |
Date | |
Msg-id | 1014787941.2127.64.camel@rh72.home.ee |
In response to | Re: eWeek Poll: Which database is most critical to your (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
On Wed, 2002-02-27 at 10:39, Tom Lane wrote:
> Neil Conway <nconway@klamath.dyndns.org> writes:
> > When processing INSERTs, UPDATEs and DELETEs, check if the query
> > would affect any of the tables for which we are maintaining this cache.
> > If so, flush the cache. This ensures that we will never return invalid
> > results.
>
> Note that this would imply that the cache is *global* across all
> backends; therefore it is a shared data structure and hence an access
> bottleneck. (Not to mention a memory-management headache, since the
> size of shared memory can't easily be varied on-the-fly.)

I think it would be enough if the hashes and the bookkeeping info
(tmin, tmax, filename) were kept in a global table. We could also purge
all data more than a few minutes old. We also need an inverse lookup
from changed table to cached query, for cache invalidation on
insert/update/delete.

The results could even be saved in temp files and still mostly be
faster than doing the full parse/plan/execute, both for complex queries
returning a few rows (saves planning time) and for queries returning
many rows (saves execute time). The format used for saving should be
the exact wire protocol, so that efficient system calls could be used
where available (Linux 2.4+ has a system call that transfers a whole
file to a socket in one call, bypassing all copying and caching).

The lookup part will be pretty trivial: look up the query by its hash,
check tmin/tmax, and if the entry is still valid push the cached result
out to the client (rough sketches of both parts follow below). This
will make us as fast as, or faster than, MySQL for the
trivial_benchmark/busy_website case.

The cache creation/maintenance part will be much trickier. When
creating a cache entry:

* the tables affected can be determined only from fully built plans,
  because of possible rule expansions;
* if there is a trigger on select for this query, it can't be cached;
* put some temporary insert/update/delete triggers on all real tables
  used in the query that will invalidate the cache; as an alternative,
  we could always run the invalidate-query-cache code for the affected
  table on insert/update/delete if exact caching is enabled;
* invalidate the cache on schema changes;
* run a periodic check and invalidate old cache entries.

Some of the above could also be needed for caching query plans.
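To make this a bit more concrete, here is a very rough, purely
illustrative sketch of the kind of shared bookkeeping table and lookup
I mean. None of these names exist in the backend; the hash and the
tmin/tmax check are hand-waved, and locking and shared-memory
allocation are ignored entirely:

    /*
     * Illustrative only: a fixed-size query result cache keyed by a hash
     * of the literal query text.  Nothing here exists in the backend.
     */
    #include <stdint.h>
    #include <time.h>

    #define QCACHE_SLOTS       1024
    #define QCACHE_MAX_TABLES  8

    typedef struct QueryCacheEntry
    {
        int         valid;                  /* slot in use and not invalidated */
        uint32_t    query_hash;             /* hash of the literal query text */
        uint32_t    tmin;                   /* oldest xact the result is valid for */
        uint32_t    tmax;                   /* newest xact the result is valid for */
        char        result_file[64];        /* temp file holding wire-format rows */
        time_t      created_at;             /* so old entries can be purged */
        uint32_t    table_oids[QCACHE_MAX_TABLES];  /* tables the plan touched */
        int         ntables;
    } QueryCacheEntry;

    /* would live in shared memory; allocation and locking are hand-waved */
    static QueryCacheEntry query_cache[QCACHE_SLOTS];

    /* toy string hash, just for the sketch */
    static uint32_t
    qcache_hash(const char *query)
    {
        uint32_t    h = 0;

        while (*query)
            h = h * 31 + (unsigned char) *query++;
        return h;
    }

    /*
     * Look up a literal query string.  Return the entry only if it is
     * still valid for the caller's tmin/tmax; otherwise the caller falls
     * back to the normal parse/plan/execute path.
     */
    static QueryCacheEntry *
    qcache_lookup(const char *query, uint32_t cur_tmin, uint32_t cur_tmax)
    {
        uint32_t    h = qcache_hash(query);
        QueryCacheEntry *e = &query_cache[h % QCACHE_SLOTS];

        if (e->valid && e->query_hash == h &&
            e->tmin <= cur_tmin && cur_tmax <= e->tmax)
            return e;                       /* push e->result_file to the client */
        return NULL;
    }

    /*
     * Invalidate every cached result that depends on a table touched by
     * an INSERT/UPDATE/DELETE.  A real version would keep the inverse
     * index from table to entries instead of scanning every slot.
     */
    static void
    qcache_invalidate_table(uint32_t table_oid)
    {
        for (int i = 0; i < QCACHE_SLOTS; i++)
            for (int j = 0; j < query_cache[i].ntables; j++)
                if (query_cache[i].table_oids[j] == table_oid)
                    query_cache[i].valid = 0;
    }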
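And an equally rough sketch of pushing a saved wire-format result file
straight to the client socket with the Linux 2.4+ sendfile(2) call
mentioned above; the function and file names are made up for
illustration:

    #include <sys/sendfile.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* push a cached wire-format result file to an already-open client socket */
    static int
    push_cached_result(int client_sock, const char *result_file)
    {
        int         fd = open(result_file, O_RDONLY);
        struct stat st;
        off_t       offset = 0;

        if (fd < 0)
            return -1;
        if (fstat(fd, &st) < 0)
        {
            close(fd);
            return -1;
        }

        /* the kernel copies file -> socket directly, no user-space buffering */
        while (offset < st.st_size)
        {
            ssize_t sent = sendfile(client_sock, fd, &offset,
                                    st.st_size - offset);

            if (sent <= 0)
            {
                close(fd);
                return -1;
            }
        }

        close(fd);
        return 0;
    }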
> I cannot believe that caching results for literally-identical queries
> is a win, except perhaps for the most specialized (read brain dead)
> applications.

Actually, a web app that looks up the contents of a 5-row combobox is
not really brain-dead. Doing all the caching (and cache invalidation)
on the client side is hard and fragile - what happens when someone adds
a trigger in the backend?

> Has anyone looked at the details of the test case that
> MySQL uses to claim that this is a good idea? Has it got any similarity
> to your own usage patterns?

Yes - for content management / web apps. No - for bean-counting apps.

> We have talked about caching query plans for suitably-parameterized
> queries, but even that seems unlikely to be a huge win; at least I'd
> not think it useful to try to drive the cache totally automatically.
> If an application could say "here's a query I expect to use a lot,
> varying these specific parameters" then caching a plan for that would
> make sense.
>
> Now, there are notions of "prepared statements" in many access APIs
> that fit this description, and in fact the underlying capability exists
> in the backend --- we've just not gotten around to building the
> interfaces to tie it all together. *That* would be worth working on.

Sure. It would a) make many queries faster and b) make the client libs
(ODBC/JDBC/ECPG) faster and simpler by not forcing them to fake it.

But there is also a big class of applications that would benefit much
more from caching exact queries. And it will make us as fast as MySQL
for 100000 consecutive calls of SELECT MAX(N) FROM T ;)

---------------
Hannu