Re: query cache - Mailing list pgsql-hackers
| From | Merlin Moncure |
|---|---|
| Subject | Re: query cache |
| Date | |
| Msg-id | CAHyXU0ySQV+UA3jcHw_ebxumiEow7ibY890CmwZjoaObAVhY8Q@mail.gmail.com |
| In response to | Re: query cache (Robert Haas <robertmhaas@gmail.com>) |
| List | pgsql-hackers |
On Fri, Mar 23, 2012 at 12:03 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Mar 23, 2012 at 12:29 PM, Greg Stark <stark@mit.edu> wrote:
>> On Fri, Mar 23, 2012 at 3:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> The complication, opportunities for bugs, and general slowdown
>>> associated with that would outweigh any possible gain, in the opinion
>>> of most hackers who have thought about this.
>>
>> I wouldn't be quite so pessimistic. I think the problem is that the
>> hard part in doing this for real is all the parts the proposal glosses
>> over. How much memory is it worth dedicating to the cache before the
>> cost of that memory costs more than it helps? How do you invalidate
>> cache entries efficiently enough that it doesn't become a bottleneck?
>
> I think the question of how you would invalidate things is a very good one.
>
> The other thing that makes me skeptical of this proposal is that I am
> not very sure that executing absolutely identical queries is a very
> common use case for a relational database. I suppose there might be a
> few queries that run over and over again (e.g. whatever you need to
> render your home page), but I think those will be the exception, and
> not the rule. It therefore seems likely that the overhead of such a
> cache would in most cases be greater than the benefit of having it in
> the first place.
>
> What I think is more common is the repeated submission of queries that
> are *nearly* identical, but with either different parameter bindings
> or different constants. It would be nice to have some kind of cache
> that would allow us to avoid the overhead of parsing and planning
> nearly identical statements over and over again, but the trick is that
> you have to fingerprint the query to notice that's happening in the
> first place, and the fingerprinting has to cost less than what the
> cache saves you. I don't know whether that's possible, but I suspect
> it's far from easy.

A query cache basically addresses two use cases:

1) read-only or mostly read-only workloads
2) badly written application code (whether written by a human or a machine)

The problem is that #1 can be optimized by any number of simple techniques, and #2 is not a good basis for complicated internal features with nasty trade-offs. MySQL's query cache woes are well known -- it's typical for administrators to turn the feature off. The feature is misnamed -- it's really a 'benchmark cheating feature', since a lot of database benchmarks tend to focus on single-user loads and/or highly repetitive queries, but it falls over completely under real-world production workloads.

Also, it's really not that difficult to rig an ad-hoc cache in the server or on the client side, and you can then gear it towards your particular use case. People who are asking for this probably really want materialized views instead.

merlin
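[As a rough illustration of the kind of ad-hoc, client-side cache described above, here is a minimal sketch in Python. It assumes psycopg2, a simple whitespace-collapsing fingerprint over the query text plus parameters, and TTL-based expiry; the helper names, TTL value, and connection string are illustrative, not anything PostgreSQL itself provides, and expiry stands in for real invalidation, which is exactly the hard part the thread discusses.]

```python
# Illustrative client-side query result cache (not part of PostgreSQL).
# Keyed on normalized query text plus parameters; entries expire after a TTL,
# which sidesteps (rather than solves) the invalidation problem discussed above.
import hashlib
import time

import psycopg2

_cache = {}            # fingerprint -> (expiry_timestamp, rows)
CACHE_TTL_SECONDS = 30  # illustrative expiry window

def _fingerprint(sql, params):
    # Collapse whitespace and case so trivially different texts of the
    # same query share a cache entry.
    normalized = " ".join(sql.split()).lower()
    key = normalized + "|" + repr(params)
    return hashlib.sha1(key.encode("utf-8")).hexdigest()

def cached_query(conn, sql, params=()):
    """Return rows for (sql, params), reusing a recent result when available."""
    fp = _fingerprint(sql, params)
    now = time.time()
    hit = _cache.get(fp)
    if hit is not None and hit[0] > now:
        return hit[1]
    with conn.cursor() as cur:
        cur.execute(sql, params)
        rows = cur.fetchall()
    _cache[fp] = (now + CACHE_TTL_SECONDS, rows)
    return rows

if __name__ == "__main__":
    conn = psycopg2.connect("dbname=test")  # hypothetical connection string
    rows = cached_query(conn, "SELECT * FROM items WHERE category = %s", ("books",))
```

[Because such a cache lives in the application, it can be restricted to exactly the read-mostly queries that benefit and given whatever staleness policy the use case tolerates, which is the point made above about gearing an ad-hoc cache to a particular workload rather than building a general server-side one.]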