Re: query cache - Mailing list pgsql-hackers

From Merlin Moncure
Subject Re: query cache
Date
Msg-id CAHyXU0ySQV+UA3jcHw_ebxumiEow7ibY890CmwZjoaObAVhY8Q@mail.gmail.com
In response to Re: query cache  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Fri, Mar 23, 2012 at 12:03 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Mar 23, 2012 at 12:29 PM, Greg Stark <stark@mit.edu> wrote:
>> On Fri, Mar 23, 2012 at 3:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> The complication, opportunities for bugs, and general slowdown
>>> associated with that would outweigh any possible gain, in the opinion
>>> of most hackers who have thought about this.
>>
>> I wouldn't be quite so pessimistic. I think the problem is that the
>> hard part in doing this for real is all the parts the proposal glosses
>> over. How much memory is it worth dedicating to the cache before the
>> cost of that memory costs more than it helps? How do you invalidate
>> cache entries efficiently enough that it doesn't become a bottleneck?
>
> I think the question of how you would invalidate things is a very good one.
>
> The other thing that makes me skeptical of this proposal is that I am
> not very sure that executing absolutely identical queries is a very
> common use case for a relational database.  I suppose there might be a
> few queries that run over and over again (e.g. whatever you need to
> render your home page), but I think those will be the exception, and
> not the rule.  It therefore seems likely that the overhead of such a
> cache would in most cases be greater than the benefit of having it in
> the first place.
>
> What I think is more common is the repeated submission of queries that
> are *nearly* identical, but with either different parameter bindings
> or different constants.  It would be nice to have some kind of cache
> that would allow us to avoid the overhead of parsing and planning
> nearly identical statements over and over again, but the trick is that
> you have to fingerprint the query to notice that's happening in the
> first place, and the fingerprinting has to cost less than what the
> cache saves you.  I don't know whether that's possible, but I suspect
> it's far from easy.
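
To make the fingerprinting idea concrete: strip the constants out of the
statement and hash what's left, so nearly-identical queries map to the
same key.  A naive, purely illustrative sketch in Python -- a real
implementation would have to normalize the parse tree rather than the
SQL text, roughly the way pg_stat_statements normalizes queries for its
statistics:

import hashlib
import re

def fingerprint(sql):
    # lowercase, swap literals for placeholders, collapse whitespace
    s = sql.strip().lower()
    s = re.sub(r"'[^']*'", "?", s)             # string literals
    s = re.sub(r"\b\d+(\.\d+)?\b", "?", s)     # numeric literals
    s = re.sub(r"\s+", " ", s)
    return hashlib.sha1(s.encode()).hexdigest()

# both statements hash to the same key, so whatever was cached for one
# could in principle be reused for the other:
assert fingerprint("SELECT * FROM t WHERE id = 1") == \
       fingerprint("SELECT * FROM t WHERE id = 42")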

A query cache basically addresses two use cases:
1) read-only or mostly-read-only workloads
2) badly written application code (whether written by a human or
generated by a machine)

The problem is that #1 can be optimized by any number of simple
techniques, and #2 is not a good basis for complicated internal
features with nasty trade-offs.  MySQL's query cache woes are well
known -- it's typical for administrators to simply turn the feature
off.  The feature is misnamed -- it's really a 'benchmark cheating'
feature, since a lot of db benchmarks focus on single-user loads
and/or highly repetitive queries, while the cache falls over
completely in real-world production workloads.  Also, it's really not
that difficult to rig an ad-hoc cache in the server or on the client
side and gear it towards your particular use case.

People asking for this probably really want materialized views instead.

merlin

