On Fri, Mar 23, 2012 at 5:03 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> The other thing that makes me skeptical of this proposal is that I am
> not very sure that executing absolutely identical queries is a very
> common use case for a relational database. I suppose there might be a
> few queries that run over and over again (e.g. whatever you need to
> render your home page), but I think those will be the exception, and
> not the rule. It therefore seems likely that the overhead of such a
> cache would in most cases be greater than the benefit of having it in
> the first place.
Well it's not entirely unlikely. If you step back a web application
looks like a big loop with a switch statement to go to different
pages. It keeps executing the same loop over and over again and there
are only a smallish number of web pages. Sure the bind variables
change but there will only be so many bind values and 10% of those
will get 90% of the traffic too.
But the other thing that happens is that people run multiple queries
aggregating or selecting from the same subset of data. So you often
get things like
select count(*) from (<complex subquery>)
select * from (<complex subquery>) order by foo limit 10
select * from (<complex subquery>) order by bar limit 10
for the same <complex subquery>. That means if we could cache the rows
coming out of parts of the plan and remember those rows when we see a
plan with a common subtree in the plan then we could avoid a lot of
repetitive work.
This depends on being able to recognize when we can guarantee that
subtrees of plans produce the same rows even if the surrounding tree
changes. That will be true sometimes but not other times.
--
greg