On Fri, Mar 23, 2012 at 1:51 PM, Greg Stark <stark@mit.edu> wrote:
> Well, it's not entirely unlikely. If you step back, a web application
> looks like a big loop with a switch statement to go to different
> pages. It keeps executing the same loop over and over again and there
> are only a smallish number of web pages. Sure the bind variables
> change but there will only be so many bind values and 10% of those
> will get 90% of the traffic too.
That may be true, but lots of web applications have millions of users.
The fact that a few hundred thousand of those may account for most of
the traffic doesn't seem like it's going to help much unless there are
not many users in total; and in that case it's plenty fast enough
without a cache anyway.
> But the other thing that happens is that people run multiple queries
> aggregating or selecting from the same subset of data. So you often
> get things like
>
> select count(*) from (<complex subquery>)
> select * from (<complex subquery>) order by foo limit 10
> select * from (<complex subquery>) order by bar limit 10
>
> for the same <complex subquery>. That means if we could cache the rows
> coming out of parts of the plan and remember those rows when we see a
> plan with a common subtree in the plan then we could avoid a lot of
> repetitive work.
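To make the repeated work concrete: one way applications dodge this
today is to materialize the shared piece by hand and run the follow-up
queries against the result. A rough sketch (the table and column names
below are invented):

    create temporary table complex_result as
      select c.id, c.name, sum(o.amount) as total
        from customers c
        join orders o on o.customer_id = c.id
       group by c.id, c.name;

    select count(*) from complex_result;
    select * from complex_result order by total desc limit 10;
    select * from complex_result order by name limit 10;

Caching the rows coming out of that common subtree automatically would
buy the same savings without the application having to restructure its
queries.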
Currently, we don't even recognize this situation within a plan; for
example, if you do project pp LEFT JOIN person sr ON pp.sales_rep_id =
sr.id LEFT JOIN person pm ON pp.project_manager_id = pm.id, the query
planner will happily seq-scan the person table twice to build two
copies of the same hash table.
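Spelled out (the select list here is made up for illustration), that
query is:

    select pp.*, sr.name as sales_rep, pm.name as project_manager
      from project pp
      left join person sr on pp.sales_rep_id = sr.id
      left join person pm on pp.project_manager_id = pm.id;

Both joins hash the same seq scan of person, but the planner builds
each hash table independently instead of reusing one.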
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company