On Tue, 16 Feb 2010 15:22:00 +0100, Greg Stark <gsstark@mit.edu> wrote:
> There's a second problem though. We don't actually know how long any
> given query is going to take to plan or execute. We could just
> remember how long it took to plan and execute last time or how long it
> took to plan last time and the average execution time since we cached
> that plan. Perhaps we should track the stddev of the execution plan,
> or the max execution time of the plan? Ie there are still unanswered
> questions about the precise heuristic to use but I bet we can come up
> with something reasonable.
This could be an opportunity to implement plan caching...
Web 2.0 / AJAX means fewer heavy page reloads (which usually involve lots of queries) and more small, simple queries, such as selects returning one or a few rows each time the user clicks on something.
See the benchmark here (PG 8.4.2, MySQL 5.1.37):
http://purity.bobfuck.net/posts/postgres/2010-02-Prep/
If prepared statements are used, MySQL is not faster for "small, simple
selects"...
However, when not using prepared statements, most of the postmaster CPU
time is spent parsing & planning.
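To make concrete what "using prepared statements" means in that benchmark, here is a minimal sketch (Python with psycopg2; the DSN and the users table are made up): the statement is parsed and planned once, and each EXECUTE only binds the parameters and runs the stored plan.

import psycopg2

conn = psycopg2.connect("dbname=test")   # hypothetical DSN
cur = conn.cursor()

# Parsed and planned once...
cur.execute("PREPARE get_user (int) AS SELECT * FROM users WHERE id = $1")

# ...then each EXECUTE only binds the parameter and reuses the stored plan.
for user_id in (1, 2, 3):
    cur.execute("EXECUTE get_user (%s)", (user_id,))
    print(cur.fetchone())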
The problem with prepared statements is that they're a chore to use in web apps, especially in PHP: after grabbing a connection from the pool, you don't know whether it already has the prepared plans in it or not.
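To illustrate the chore, here is a sketch of the bookkeeping an application has to do today (Python with psycopg2; the statement name, query, and hard-coded parameter type are made up): before trusting a pooled connection, check pg_prepared_statements and prepare the statement if it isn't there yet.

import psycopg2

def ensure_prepared(cur, name, sql):
    # pg_prepared_statements lists the statements prepared on *this* session.
    cur.execute("SELECT 1 FROM pg_prepared_statements WHERE name = %s", (name,))
    if cur.fetchone() is None:
        # Parameter type hard-coded to int for brevity in this sketch.
        cur.execute("PREPARE {} (int) AS {}".format(name, sql))

conn = psycopg2.connect("dbname=test")   # a pooled connection in real life
cur = conn.cursor()
ensure_prepared(cur, "get_user", "SELECT * FROM users WHERE id = $1")
cur.execute("EXECUTE get_user (%s)", (42,))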
The postmaster could keep a hash of already-prepared plans, keyed on the $-parameterized query text; when it receives a Parse+Bind message, it would look the query up in this cache, fetch the stored plan, and skip planning entirely.
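Something along these lines, sketched here in plain Python rather than backend C, with stand-in functions for the real parse/plan and executor steps:

# Conceptual sketch only: the $-parameterized query text is the cache key,
# the stored plan is the value.
plan_cache = {}

def plan_query(query_text):
    return ("fake plan for", query_text)    # stand-in for parse + plan

def run_plan(plan, params):
    return (plan, params)                   # stand-in for the executor

def handle_parse_bind(query_text, params):
    plan = plan_cache.get(query_text)
    if plan is None:
        plan = plan_query(query_text)       # the expensive step we want to skip
        plan_cache[query_text] = plan       # subject to the heuristics below
    return run_plan(plan, params)

handle_parse_bind("SELECT * FROM users WHERE id = $1", (42,))  # plans
handle_parse_bind("SELECT * FROM users WHERE id = $1", (43,))  # cache hit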
This could be done by the connection pooler too, but the pooler doesn't have the information needed to decide whether it's wise to cache a plan or not.
Of course, all the subtlety is in determining whether the plan is reusable with other parameter values; a rough sketch of such a heuristic follows the list below.
- after planning and executing the query, only cache the plan if planning time was a significant fraction of the total query time (as said previously)
- only simple queries should be automatically cached like this
- perhaps some measure of "plan volatility"? For the examples in the link above, it's quite easy in at least two of the cases: a lookup on a UNIQUE column can't return more than one row, so volatility is zero; the plan only depends on the table size.
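Putting those criteria together, a rough sketch of the decision (the threshold, the relation count, and the "volatility" input are made-up placeholders, just to show the shape of the heuristic):

def worth_caching(plan_time, exec_time, n_relations, volatility):
    # Cache only when planning dominates, the query is simple, and the plan
    # is not expected to change with the parameter values.
    planning_dominates = plan_time > 0.5 * (plan_time + exec_time)
    is_simple = n_relations <= 2        # e.g. single table or a trivial join
    is_stable = volatility == 0.0       # e.g. lookup on a UNIQUE column
    return planning_dominates and is_simple and is_stable

# Example: a primary-key lookup where planning takes longer than execution.
print(worth_caching(plan_time=0.20, exec_time=0.05, n_relations=1, volatility=0.0))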