Re: Cached Query Plans (was: global prepared statements) - Mailing list pgsql-hackers

From PFC
Subject Re: Cached Query Plans (was: global prepared statements)
Date
Msg-id op.t9jt5qdrcigqcu@apollo13.peufeu.com
Whole thread Raw
In response to Re: Cached Query Plans (was: global prepared statements)  ("Jonah H. Harris" <jonah.harris@gmail.com>)
Responses Re: Cached Query Plans (was: global prepared statements)  (Martijn van Oosterhout <kleptog@svana.org>)
Re: Cached Query Plans (was: global prepared statements)  ("Dawid Kuroczko" <qnex42@gmail.com>)
List pgsql-hackers
> On Fri, Apr 11, 2008 at 12:34 PM, PFC <lists@peufeu.com> wrote:
>>         Well, I realized the idea of global prepared statements actually
>> sucked, so I set on another approach thanks to ideas from this list,
>> this is
>> caching query plans.
>
> Well, that's a blatantly bad realization.  Perhaps you should do more
> research.
No, what I meant is that the "global prepared statements" as I tried to
implement them before weren't that good...I think simple caching based on the query text itself is preferable to
having to name each of your queries, extract them from your programs and
replace them by executes, issue a "create statement" command for each of
them, etc. Few people would actually use that feature because it would
mean lots of modifications to the application, so all the applications
that have to be compatible with other databases would not use the feature
(*)It could be useful for permissions and fine access control, though, but
views and stored procs already provide that functionality...
(*) = Note that caching the plans based on the query text (with $ params)  from a parse message will not provide
cachingfor oldskool queries with   
params inside in the form of escaped strings. This is good, because it
means the safer solution (using $-quoted params) will also be the faster
solution. And in the application, only a very small part of the code needs
to be changed, that's the DB abstraction layer.


>>  Doesn't Oracle do this now transparently to clients?
>
> Of course it does, and it has since the late 80's I believe.
>
>>  Oracle keeps a statement/plan cache in its shared memory segment (SGA)
>>  that greatly improves its performance at running queries that don't
>>  change very often.
Can we have more details on how Oracle does it ? For "inspiration"...
Here is what I'm thinking about :Don't flame me too much about implementation issues, this is just
throwing ideas in the air to see where they'll fall ;)

* global plan cache in shared memory, implemented as hashtable, hash key
being the (search_path, query_string)
Doubt : Can a plan be stored in shared memory ? Will it have to be copied
to local memory before being executed ?

This stores :
- the plans (not for all keys, see below)
- the stats :- number of times this query has been executed,- total, min and max wallclock time and CPU time spent
planningthis   
query,- total, min and max wallclock time, CPU time and RAM spent executing
this query,- total, min and max number of rows returned,- last timestamp of execution of this query,

There should be separate GUCs to control this :- should the whole thing be activated ?- should the cache be active ? or
justthe stats ? and what stats ? 

There should be also a way to query this to display the statistics (ie
"what query is killing my server ?"), and a way to purge old plans.

* every time a Parse message comes up :
- look if the (search_path, query_string) is in the cache
- if it is in the cache :- if there is a cached plan, make the unnamed statement point to it, and
we're done.- if there is no cached plan, prepare the query, and put it in the
unnamed statement.

Now, the query has been parsed, so we can decide if it is cacheable.
Should this be done in Parse, in Bind, or somewhere else ? I have no idea.

For instance, queries which contain VALUES() or IN( list of consts )
should not be cached, since the IN() is likely to change all the time, it
would just trash the cache. Using =ANY( $1 ) instead will work with cached
plans.

Also, will a plan to be cached have to be prepared with or without the
parameters ? That's also an interesting question...
Perhaps the user should also be able to specify wether to cache a plan or
not, or wether to use the params or not, with hint flags in the query
string ?
(like mysql, /* flags */ SELECT blah )
Now, if the query is cacheable, store it in the cache, and update the
stats. If we decided to store the plan, do that too. For instance we might
decide to store the plan only if this query has been executed a certain
number of times, etc.

* In the Execute message, if a cached plan was used, execute it and update
the stats (time spent, etc).
Now, about contention, since this is one shared hashtable for everyone,
it will be fought for...However, the lock on it is likely to be held during a very small time
(much less than a microsecond), so would it be that bad ?Also, GUC can be used to mitigate the contention, for instance
ifthe   
user is not interested in the stats, the thing becomes mostly read-only






















pgsql-hackers by date:

Previous
From: James Mansion
Date:
Subject: Re: Cached Query Plans
Next
From: Kris Jurka
Date:
Subject: Re: [Pljava-dev] stack depth limit exceeded - patch possible?