Re: Cached Query Plans (was: global prepared statements) - Mailing list pgsql-hackers

From Dawid Kuroczko
Subject Re: Cached Query Plans (was: global prepared statements)
Date
Msg-id 758d5e7f0804130831k6008eddeye0df06a3692a60f@mail.gmail.com
Whole thread Raw
In response to Re: Cached Query Plans (was: global prepared statements)  (PFC <lists@peufeu.com>)
Responses Re: Cached Query Plans (was: global prepared statements)  (PFC <lists@peufeu.com>)
List pgsql-hackers
On Sun, Apr 13, 2008 at 2:26 PM, PFC <lists@peufeu.com> wrote:
> > >  Oracle keeps a statement/plan cache in its shared memory segment (SGA)
> > >  that greatly improves its performance at running queries that don't
> > >  change very often.
>         Can we have more details on how Oracle does it ? For
> "inspiration"...

Why limit ourselves with Oracle?  How all major proprietary RDBMSs do it.

Here is  a nice presentation I've found on DB2, they call it "Dynamic
Statement Cache":

http://www.tbrug.com/TB%20UG%20Dynamic%20Statement%20Cache.ppt

>         Here is what I'm thinking about :
>         Don't flame me too much about implementation issues, this is just
> throwing ideas in the air to see where they'll fall ;)
>
>  * global plan cache in shared memory, implemented as hashtable, hash key
> being the (search_path, query_string)
>  Doubt : Can a plan be stored in shared memory ? Will it have to be copied
> to local memory before being executed ?

Well, Oracle uses terms "hard parse" and "soft parse", the former being
preparing the whole query, the latter reusing query plan prepared by
some other session.   More or less.  See this link for more detailed
description:

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:2588723819082

(this is quite interesting read)

>  This stores :
>  - the plans (not for all keys, see below)
>  - the stats :
[...]

I am not too sure that plans and statistical counters should be stored
together...
Probably plans should go in one place, and statistics should go to the
stats collector (I know he's not quite ready for this ;)).

>  There should be also a way to query this to display the statistics (ie
> "what query is killing my server ?"), and a way to purge old plans.

Hm, a limit on how much memory can be used for plans
(query_plan_cache_size GUC?), and a LRU/LFU expiration
of old plans?

>  * every time a Parse message comes up :
>  - look if the (search_path, query_string) is in the cache
>  - if it is in the cache :
>         - if there is a cached plan, make the unnamed statement point to it,
> and we're done.
>         - if there is no cached plan, prepare the query, and put it in the
> unnamed statement.
>
>  Now, the query has been parsed, so we can decide if it is cacheable. Should
> this be done in Parse, in Bind, or somewhere else ? I have no idea.
>
>  For instance, queries which contain VALUES() or IN( list of consts ) should
> not be cached, since the IN() is likely to change all the time, it would
> just trash the cache. Using =ANY( $1 ) instead will work with cached plans.

Perhaps a GUC for controlling query cache should heve three values:none -- don't cache any statementsmart -- use
heuristicsfor deciding whether to cache itall -- force caching all queries -- for uncommon/statistical/testing
purposes.

>>  Also, will a plan to be cached have to be prepared with or without the
> parameters ? That's also an interesting question...
>  Perhaps the user should also be able to specify wether to cache a plan or
> not, or wether to use the params or not, with hint flags in the query string
> ?
>  (like mysql, /* flags */ SELECT blah )

I don't like the hint flags.  They tend to haunt later on (when the database
gets smarter, but application forces it to be dumb).  I would say a GUC.
GUC gives freedom of change to the application, and can also be set
per user with ALTER USER.

>         Now, if the query is cacheable, store it in the cache, and update
> the stats. If we decided to store the plan, do that too. For instance we
> might decide to store the plan only if this query has been executed a
> certain number of times, etc.

Interesting idea.  I think I like it.

>  * In the Execute message, if a cached plan was used, execute it and update
> the stats (time spent, etc).
>
>         Now, about contention, since this is one shared hashtable for
> everyone, it will be fought for...
>         However, the lock on it is likely to be held during a very small
> time (much less than a microsecond), so would it be that bad ?
>         Also, GUC can be used to mitigate the contention, for instance if
> the user is not interested in the stats, the thing becomes mostly read-only

I would say: keep the stats separate.  For evey plan cached generate
some unique id (Perhaps OID? I am not convinced), and use this ID
as the key for the statistics.  I tend to think of it as a "temporary table,
and temporary table stats". :)
  Regards,     Dawid


pgsql-hackers by date:

Previous
From: Martijn van Oosterhout
Date:
Subject: Re: Cached Query Plans (was: global prepared statements)
Next
From: Andrew Dunstan
Date:
Subject: pgwin32_safestat weirdness