Re: Cached Query Plans - Mailing list pgsql-hackers

From PFC
Subject Re: Cached Query Plans
Date
Msg-id op.t9gyfup8cigqcu@apollo13.peufeu.com
Responses Re: Cached Query Plans  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Cached Query Plans  (Gregory Stark <stark@enterprisedb.com>)
List pgsql-hackers
> I think what he's referring to is persistently caching plans so that new
> connections can use them. That makes a lot more sense if you have lots of
> short-lived connections like a stock php server without persistent
> connections turned on or a connection pooler. You can prepare queries but
> they only live for a single web page so you don't get any benefit.
Let me explain a little further.
Persistent database connections are the way to go for web applications: without them, each connection only gets used for a few queries, and the time needed to start the postgres process and establish the connection is often significant compared to the time spent on the actual queries. Connection pooling achieves the same effect; you get the idea.
So, with persistent database connections, it makes sense to use prepared statements to speed up execution of simple queries, like those returning a few rows with a few joins and no complicated WHERE clauses, which is actually most of the queries on your average website. As shown in my previous message, the CPU time spent planning the query can be as much as, or even a lot more than, the CPU time spent actually executing it.
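For concreteness, this is the parse-once/execute-many shape of the pattern, sketched with Python's stdlib sqlite3 module as a stand-in for Postgres PREPARE/EXECUTE (the table and data are made up; sqlite3 keeps its own per-connection statement cache, sized by cached_statements):

```python
import sqlite3

# Stand-in for the PREPARE/EXECUTE pattern: one parameterized query
# text, many executions with different values.  sqlite3 reuses the
# compiled statement from its per-connection cache on repeat texts.
conn = sqlite3.connect(":memory:", cached_statements=32)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)",
                 [("alice",), ("bob",), ("carol",)])

# The same query text is parsed once and reused; only the parameter varies.
for uid in (1, 2, 3):
    row = conn.execute("SELECT name FROM users WHERE id = ?", (uid,)).fetchone()
    print(row[0])
```

The benefit the post describes is exactly this reuse, applied to the (much more expensive) Postgres planner.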
But using prepared statements with persistent connections is messy: you never know whether the connection is new or already contains prepared statements, so you'd have to maintain a list of those (named) statements for every query in your application. And when someone changes a query, it's a mess, not to mention queries generated by an ORM like Rails.
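The bookkeeping described above can be sketched as follows; everything here (the registry class, the fake connection, the statement-name scheme) is hypothetical, just to show the per-connection tracking an application is forced to do:

```python
import hashlib

class PreparedStatementRegistry:
    """Hypothetical client-side bookkeeping for a reused (persistent)
    connection: remember which query texts we have already PREPAREd,
    deriving a statement name from the query text itself."""

    def __init__(self, connection):
        self.connection = connection   # assumed to offer .execute(sql)
        self.prepared = set()          # statement names prepared so far

    def name_for(self, query_text):
        return "ps_" + hashlib.sha1(query_text.encode()).hexdigest()[:12]

    def execute(self, query_text, params):
        name = self.name_for(query_text)
        if name not in self.prepared:
            # First use on this connection: prepare it.  If a pool handed
            # us a recycled connection, we could not know this without
            # exactly this kind of tracking -- hence the messiness.
            self.connection.execute(f"PREPARE {name} AS {query_text}")
            self.prepared.add(name)
        placeholders = ", ".join(repr(p) for p in params)
        return self.connection.execute(f"EXECUTE {name} ({placeholders})")

class FakeConnection:
    """Records the SQL sent, standing in for a real driver connection."""
    def __init__(self):
        self.log = []
    def execute(self, sql):
        self.log.append(sql)

conn = FakeConnection()
reg = PreparedStatementRegistry(conn)
reg.execute("SELECT name FROM users WHERE id = $1", (1,))
reg.execute("SELECT name FROM users WHERE id = $1", (2,))
prepares = [s for s in conn.log if s.startswith("PREPARE")]
print(len(prepares))   # the query is prepared only once
```

And this registry still breaks silently when the pool swaps the underlying connection out from under it, which is the point being made.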
The idea in this "proof of concept" was: wouldn't it be nice if Postgres could just say "Hey, I already planned that query, I'll reuse that plan"? And it is very easy to recognize a query we've seen before, since $-params take the parameter values out of the equation, and eliminate parsing time and string-quoting hell.
Storing the cached plans as prepared statements in the connection-local hashtable makes sense: it doesn't use that much memory anyway, and there are no locking or contention problems. Just like PREPARE and EXECUTE.

> Personally I would like to see this, not primarily for the performance
> gains, but for the possibility of managing when plans change -- ie, plan
> stability.
Unfortunately, this isn't compatible with a non-shared memory approach...

> But there is resistance from other quarters about the reliability hit of
> having the plan data structures in shared memory.
I agree. Hence the idea to put them in non-shared memory, local to a process. Perfectly useless with non-persistent connections, but very powerful with persistent connections.

> I still don't see why you would need a wire protocol change.
Because I'd think that sometimes the client will not want to use a cached plan: when the query is rarely used (no need to waste memory caching the plan), or when it is complex and needs to be replanned according to parameter values every time. Sure, the client could use the oldskool "send query as text with parameters inside" approach, but that's back to string-escaping hell, and it's ugly. It would be nicer to have a bool "cache_plan".
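A sketch of what such a flag could look like on the execute path; the function and flag names are assumptions, not an existing API:

```python
def execute(query_text, params, cache, cache_plan):
    """Hypothetical execute path honoring a per-query cache_plan flag,
    as the proposed protocol extension would carry: rarely-used or
    parameter-sensitive queries bypass the cache and are planned fresh."""
    if cache_plan:
        # Plan on first sight, then reuse the stored plan.
        plan = cache.setdefault(query_text, ("PLAN", query_text))
    else:
        plan = ("PLAN", query_text)   # replanned every time, never stored
    return plan

cache = {}
execute("SELECT * FROM big_report WHERE d > $1", ("2008-01-01",),
        cache, cache_plan=False)
execute("SELECT name FROM users WHERE id = $1", (1,),
        cache, cache_plan=True)
print(sorted(cache))   # only the flagged query was cached
```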

> You would just have clients prepare plans normally and stash them in shared
> memory for other backends in a hash table keyed by, well, something, perhaps
> the original query text.
Query text seems to be the simplest; better not to ask the user to come up with distinct names when the query text will be a perfect key. Besides, hand-generated names might turn out not to be so distinct after all...

> Then whenever you're asked to prepare a query you go check if someone else
> has already done it for you and find an already generated plan in the shared
> memory hash table.
> The contention on the shared cache is likely to negate much of the planning
> savings but I think it would still be a win. But what's really interesting
> to me is then providing an interface to see and manipulate that cache. Then
> you could see what plans other backends are using for queries, mark plans as
> being acceptable or not, and even revoke users' permissions to execute
> queries which aren't already present and marked as being acceptable.
If it can be made to work with a shared cache, why not, but that would be more complex. You'd also have to deal with permissions, different users with different privileges, etc. But local would probably be simplest (and fastest).
Also, there will be problems with the schema search path. Perhaps a query  
should be required to specify the fully qualified table names  
(schema.table) for all tables in order to be cacheable.
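The hazard can be shown with a toy cache key; the schema names and query are illustrative:

```python
# The same query text can resolve to different tables under different
# schema search paths, so either the search path must be part of the
# cache key, or cacheable queries must use schema-qualified names.
def cache_key(query_text, search_path):
    return (tuple(search_path), query_text)

k1 = cache_key("SELECT * FROM users", ["app", "public"])
k2 = cache_key("SELECT * FROM users", ["admin", "public"])
print(k1 == k2)   # False: same text, different resolution context
```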




