Re: Cached Query Plans (was: global prepared statements) - Mailing list pgsql-hackers

From: Csaba Nagy
Subject: Re: Cached Query Plans (was: global prepared statements)
Msg-id: 1208187040.8259.316.camel@PCD12478
In response to: Re: Cached Query Plans (was: global prepared statements) (PFC <lists@peufeu.com>)
List: pgsql-hackers
On Mon, 2008-04-14 at 17:08 +0200, PFC wrote:
>     Those "Decision" nodes could potentially lead to lots of decisions (ahem).
>     What if you have 10 conditions in the Where, plus some joined ones ? That  
> would make lots of possibilities...

Yes, that's true, but most of them are likely irrelevant to the end
result. In any real-life query only a few parameters really matter for
which plan you should choose... the key here is that you can afford to
spend more time exploring the possibilities for a cached plan than you
would for a one-shot query.

In principle one-shot planning should be the default, and caching should
be something the user has to choose deliberately. I would really like a
special command to plan and cache a query without actually executing it,
possibly with a parameter for how hard to try... for example, you could
expend the extra cycles to eliminate all redundancies from boolean
expressions and IN lists, getting the parse tree into a canonical format
- all things which make planning easier. All of these are a loss for
one-shot queries, but once you cache you can afford optimizations that
were a no-no before...
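
Just to make the idea concrete, here is roughly what I have in mind.
None of this syntax exists - PLAN CACHE, the EFFORT knob, and all the
names are invented for illustration:

    -- Hypothetical command: plan and cache without executing,
    -- spending extra effort on optimization up front.
    PLAN CACHE chunk_fetch (text, int) EFFORT 5 AS
        SELECT id, payload FROM items
        WHERE name LIKE $1 AND category_id = $2;

    -- Later executions reuse the cached, heavily optimized plan:
    EXECUTE chunk_fetch('widget%', 42);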

>     Consider several types of queries :
> 
>     - The small, quick query which returns one or a few rows : in this case,  
> planning overhead is large relative to execution time, but I would venture  
> to guess that the plans always end up being the same.

Consider 'select a from t where b like $1': the value of $1 will
considerably affect the query plan, and a query can't get much simpler
than that...
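
You can see it by comparing the plans for an anchored and an unanchored
pattern (assuming a btree index on b, created with text_pattern_ops so
LIKE can use it):

    CREATE TABLE t (a int, b text);
    CREATE INDEX t_b_idx ON t (b text_pattern_ops);

    EXPLAIN SELECT a FROM t WHERE b LIKE 'abc%';  -- left-anchored: an index scan is possible
    EXPLAIN SELECT a FROM t WHERE b LIKE '%abc';  -- unanchored: only a seq scan will do

A generic cached plan has to commit to one strategy without knowing
which kind of pattern $1 will turn out to be.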

>     - The query that takes a while : in this case, planning overhead is nil  
> compared to execution time, better replan every time with the params.

I guess these queries are not the ones targeted by this feature. For
them it really doesn't matter whether you cache or not, except that if
you know you're going to cache, you'll expend more effort on planning
correctly, and that can still matter for a long-running query. Note that
without caching, planning harder is a loss in the long run; only once
you cache can you afford to plan harder...
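
Today, the closest you can get to "plan harder" is raising the planner's
search limits around a PREPARE. The settings below are real; the query
and table names are invented:

    SET join_collapse_limit = 20;  -- default 8: consider more join orders
    SET from_collapse_limit = 20;  -- default 8
    SET geqo = off;                -- exhaustive search, no genetic shortcut
    PREPARE report (date) AS
        SELECT o.id, c.name
        FROM orders o JOIN customers c ON c.id = o.customer_id
        WHERE o.created >= $1;

The extra planning time is paid once and amortized over every EXECUTE.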

>     - The complex query that still executes fast because it doesn't process a  
> lot of rows and postgres finds a good plan (for instance, a well optimized  
> search query). Those would benefit from reducing the planning overhead,  
> but those also typically end up having many different plans depending on  
> the search parameters. Besides, those queries are likely to be dynamically  
> generated. So, would it be worth it to add all those features just to  
> optimize those ? I don't know...

We have dynamically generated queries here which are deliberately
chunked so they execute in small increments, so that none of them runs
too long (otherwise they would block vacuuming of vital tables). Those
chunks would greatly benefit from properly planned, cached plans...
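
The pattern looks roughly like this (names invented), driven in a loop
from the application:

    -- Delete at most 1000 rows per statement, so no single query
    -- runs long enough to hold back vacuum.
    DELETE FROM event_log
    WHERE id IN (
        SELECT id FROM event_log
        WHERE processed
        ORDER BY id
        LIMIT 1000
    );

The application repeats this until zero rows are affected; since the
statement text is identical on every round, a cached plan would save
replanning it thousands of times.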

A really smart system would store canonical plan fragments keyed by
(likewise canonicalized) parse-tree fragments, and then assemble the
final plan out of those fragments - but that would indeed be really
complex to design, even if the resulting code might be simpler than one
thinks ;-)

Cheers,
Csaba.