Re: Searching for the cause of a bad plan - Mailing list pgsql-performance
From | Csaba Nagy
---|---
Subject | Re: Searching for the cause of a bad plan
Date |
Msg-id | 1190906166.5430.76.camel@PCD12478
In response to | Re: Searching for the cause of a bad plan (Tom Lane <tgl@sss.pgh.pa.us>)
Responses | Re: Searching for the cause of a bad plan
List | pgsql-performance
On Thu, 2007-09-27 at 10:40 -0400, Tom Lane wrote:
> And yet there's another trap here: if the parameter you passed in
> chanced to be one of the very common values, a plan that was optimized
> for a small number of matches would perform terribly.
>
> We've speculated about trying to deal with these types of situations
> by switching plans on-the-fly at runtime, but that's just blue-sky
> dreaming at the moment. In the short run, if boosting the stats target
> doesn't result in acceptable plans, there may be no real solution other
> than to avoid parameterized queries on this column.

Well, my problem was actually solved by raising the statistics target;
thanks to Simon for suggesting it. The problem is that it's quite hard
to tell (for a non-postgres-developer) which column needs a higher
statistics target when a multi-join query doesn't work as expected...

Apropos switching plans on the fly and blue-sky dreaming... IIRC, there
were some plans to cache plans in shared mode for the whole cluster, not
just per backend. What about allowing the user to prepare a plan
offline, i.e. without actually executing it (via some variant of
PREPARE CACHED or so), and letting the planner do more exhaustive cost
estimation, possibly actually analyzing specific tables for correlations
etc., on the grounds that the whole thing is done only once and reused
many times?

The resulting plan could also contain turning points for parameter
values, which would switch between different variants of the plan; that
way it can be specific about parameter values even if planned
generically. It could also set up dependencies on the relevant
statistics on which it bases its decisions, so that the plan is
invalidated when those statistics have presumably changed by more than
a threshold, and a "background planner" thread could then re-plan it
after the necessary analyze steps are run again.

If there is a "background planner", it could also collect "long running
query" statistics and automatically cache plans for the worst offenders,
and possibly generate "missing index", "you should cluster this table"
and similar warnings.

The fast planner would still be needed for interactive queries which are
not yet prepared, so that new interactive queries don't pay the
unpredictable cost of "hard" planning. If those run fast enough, they
will never get prepared; they don't need to be... Otherwise they should
be passed to the background planner to be exhaustively (or at least more
thoroughly) analyzed.

One other thing I dream of would be some way to tell postgres that a
query should run in "batch mode" or "interactive mode", i.e. whether it
should be optimized for best throughput or for fast startup; in the
latter case great care should also be taken to avoid worst-case
scenarios. I know there's a strong feeling against query hints around
here, but this one could fly as a GUC parameter: it could be set in the
config file for a default value (batch for a data warehouse, interactive
for an OLTP application), and it could also be set per session.

Ok, that's about the dreaming...

Cheers,
Csaba.
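P.S. For the archives, the fix itself is plain SQL. The table and
column names below are made up, and the right target value depends on
the data distribution, but the shape of it is:

    -- Raise the per-column statistics target (the default comes from
    -- default_statistics_target). A larger target makes ANALYZE keep
    -- more most-common-value entries and histogram buckets, so the
    -- planner can recognize the very common values Tom mentions.
    ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 500;

    -- Re-collect statistics so the new target takes effect.
    ANALYZE orders;

    -- Sanity check: what the planner now knows about the column.
    SELECT n_distinct, most_common_vals, most_common_freqs
    FROM pg_stats
    WHERE tablename = 'orders' AND attname = 'customer_id';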
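The trap Tom describes is also easy to reproduce with a prepared
statement (same made-up table): the plan is chosen generically, without
seeing the concrete parameter, so one plan has to serve both the rare
and the very common values:

    -- Planned once, before any concrete value is known:
    PREPARE by_customer(int) AS
        SELECT * FROM orders WHERE customer_id = $1;

    EXECUTE by_customer(12345);  -- rare value: an index scan is ideal
    EXECUTE by_customer(1);      -- very common value: the same plan
                                 -- can be terrible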
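And as for the batch/interactive GUC: the delivery mechanism at least
already exists. Such a parameter could get a cluster-wide default in
postgresql.conf plus a per-session override, just like existing
settings do. work_mem below is only a stand-in to show the mechanism;
the batch/interactive knob itself is the dream part:

    SET work_mem = '256MB';  -- session-level override, e.g. a batch job
    SHOW work_mem;           -- confirm the override
    RESET work_mem;          -- back to the postgresql.conf default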