Re: Searching for the cause of a bad plan - Mailing list pgsql-performance

From Csaba Nagy
Subject Re: Searching for the cause of a bad plan
Date
Msg-id 1190906166.5430.76.camel@PCD12478
In response to Re: Searching for the cause of a bad plan  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Searching for the cause of a bad plan
List pgsql-performance
On Thu, 2007-09-27 at 10:40 -0400, Tom Lane wrote:
> And yet there's another trap here: if the parameter you passed in
> chanced to be one of the very common values, a plan that was optimized
> for a small number of matches would perform terribly.
>
> We've speculated about trying to deal with these types of situations
> by switching plans on-the-fly at runtime, but that's just blue-sky
> dreaming at the moment.  In the short run, if boosting the stats target
> doesn't result in acceptable plans, there may be no real solution other
> than to avoid parameterized queries on this column.

Well, my problem was actually solved by raising the statistics target,
thanks to Simon for suggesting it. The remaining difficulty is that it's
quite hard to tell (for a non-postgres-developer) which column needs a
higher statistics target when a multi-join query doesn't behave as
expected...
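
For anyone hitting the same thing, raising the target for a single
column looks roughly like this (table and column names here are just
placeholders, not my real schema):

  -- Raise the per-column statistics target above the default, so
  -- ANALYZE collects a larger histogram / MCV list for this column.
  ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 300;

  -- The new target only takes effect after the table is re-analyzed.
  ANALYZE orders;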

Apropos switching plans on the fly and blue sky dreaming... IIRC, there
were some plans to cache plans in shared mode for the whole cluster, not
just per backend.

What about allowing the user to prepare a plan offline, i.e. without
actually executing it (via some variant of PREPARE CACHED or so), and
letting the planner do a more exhaustive cost estimation, possibly even
analyzing specific tables for correlations etc., on the grounds that the
whole thing is done only once and reused many times? The resulting plan
could also contain turning points for parameter values, which would
switch between different variants of the plan; that way it can stay
specific to the parameter values even though it is planned generically...
It could also record dependencies on the statistics its decisions are
based on, so that the plan is invalidated when those statistics have
presumably changed beyond some threshold, and possibly a "background
planner" thread re-plans it after the necessary analyze steps are run
again.
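
For comparison, today's PREPARE already builds the plan once without
executing the query, but only per backend and only with the normal fast
planner (the query below is just an illustration):

  -- PREPARE plans the statement once, for this backend only,
  -- without taking the actual parameter values into account.
  PREPARE find_orders (int) AS
      SELECT * FROM orders WHERE customer_id = $1;

  -- Every execution reuses that generic plan, whether $1 matches
  -- 5 rows or 5 million rows; this is exactly where parameter-dependent
  -- "turning points" in the stored plan would help.
  EXECUTE find_orders (42);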

If there is a "background planner", it could also collect "long running
query" statistics and automatically create cached plans for the worst
offenders, and possibly generate "missing index", "you should cluster
this table" and similar warnings.

The fast planner would still be needed for interactive queries which are
not yet prepared, so new interactive queries don't pay the unpredictable
cost of "hard" planning. If those run fast enough, they will never get
prepared (they don't need to be)... otherwise they would be passed to the
background planner to be exhaustively (or at least more thoroughly)
analyzed...

One other thing I dream of would be some way to tell postgres that a
query should run in "batch mode" or "interactive mode", i.e. whether it
should be optimized for best throughput or for fast startup; in the
second case great care should also be taken to avoid worst-case
scenarios. I know there's a strong feeling against query hints around
here, but this one could fly as a GUC parameter, which could get a
default value in the config file (batch for a data warehouse, interactive
for an OLTP application) and could also be set per session.
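
Purely to illustrate the idea (the setting name below is invented,
nothing like it exists today):

  -- In postgresql.conf, a data warehouse could default to throughput:
  --   plan_mode = 'batch'

  -- ...while an OLTP application overrides it for its own sessions:
  SET plan_mode = 'interactive';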

Ok, that's about the dreaming...

Cheers,
Csaba.


