Thread: generalizing the planner knobs

generalizing the planner knobs

From
Neil Conway
Date:
There are currently some rather crude knobs for persuading the planner
to favour certain kinds of query plans: the enable_XXX GUC variables.
Several people have asked for a more flexible way to give hints to the
planner. I'm not interested in implementing fully-general planner hints
at the moment, but ISTM that a simple improvement to what we have now
would allow for a wider range of planner hints with only minor changes:
we could replace the enable_XXX variables with a set of variables that
would add an arbitrary constant to the estimated cost of each type of
query node. (Alternatively, an arbitrary multiplier could be specified;
I'm not sure which would be better.)

This would also be useful when diagnosing bad query plans: for example,
setting enable_seqscan=false often causes the planner to disregard the
use of *any* sequential scan, anywhere in the plan. The ability to
slightly bump up the cost of particular operations would allow more
alternative plans to be examined.

On the other hand, the whole mechanism is still a hack. It also means
that applications using this will be more dependent on the actual
costing values produced by the planner, which is not good. However, if
you're in the sort of desperate straights where this sort of hackery is
required, perhaps that's acceptable.

Comments?

-Neil




Re: generalizing the planner knobs

From
"Jonah H. Harris"
Date:
Hey Neil,

In the last couple weeks I too have been thinking about planner hints.  Assuming I have read your post correctly, the issue I see with this idea is that, in most cases, there won't be much of a difference between adding an arbitrary cost value to each type of node and disabling it completely.  Also, by fiddling with an arbitrary cost the user may introduce a lot of variation into the planner which may actually result in worse query plans.

While Tom's done a great job with the planner, there are certain cases where a user knows exactly what type of join or index they want to use for a query.  In that case I'd favor run-time hints from the user similar to Oracle.  I've read about seven papers on query optimization and planning in the last few weeks and have a lot of ideas... I'm just not sure when I may get time to work on them :(

-Jonah

On 12/1/05, Neil Conway <neilc@samurai.com> wrote:
There are currently some rather crude knobs for persuading the planner
to favour certain kinds of query plans: the enable_XXX GUC variables.
Several people have asked for a more flexible way to give hints to the
planner. I'm not interested in implementing fully-general planner hints
at the moment, but ISTM that a simple improvement to what we have now
would allow for a wider range of planner hints with only minor changes:
we could replace the enable_XXX variables with a set of variables that
would add an arbitrary constant to the estimated cost of each type of
query node. (Alternatively, an arbitrary multiplier could be specified;
I'm not sure which would be better.)

This would also be useful when diagnosing bad query plans: for example,
setting enable_seqscan=false often causes the planner to disregard the
use of *any* sequential scan, anywhere in the plan. The ability to
slightly bump up the cost of particular operations would allow more
alternative plans to be examined.

On the other hand, the whole mechanism is still a hack. It also means
that applications using this will be more dependent on the actual
costing values produced by the planner, which is not good. However, if
you're in the sort of desperate straights where this sort of hackery is
required, perhaps that's acceptable.

Comments?

-Neil



---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

               http://archives.postgresql.org

Re: generalizing the planner knobs

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> ... ISTM that a simple improvement to what we have now
> would allow for a wider range of planner hints with only minor changes:
> we could replace the enable_XXX variables with a set of variables that
> would add an arbitrary constant to the estimated cost of each type of
> query node. (Alternatively, an arbitrary multiplier could be specified;
> I'm not sure which would be better.)

I think the multiplier would be better, because it'd avoid the problem
you mention later that useful values would be dependent on the planner's
cost units.  Also, one could sanely allow a multiplier less than one,
so as to favor instead of penalize a particular plan type.
        regards, tom lane


Re: generalizing the planner knobs

From
"Qingqing Zhou"
Date:
"Neil Conway" <neilc@samurai.com> wrote
>
> This would also be useful when diagnosing bad query plans: for example,
> setting enable_seqscan=false often causes the planner to disregard the
> use of *any* sequential scan, anywhere in the plan. The ability to
> slightly bump up the cost of particular operations would allow more
> alternative plans to be examined.
>

This method also has the problem of "enable_seqscan=false" in some 
situations. I would vote we implement the final general solution like query 
plan hints directly.

Regards,
Qingqing 




Re: generalizing the planner knobs

From
Tom Lane
Date:
"Jonah H. Harris" <jonah.harris@gmail.com> writes:
> In the last couple weeks I too have been thinking about planner hints.
> Assuming I have read your post correctly, the issue I see with this idea is
> that, in most cases, there won't be much of a difference between adding an
> arbitrary cost value to each type of node and disabling it completely.
> Also, by fiddling with an arbitrary cost the user may introduce a lot of
> variation into the planner which may actually result in worse query plans.

Which is pretty much exactly the problem with "planner hints", too.
I've resisted that suggestion in the past and will continue to do so,
because hints are accidents waiting to happen.  Even if the hint is right
today for your current Postgres version and current data distribution,
it's likely not to be right further down the road --- but once the hint
is embedded in your application, how often are you going to revisit it?
As an example, a hint forcing the planner to use an indexscan with a
particular index might have been a great idea in PG 8.0 and a lousy idea
in 8.1, because it would prevent substitution of a possibly-far-better
bitmap indexscan.

The enable_foo switches are debug aids, not something you are expected
to fool with for production purposes, and the same would be true of
Neil's suggested multipliers.  While I don't feel any strong need for
variable multipliers, they'd be a small enough incremental amount of
work that the suggestion doesn't require a lot of supporting argument.
Adding a planner hint facility would be several orders of magnitude
more work, and it would be taking the system in a design direction that
I think is fundamentally misguided.
        regards, tom lane


Re: generalizing the planner knobs

From
"Jonah H. Harris"
Date:
Tom,

Don't get me wrong, I agree with you completely.  I would rather put effort into enhancing the planner than in developing work-arounds.  In 99% of all cases the planner works correctly, but I know people who actually have to disable planning options (mergejoin) in production applications because they get bad plans.  The "bad" plans are not really bad in terms of what the planner knows about the query, just in areas where the planner doesn't look at other things.

I also agree that a significant amount of work would be required to add run-time hints which would be better spent enhancing the system as a whole.  My only suggestion was that it would be better than Part 1 of Neil's statement.  Somehow I missed the end mention of multipliers which I agree requires less effort.



On 12/1/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Jonah H. Harris" <jonah.harris@gmail.com> writes:
> In the last couple weeks I too have been thinking about planner hints.
> Assuming I have read your post correctly, the issue I see with this idea is
> that, in most cases, there won't be much of a difference between adding an
> arbitrary cost value to each type of node and disabling it completely.
> Also, by fiddling with an arbitrary cost the user may introduce a lot of
> variation into the planner which may actually result in worse query plans.

Which is pretty much exactly the problem with "planner hints", too.
I've resisted that suggestion in the past and will continue to do so,
because hints are accidents waiting to happen.  Even if the hint is right
today for your current Postgres version and current data distribution,
it's likely not to be right further down the road --- but once the hint
is embedded in your application, how often are you going to revisit it?
As an example, a hint forcing the planner to use an indexscan with a
particular index might have been a great idea in PG 8.0 and a lousy idea
in 8.1, because it would prevent substitution of a possibly-far-better
bitmap indexscan.

The enable_foo switches are debug aids, not something you are expected
to fool with for production purposes, and the same would be true of
Neil's suggested multipliers.  While I don't feel any strong need for
variable multipliers, they'd be a small enough incremental amount of
work that the suggestion doesn't require a lot of supporting argument.
Adding a planner hint facility would be several orders of magnitude
more work, and it would be taking the system in a design direction that
I think is fundamentally misguided.

                        regards, tom lane

Re: generalizing the planner knobs

From
Greg Stark
Date:
"Jonah H. Harris" <jonah.harris@gmail.com> writes:

> Tom,
> 
> Don't get me wrong, I agree with you completely.  I would rather put effort
> into enhancing the planner than in developing work-arounds.  In 99% of all
> cases the planner works correctly, but I know people who actually have to
> disable planning options (mergejoin) in production applications because they
> get bad plans.  The "bad" plans are not really bad in terms of what the
> planner knows about the query, just in areas where the planner doesn't look
> at other things.

I would like to draw a distinction between two sorts of hints. Currently
you're talking about one sort of hint, namely hints that tell the planner to
alter its cost model and choose a different plan than the inputs it has would
dictate. Using these require the user to have a fairly in depth understanding
of the planner and what options it has available.

On the other hand the type I would prefer to see are hints that feed directly
into filling in information the planner lacks. This only requires that the
user understand his own data and still lets the planner pick the best plan
based on the provided information.

So for example I would love to see a hint that allowed you to specify the
selectivity of a where clause. And one that let you specify the "density" of a
grouping clause.

Most of the time the planner makes a mistake it's because of a bad
miscalculation in estimating these givens. If it had the correct values for
the inputs then it would make the right decision about the plan.

Making the planner very good at making the right decisions given accurate
inputs is an attainable goal. Computers are pretty deterministic and it's
possible to come up with very accurate cost models. Despite some known
problems with Postgres's current models they're remarkably good already. And
there's no particular reason to think they can't be made nearly perfect.

Making the planner very good at producing accurate estimates is a much harder
goal. No matter how accurate it gets there will always be more complex
expressions that are harder to predict and there will always be cases the
planner can't estimate well. The user however knows his own data and may well
know the answer.

In the extreme consider user-defined operators, which will always be dependent
on the user to provide estimator functions. If it's a rarely used operator the
user may find it easier to simply tell the planner the selectivity of each
expression rather than come up with a general solution.

(I also think things like joins and group by clauses will never be able to be
guaranteed accurate results in general. Not without a lot more costs up front
including giving up on calculating statistics based on only a sample.)

-- 
greg



Re: generalizing the planner knobs

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> On the other hand the type I would prefer to see are hints that feed directly
> into filling in information the planner lacks. This only requires that the
> user understand his own data and still lets the planner pick the best plan
> based on the provided information.

This would avoid some issues, but it still is vulnerable to the problem
that the hint you put in your code today will fail to track changes in
your data tomorrow.
        regards, tom lane


Re: generalizing the planner knobs

From
"Pollard, Mike"
Date:
Greg Stark <gsstark@mit.edu> writes:
> On the other hand the type I would prefer to see are hints that feed
directly
> into filling in information the planner lacks. This only requires that
the
> user understand his own data and still lets the planner pick the best
plan
> based on the provided information.

Optimizer hints were added because some databases just don't have a very
smart optimizer.  But you are much better served tracking down cases in
which the optimizer makes a bad choice, and teaching the optimizer how
to make a better one.  That way, all users get the benefit of the fix.
Remember, the purpose of SQL is to isolate the end user from having to
care about how the data is retrieved; that is the RDBMS' problem.  (the
other thing forgotten was that it was supposed to be a natural language.
NVL.  Bah.)

Mike Pollard
SUPRA Server SQL Engineering and Support
Cincom Systems, Inc.




Re: generalizing the planner knobs

From
Gregory Maxwell
Date:
On 12/1/05, Pollard, Mike <mpollard@cincom.com> wrote:
> Optimizer hints were added because some databases just don't have a very
> smart optimizer.  But you are much better served tracking down cases in
> which the optimizer makes a bad choice, and teaching the optimizer how
> to make a better one.  That way, all users get the benefit of the fix.
> Remember, the purpose of SQL is to isolate the end user from having to
> care about how the data is retrieved; that is the RDBMS' problem.  (the
> other thing forgotten was that it was supposed to be a natural language.
> NVL.  Bah.)

The flipside there is that a good set of hinting options  may increase
the amount of detailed feedback we get from users on improvements
needed in the optimizer.  The current knobs are pretty blunt and don't
do as much as I'd like when trying to track down exactly where the
optimiser has gone wrong.

If we'd really like to avoid people using the knobs to rig queries,
how about making them only  work with explain analyze, useful for
debugging but not so useful for actual queries.


Re: generalizing the planner knobs

From
"Pollard, Mike"
Date:
Gregory Maxwell <gmaxwell@gmail.com> wrote:
> The flipside there is that a good set of hinting options  may increase
> the amount of detailed feedback we get from users on improvements
> needed in the optimizer.  The current knobs are pretty blunt and don't
> do as much as I'd like when trying to track down exactly where the
> optimiser has gone wrong.

Point conceded.  Any information that can help diagnose an issue is good
information.  I like the idea of only allowing it on explain.

Mike Pollard
SUPRA Server SQL Engineering and Support
Cincom Systems, Inc.




Re: generalizing the planner knobs

From
Neil Conway
Date:
On Thu, 2005-12-01 at 21:01 -0500, Gregory Maxwell wrote:
> If we'd really like to avoid people using the knobs to rig queries,
> how about making them only  work with explain analyze, useful for
> debugging but not so useful for actual queries.

That seems a pretty arbitrary limitation. I agree that it's not ideal to
have users adjust planner behavior via this means, but until we have
something better, I think applying that limitation would only make the
status quo worse.

-Neil




Re: generalizing the planner knobs

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> On Thu, 2005-12-01 at 21:01 -0500, Gregory Maxwell wrote:
>> If we'd really like to avoid people using the knobs to rig queries,
>> how about making them only  work with explain analyze, useful for
>> debugging but not so useful for actual queries.

> That seems a pretty arbitrary limitation. I agree that it's not ideal to
> have users adjust planner behavior via this means, but until we have
> something better, I think applying that limitation would only make the
> status quo worse.

Yeah, I agree.  Adding code to prevent people from using a facility
doesn't seem very reasonable, even if it's our policy that using the
facility for production purposes is not a good idea.  In fact, we just
today had a counterexample --- see this thread:
http://archives.postgresql.org/pgsql-performance/2005-12/msg00015.php
Being able to use enable_nestloop got Markus out of a short-term bind,
which to me is exactly what you want to be able to do with this sort
of thing.

I don't have any problem with expending small amounts of work to make
it easier to hack the planner in small ways.  The real problem I have
with a "planner hints" facility (in the form that I think most people
who ask for it have in mind) is that it would be a *very large* amount
of work to do it reasonably well, and I think that amount of effort
would be better spent in other ways.
        regards, tom lane


Re: generalizing the planner knobs

From
Trent Shipley
Date:
On Thursday 2005-12-01 19:01, Gregory Maxwell wrote:
> On 12/1/05, Pollard, Mike <mpollard@cincom.com> wrote:
> > Optimizer hints were added because some databases just don't have a very
> > smart optimizer.  But you are much better served tracking down cases in
> > which the optimizer makes a bad choice, and teaching the optimizer how
> > to make a better one.  That way, all users get the benefit of the fix.
> > Remember, the purpose of SQL is to isolate the end user from having to
> > care about how the data is retrieved; that is the RDBMS' problem.  (the
> > other thing forgotten was that it was supposed to be a natural language.
> > NVL.  Bah.)
>
> The flipside there is that a good set of hinting options  may increase
> the amount of detailed feedback we get from users on improvements
> needed in the optimizer.  The current knobs are pretty blunt and don't
> do as much as I'd like when trying to track down exactly where the
> optimiser has gone wrong.
>
> If we'd really like to avoid people using the knobs to rig queries,
> how about making them only  work with explain analyze, useful for
> debugging but not so useful for actual queries.

I'm all in favor of sticking to the declarative language ideal.

Also, I'm much in favor of protecting people from themselves.


On the other hand, if folks insist on engaging in extreme sports (like second 
guessing the optimizer) I'm against regulating their freedom.  I think 
exposing planner variables would be a good thing, on net.  Naturally, you 
would warn everyone not to touch them.  (Safety and freedom are both 
necessary.)

If you can play with the knobs, you should let them be used to return real 
result sets.  That way, when you get feedback, you will be able to tell if 
the cost estimator is "broken".  Just returning a modified plan won't 
challenge costing assumptions.


Re: generalizing the planner knobs

From
Greg Stark
Date:
"Pollard, Mike" <mpollard@cincom.com> writes:

> Optimizer hints were added because some databases just don't have a very
> smart optimizer.  But you are much better served tracking down cases in
> which the optimizer makes a bad choice, and teaching the optimizer how
> to make a better one.  

You more or less missed my entire point.

You can always teach the optimizer to make better decisions based on good
data. Your statement is basically right when talking about tweaking the
optimizer's decisions to ignore its best judgement.

But there are many many cases where the data the optimizer has available isn't
good and for good reason. And in plenty of those cases the data the optimizer
has available *can't* be good.

In the extreme, no amount of added intelligence in the optimizer is going to
help it come up with any sane selectivity estimate for something like 
 WHERE radius_authenticate(user) = 'OK'

-- 
greg



Re: generalizing the planner knobs

From
Csaba Nagy
Date:
On Thu, 2005-12-01 at 22:01, Tom Lane wrote:
> Greg Stark <gsstark@mit.edu> writes:
> > On the other hand the type I would prefer to see are hints that feed directly
> > into filling in information the planner lacks. This only requires that the
> > user understand his own data and still lets the planner pick the best plan
> > based on the provided information.
> 
> This would avoid some issues, but it still is vulnerable to the problem
> that the hint you put in your code today will fail to track changes in
> your data tomorrow.

Tom, I have to disagree here. At least in our application, we must
provide for an acceptable worst case scenario, and sometimes a slightly
wrong estimate can lead to a plan which is very fast 99% of the time but
completely wrong in 1% of the cases. Sometimes the percentage is 50/50.
I've had this situation with some "limit" plans where the planner had
chosen a wrong index. The problem there was that the planner had
estimated that the query will have 20 rows as a result, but it had less,
and resulted in the complete scan of the index... as opposed to a much
smaller scan that would have resulted by scanning the other index, as
that one would have provided an end condition orders of magnitudes
sooner. Now the statistics will always be only an estimation, and +/- a
few can really make a big difference in some situations. In this
particular situation the index choice of the planner would have been
faster for all cases where there were really 20 rows returned, but I
forced it to always choose the other plan (by adding the proper order
by) because I can't risk a bad result in any of the cases.
In this particular case I was able to force the planner choose a
specific plan, but that might not be always possible, so I guess it
really would make sense to be able to tell the planner how selective
some conditions are. And yes, sometimes I would like to "freeze" a
specific "safe" plan for a specific query, even if it is not optimal.

So for me the "hint" mechanism is good for telling the server that I'm
not interested at all in the BEST plan but which risks getting very bad
on occasions, but in a good enough plan which is safe.

And as for the selectivity changes over time, the hints will change
along. In most of the situations when selectivity change, the SQL has to
change too, sometimes even the complete workflow. I find that if changed
hints will help in some occasions then having them would mean less
maintenance than the code rewriting that would be otherwise involved...
and I'm completely sure the server can't compensate for the change of
the dynamics of the data all the time. And it definitely can't keep up
with highly dynamic data, where the statistics change constantly in big
tables... 

Our application for example has kind of batch processing, where we
insert smaller or larger batches of data in a HUGE table (~200 millions
of rows), and then that data is immediately used for different
operations and then reports, and furthermore it is heavily updated. I
can't think of any reasonable statistics target and ANALYZE strategy
which could satisfy both small batches and large batches without running
ANALYZE permanently with high statistics targets on the key fields...
and even that would not be specific enough when "limit 20" is involved.
For queries involving this table I really would like to freeze plans, as
any misplanning has bad consequences.

Cheers,
Csaba.




Re: generalizing the planner knobs

From
Martijn van Oosterhout
Date:
On Fri, Dec 02, 2005 at 11:07:06AM +0100, Csaba Nagy wrote:
> So for me the "hint" mechanism is good for telling the server that I'm
> not interested at all in the BEST plan but which risks getting very bad
> on occasions, but in a good enough plan which is safe.

I'm wondering if long term another approach might be to have another
parameter in the planner, cost_error or selectivity_error which is an
indication of how accurate we think it is.

So for example you have an index scan might cost x but with a possible
error of 15% and the seqscan might cost y but with an error of 1%.

The "error" for nested loop would be the product of the two inputs,
whereas a merge join whould be much less sensetive to error. A sort or
hash join would react badly to large variations of input.

So in cases where there is a choice between two indexscans with one
slightly more expensive and more accurate but can result in a mergejoin
would be a better choice than a possibly highly selective index but
without accurate info that needs to be fed into a nested loop. Even
though the latter might look better, the former is the "safer" option.

I think this would solve the problem where people see sudden flip-flops
between good and bad plans. The downside is that it's yet another
parameter for the planner to get wrong.

Unfortunatly, this is the kind of thing people write thesises on and I
don't think many people have the grounding in statistics to make it all
work.

Have a ncie day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: generalizing the planner knobs

From
"Pollard, Mike"
Date:
Greg Stark gsstark@mit.edu writes:
> You more or less missed my entire point.

Only because I am still getting used to how powerful and flexible
Postgres is; but I am working on expanding my horizons.

> In the extreme, no amount of added intelligence in the optimizer is
going
> to
> help it come up with any sane selectivity estimate for something like
>
>   WHERE radius_authenticate(user) = 'OK'

yeah, I can see where something like this would be problematic.  While I
still think that in an ideal world, you want to leave all of this to the
engine, it is true that in the real world sometimes we still have to do
some of the thinking for the computer.  It's just that I've seen code
absolutely littered with optimizer hints, and that really bothers me.
But you can't not build a useful tool just because some would abuse it.

Mike Pollard
SUPRA Server SQL Engineering and Support
Cincom Systems, Inc.




Re: generalizing the planner knobs

From
Rod Taylor
Date:
> In the extreme, no amount of added intelligence in the optimizer is going to
> help it come up with any sane selectivity estimate for something like 
> 
>   WHERE radius_authenticate(user) = 'OK'

Why not?

The missing capability in this case is to be able to provide or generate
(self learning?) statistics for a function that describe a typical
result and the cost of getting that result.
-- 



Re: generalizing the planner knobs

From
Greg Stark
Date:
Rod Taylor <pg@rbt.ca> writes:

> > In the extreme, no amount of added intelligence in the optimizer is going to
> > help it come up with any sane selectivity estimate for something like 
> > 
> >   WHERE radius_authenticate(user) = 'OK'
> 
> Why not?
> 
> The missing capability in this case is to be able to provide or generate
> (self learning?) statistics for a function that describe a typical result
> and the cost of getting that result.

Ok, try "WHERE radius_authenticate(user, (select ...), ?)"

The point is that you can improve the estimates the planner gets. But you can
never make them omniscient. There will always be cases where the user knows
his data more than the planner. And those hints are still valid when a new
optimizer has new plans available.

This is different from hints that tell the planner what plan to use. Every
situation where the predicted cost is inaccurate despite accurate estimates
represents a fixable bug in the optimizer's cost model. When a new version of
the optimizer is available with a more accurate cost model or new available
plans those kinds of hints will only get in the way.

-- 
greg



Re: generalizing the planner knobs

From
Gregory Maxwell
Date:
On 02 Dec 2005 15:49:02 -0500, Greg Stark <gsstark@mit.edu> wrote:
> Rod Taylor <pg@rbt.ca> writes:
> > The missing capability in this case is to be able to provide or generate
> > (self learning?) statistics for a function that describe a typical result
> > and the cost of getting that result.
>
> Ok, try "WHERE radius_authenticate(user, (select ...), ?)"
>
> The point is that you can improve the estimates the planner gets. But you can
> never make them omniscient. There will always be cases where the user knows
> his data more than the planner. And those hints are still valid when a new
> optimizer has new plans available.

Actually...  If a statistics engine stores the entire query as well
and used that as a key I don't see why it couldn't figure this out.
I.e. in queries that look like Z operation X has historically had
selectivity Y.

The the instruction to the user is simple: 'make sure that queries
with different results look different' . This is often naturally the
case.

The challenge becomes how do you group together queries which are
mostly the same so that you get enough data, but not falsely cluster
queries with different statistics.

The simplest way check the statistics list for the most similar query
match, and use that information. If the result is similar to what is
expected, use it to update the statistics record. If the measured
selectivity is too different make a new record which will then attract
similar queries.

Sounds like a good research project for someone.


Re: generalizing the planner knobs

From
Simon Riggs
Date:
On Fri, 2005-12-02 at 11:53 +0100, Martijn van Oosterhout wrote:
> On Fri, Dec 02, 2005 at 11:07:06AM +0100, Csaba Nagy wrote:
> > So for me the "hint" mechanism is good for telling the server that I'm
> > not interested at all in the BEST plan but which risks getting very bad
> > on occasions, but in a good enough plan which is safe.
> 
> I'm wondering if long term another approach might be to have another
> parameter in the planner, cost_error or selectivity_error which is an
> indication of how accurate we think it is.
> 
> So for example you have an index scan might cost x but with a possible
> error of 15% and the seqscan might cost y but with an error of 1%.
> 
> The "error" for nested loop would be the product of the two inputs,
> whereas a merge join whould be much less sensetive to error. A sort or
> hash join would react badly to large variations of input.
> 
> So in cases where there is a choice between two indexscans with one
> slightly more expensive and more accurate but can result in a mergejoin
> would be a better choice than a possibly highly selective index but
> without accurate info that needs to be fed into a nested loop. Even
> though the latter might look better, the former is the "safer" option.
> 
> I think this would solve the problem where people see sudden flip-flops
> between good and bad plans. The downside is that it's yet another
> parameter for the planner to get wrong.

Measuring parameters more accurately is a lengthy experimental job, not
a theoretical one. I think we are just waiting for someone to do this.

> Unfortunatly, this is the kind of thing people write thesises on and I
> don't think many people have the grounding in statistics to make it all
> work.

I'd considered that before; its just a lot of work.

The theory of error propagation is straightforward: you just take the
root mean square of the errors on the parameters. 

Trouble is, many of the planning parameters are just guesses, so you
have no idea of the error estimates either. Hence you can't really
calculate the error propagation accurately enough to make a sensible
stab at risk control. But it would be useful sometimes, which is about
the best it gets with the planner.

Right now the worst part of the planner is:
- the estimation of number of distinct values, which is an inherent
statistical limitation
- need for multi-column interaction statistics

The two are somewhat related.

Best Regards, Simon Riggs



Re: generalizing the planner knobs

From
Trent Shipley
Date:
Is it possible to submit a hand written or arbitrary execution plan to the 
retrieval engine?  (That is, can one bypass the SQL parser and planner or 
optimizer and just provide instructions to nested loop join table a to table 
b ...)


Re: generalizing the planner knobs

From
Hans-Juergen Schoenig
Date:
wouldn't it be more flexible to define a multiplicator or some sort of bool flag on a per object level?
oracle hints are a total overkill and i agree with tom that usually people will abuse this feature.
if we had a per object flag the actual planner hint can be decoupled from the actual query (i don't think putting a hint inside a query is the most clever thing).
changing a flag would be as simple as running UPDATE on some system table.
this should not be too intrusive as well.

best regards,

hans


On Dec 1, 2005, at 7:45 PM, Jonah H. Harris wrote:

Tom,

Don't get me wrong, I agree with you completely.  I would rather put effort into enhancing the planner than in developing work-arounds.  In 99% of all cases the planner works correctly, but I know people who actually have to disable planning options (mergejoin) in production applications because they get bad plans.  The "bad" plans are not really bad in terms of what the planner knows about the query, just in areas where the planner doesn't look at other things.

I also agree that a significant amount of work would be required to add run-time hints which would be better spent enhancing the system as a whole.  My only suggestion was that it would be better than Part 1 of Neil's statement.  Somehow I missed the end mention of multipliers which I agree requires less effort.



On 12/1/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Jonah H. Harris" <jonah.harris@gmail.com> writes:
> In the last couple weeks I too have been thinking about planner hints.
> Assuming I have read your post correctly, the issue I see with this idea is
> that, in most cases, there won't be much of a difference between adding an
> arbitrary cost value to each type of node and disabling it completely.
> Also, by fiddling with an arbitrary cost the user may introduce a lot of
> variation into the planner which may actually result in worse query plans.

Which is pretty much exactly the problem with "planner hints", too.
I've resisted that suggestion in the past and will continue to do so,
because hints are accidents waiting to happen.  Even if the hint is right
today for your current Postgres version and current data distribution,
it's likely not to be right further down the road --- but once the hint
is embedded in your application, how often are you going to revisit it?
As an example, a hint forcing the planner to use an indexscan with a
particular index might have been a great idea in PG 8.0 and a lousy idea
in 8.1, because it would prevent substitution of a possibly-far-better
bitmap indexscan.

The enable_foo switches are debug aids, not something you are expected
to fool with for production purposes, and the same would be true of
Neil's suggested multipliers.  While I don't feel any strong need for
variable multipliers, they'd be a small enough incremental amount of
work that the suggestion doesn't require a lot of supporting argument.
Adding a planner hint facility would be several orders of magnitude
more work, and it would be taking the system in a design direction that
I think is fundamentally misguided.

                        regards, tom lane


Re: generalizing the planner knobs

From
Simon Riggs
Date:
On Fri, 2005-12-02 at 11:07 +0100, Csaba Nagy wrote:
> On Thu, 2005-12-01 at 22:01, Tom Lane wrote:
> > Greg Stark <gsstark@mit.edu> writes:
> > > On the other hand the type I would prefer to see are hints that feed directly
> > > into filling in information the planner lacks. This only requires that the
> > > user understand his own data and still lets the planner pick the best plan
> > > based on the provided information.
> > 
> > This would avoid some issues, but it still is vulnerable to the problem
> > that the hint you put in your code today will fail to track changes in
> > your data tomorrow.
> 
> Tom, I have to disagree here. At least in our application, we must
> provide for an acceptable worst case scenario, and sometimes a slightly
> wrong estimate can lead to a plan which is very fast 99% of the time but
> completely wrong in 1% of the cases. Sometimes the percentage is 50/50.
> I've had this situation with some "limit" plans where the planner had
> chosen a wrong index. The problem there was that the planner had
> estimated that the query will have 20 rows as a result, but it had less,
> and resulted in the complete scan of the index... as opposed to a much
> smaller scan that would have resulted by scanning the other index, as
> that one would have provided an end condition orders of magnitudes
> sooner. Now the statistics will always be only an estimation, and +/- a
> few can really make a big difference in some situations. In this
> particular situation the index choice of the planner would have been
> faster for all cases where there were really 20 rows returned, but I
> forced it to always choose the other plan (by adding the proper order
> by) because I can't risk a bad result in any of the cases.
> In this particular case I was able to force the planner choose a
> specific plan, but that might not be always possible, so I guess it
> really would make sense to be able to tell the planner how selective
> some conditions are. And yes, sometimes I would like to "freeze" a
> specific "safe" plan for a specific query, even if it is not optimal.

Csaba raises a good point here. Many people say they want "hints" when
what they actually require the plan to be both stable and predictable.

Tom is right to point out that data can change over time. However,
experience with packaged application tuning is that you actually do want
to have things work in a stable way, even if that is somewhat
sub-optimal because when you have 1000s of statements it is important
that it doesn't change after you tune it - otherwise you never finish.
So I would like to give that requirement a name "Plan Stability"; the
actual solution to that could be many things.

Another aspect to this is predictability. At the moment, we optimise
according to the first parameter a prepared statement is bound with.
Many data distributions contain a small number of values that represent
a large fraction of the total. This can mean that it is pretty random
whether we will get a SeqScan and be stuck with it, or get an IndexScan
and be stuck with it. Either plan being wrong 50% of the time. In these
cases, hinting is definitely a very bad thing, since whichever you hint,
you'll be wrong. This situation gives us two more requirements:
- predictability - because we want to know the worst case
- flexibility - because we want to be able to take advantage of the best
case, but without causing an unconstrained worst case

So IMHO, the requirements list for prepared statement planning is that
optimization must be:
- Flexible
- Predictable
- Stable

The actual solutions to all of those things could be many and varied.

ISTM we could do some of that with another GUC, lets call it
prepare_once = on. The system default is to have a prepared statement
bound to a plan on its first parameter bind. If we set this to "off",
then the statement will replan each time we bind. This would give us
both flexibility and predictability. (As ever, someone suggest a better
name?).


The requirements for very large statement tuning are fairly different
from that. e.g. large reports or Data Warehousing queries. In those
cases, Tom's warning about the data changing can be critical and we
don't have the same need to prepare queries. Neil's original point about
needing to avoid SeqScans on some tables but not others hasn't really
been addressed. That got turned into hinting on particular nodes/node
types, but there are still issues: How do you identify one node in a
complex plan?

Perhaps the way forward would be to make enable_* accept a list of
tables, or a * (like listen_addresses). (And make "on" a synonym for "*"
and off a synonym for an empty list). That way, you'd be able to control
on a per-table basis what plan types are possible. (Still wouldn't work
that well for complex plans where same table referenced multiple times,
but its a step in the right direction.) I'm not sure I like that
particular idea much, but I'm attempting to address Neil's original
point, with which I agree.

Best Regards, Simon Riggs



Re: generalizing the planner knobs

From
Alvaro Herrera
Date:
Simon Riggs wrote:

> ISTM we could do some of that with another GUC, lets call it
> prepare_once = on. The system default is to have a prepared statement
> bound to a plan on its first parameter bind. If we set this to "off",
> then the statement will replan each time we bind. This would give us
> both flexibility and predictability. (As ever, someone suggest a better
> name?).

Why would all statements behave the same?  I think an important
percentage of cases would require a fixed plan (thus planning at first
sight is a good idea), while a limited number of cases would require
planning every time the sentence is called.  Your idea of qualifying it
by table name does not make too much sense to me, because you can have
both types of queries for each table, and further any query where this
is necessary will involve more than one table anyway, so which one do
you choose to make the decision?

So we would provide a protocol/libpq option to allow first-params-
planning (the default and current behavior), and another to allow
planning-every-time.  The latter would tell the server to save only the
parsetree of the query and replan each time it is invoked.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: generalizing the planner knobs

From
"Pollard, Mike"
Date:
Simon Riggs wrote
> ISTM we could do some of that with another GUC, lets call it
> prepare_once = on. The system default is to have a prepared statement
> bound to a plan on its first parameter bind. If we set this to "off",
> then the statement will replan each time we bind. This would give us
> both flexibility and predictability. (As ever, someone suggest a
better
> name?).
>

We call it deferred optimization.

Do you really stop at the first parameter?  What if it couldn't possibly
affect the plan (<col> like '%M%', or <col> is not involved in an
index)?  You can continue to plan up until the first parameter that can
affect the plan.  At that point, you save off the plan, and when you get
actual values (on the execute command), continue with the planning.  You
can do the same thing with correlated subqueries

Mike Pollard
SUPRA Server SQL Engineering and Support
Cincom Systems, Inc.



Re: generalizing the planner knobs

From
Greg Stark
Date:
Simon Riggs <simon@2ndquadrant.com> writes:

> Csaba raises a good point here. Many people say they want "hints" when
> what they actually require the plan to be both stable and predictable.

Plan stability is also an important feature, especially for OLTP systems which
have hard real-time requirements. OLTP systems typically don't care about
getting the "best" plan for a query, only a plan that is "good enough".

"Good enough" means it can keep up with the rate of incoming requests; it
doesn't matter whether it keeps up with 10% headroom or 20% headroom. But if
one incoming query even one in a thousand takes 1000% of the time available
then the entire system risks falling down.

But plan stability is something that should be integrated directly in the
server. Not something achieved by having the user hint every query to defeat
the optimizer.

What I'm working on for my purposes here is a perl script that takes all the
queries in the application (either gathered from the log or stored statically)
and runs ANALYZE on all of them. Then within a transaction it runs ANALYZE on
the database and re-ANALYZES every query again. If any plans change then it
mails them to the DBA and rolls back the transaction with the database
analysis. The DBA gets a chance to approve the new plans before they go into
effect.

That's more or less what I expect an integrated plan stability feature to do.
It's like a shared query plan cache except that instead of being a cache it's
a database of plans that are specifically approved by the DBA. Queries that
don't have an approved plan could be configured to either produce a warning or
an error until the plan is approved.

-- 
greg



Re: generalizing the planner knobs

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> Plan stability is also an important feature, especially for OLTP
> systems which have hard real-time requirements. OLTP systems typically
> don't care about getting the "best" plan for a query, only a plan that
> is "good enough".

> "Good enough" means it can keep up with the rate of incoming requests; it
> doesn't matter whether it keeps up with 10% headroom or 20% headroom. But if
> one incoming query even one in a thousand takes 1000% of the time available
> then the entire system risks falling down.

Is it worth pointing out that using the same plan all the time is *no*
recipe for guaranteeing response time?  There is no such thing as a plan
that is good for every case --- outlying data values can make a
usually-good plan blow out your performance guarantee anyway.  Disabling
the planner is just a recipe for ensuring that that will happen, IMHO.
        regards, tom lane


Re: generalizing the planner knobs

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Is it worth pointing out that using the same plan all the time is *no*
> recipe for guaranteeing response time?  There is no such thing as a plan
> that is good for every case --- outlying data values can make a
> usually-good plan blow out your performance guarantee anyway.  Disabling
> the planner is just a recipe for ensuring that that will happen, IMHO.

But outlying data is something the user has control over. The user when
approving plans needs to be aware not just that the plan is experimentally
good, but that it will perform reliably within the constraints based on his
knowledge of the application and the data.

My point is that I don't need a plan that is "good" for every case. I need a
plan I can trust to perform as expected. If my boss asks me what impact
doubling the number of users will have I need to be able to answer "it'll be
at worst twice as slow" (knowing that my queries and the plans I've seen are
all O(users)).

If twice as slow is still tolerable then that's fine, even if a faster plan
was possible. What I don't want to say is "well we'll have to try it and see"
which is all I can say if there's a risk the plans will change.

Actually I would expect the facility to only be useful if there was still a
way to update the plans. The DBA would periodically re-analyze the queries in
the system and check any changed plans to ensure they were still reasonable.
Effectively the same as my script except at the query level rather than at the
database statistics level.

The scenario where this is useful is not in a development environment where
things are changing dynamically. But rather in a mature application where the
data distribution is well established. Outlying data almost certainly
represents an application bug and should be signalled, not allowed to
spontaneously bring down the rest of the system.

As anecdotal evidence, in the last job where I worked, once we had 10 million
users and over a hundred web requests per second it would have been pretty
hard to believe any reasonable query could involve a sequential scan.
Certainly no query that the web application should be performing itself
without human intervention. If for whatever reason there was some outlying
data point where that would have been the "right" plan it would have
immediately brought down the web site.

-- 
greg




Re: generalizing the planner knobs

From
Simon Riggs
Date:
On Sun, 2005-12-04 at 12:49 -0300, Alvaro Herrera wrote:
> Simon Riggs wrote:
> 
> > ISTM we could do some of that with another GUC, lets call it
> > prepare_once = on. The system default is to have a prepared statement
> > bound to a plan on its first parameter bind. If we set this to "off",
> > then the statement will replan each time we bind. This would give us
> > both flexibility and predictability. (As ever, someone suggest a better
> > name?).
> 
> Why would all statements behave the same?  

They would be flexible and predictable, but not the same.

prepare_once = off
would reoptimize each statement, so each could have a potentially
different plan. Which, in the case I cited, is the only optimal
behaviour: sticking to any one plan, by any method, would be wrong.

The plans would be predictable because performance never exceeds the
worst case SeqScan; planning would be flexible because it will always
take the best plan.

> I think an important
> percentage of cases would require a fixed plan (thus planning at first
> sight is a good idea), while a limited number of cases would require
> planning every time the sentence is called.  

Yes, that is exactly what I see. Hence a GUC with a default the same as
it is now: they would only be prepared once. You would only set the GUC
to another value when you have a statement that looks like it needs
"hinting" i.e. the plan flips from SeqScan to IndexScan and back
depending upon the input data. 

> Your idea of qualifying it
> by table name does not make too much sense to me, because you can have
> both types of queries for each table, and further any query where this
> is necessary will involve more than one table anyway, so which one do
> you choose to make the decision?

That was a different idea later down my note, not a variation of the
same one: that had nothing to do with the prepare_once concept. Those
options were meant to be set on a per statement basis, not at the server
level.

I was trying to solve Neil's stated problem: How to force one part of a
query to avoid a SeqScan, yet without touching the others.

> So we would provide a protocol/libpq option to allow first-params-
> planning (the default and current behavior), and another to allow
> planning-every-time.  The latter would tell the server to save only the
> parsetree of the query and replan each time it is invoked.

Or some function similar. I prefer the GUC because it does not imply a
protocol change.

Best Regards, Simon Riggs



Re: generalizing the planner knobs

From
Simon Riggs
Date:
On Sun, 2005-12-04 at 13:47 -0500, Pollard, Mike wrote:
> Simon Riggs wrote
> >  The system default is to have a prepared statement
> > bound to a plan on its first parameter bind. 
> 
> We call it deferred optimization.
> 
> Do you really stop at the first parameter?  

The first bind of parameters to the query, yes.

> You
> can do the same thing with correlated subqueries

Not currently done, AFAIK.

Best Regards, Simon Riggs



Re: generalizing the planner knobs

From
Simon Riggs
Date:
On Mon, 2005-12-05 at 01:53 -0500, Greg Stark wrote:
> Tom Lane <tgl@sss.pgh.pa.us> writes:

> > There is no such thing as a plan
> > that is good for every case --- outlying data values can make a
> > usually-good plan blow out your performance guarantee anyway. 
> 
> But outlying data is something the user has control over. 

Unfortunately, the DBA cannot choose the data distribution in his
database. So the appearance of control is somewhat illusory.

> The user when
> approving plans needs to be aware not just that the plan is experimentally
> good, but that it will perform reliably within the constraints based on his
> knowledge of the application and the data.

Greg's idea to have a plan comparator is a good one, for most
situations.

What you'll see if you run it though is no matter what you do, there
will be a few queries that are resistant to tuning. Their stored plans
will flip from SeqScan to IndexScan and back depending upon the
parameters used; neither will be suitable all the time and either
setting will cause very variable response times.

For those queries only, I seek a solution.

["Priming" the cache by executing IndexScan causing queries does not
work for all cases, so again the appearance of control is illusory.]

My solution is to replan the queries each time, rather than just once on
first parameter bind. By some mechanism; the GUC is just one of those.

Best Regards, Simon Riggs



Re: generalizing the planner knobs

From
Hans-Juergen Schoenig
Date:
On Dec 5, 2005, at 4:17 AM, Tom Lane wrote:

> Greg Stark <gsstark@mit.edu> writes:
>> Plan stability is also an important feature, especially for OLTP
>> systems which have hard real-time requirements. OLTP systems  
>> typically
>> don't care about getting the "best" plan for a query, only a plan  
>> that
>> is "good enough".
>
>> "Good enough" means it can keep up with the rate of incoming  
>> requests; it
>> doesn't matter whether it keeps up with 10% headroom or 20%  
>> headroom. But if
>> one incoming query even one in a thousand takes 1000% of the time  
>> available
>> then the entire system risks falling down.
>
> Is it worth pointing out that using the same plan all the time is *no*
> recipe for guaranteeing response time?  There is no such thing as a  
> plan
> that is good for every case --- outlying data values can make a
> usually-good plan blow out your performance guarantee anyway.   
> Disabling
> the planner is just a recipe for ensuring that that will happen, IMHO.
>
>             regards, tom lane
>


I think I know what Greg is trying to say: I think in this plan  
stability does not mean that the plan has to be completely fixed -  
usually it is all about indexing. People start with an empty  
perfectly analyzed database and data is added. However, some day some  
cron job doing ANALYZE or whatever fails and the system will slow  
down or even break down because data is added to some table which is  
still seq-scanned. This is what usually happens and which leads to  
support cases.

Adding hints to some comments or to the statement itself is not a  
good solution as well. This is why I proposed a table or some flag  
telling the planner what to favour (= always use a certain index). So  
the basic idea is not to turn index of in general but to have the  
chance to do it on a per index basis. I guess this would not be to  
complex to implement and it solves 90% of all problems without having  
to hide some information inside comments (which is no good at all).
best regards,
    hans





Re: generalizing the planner knobs

From
Greg Stark
Date:
Hans-Juergen Schoenig <postgres@cybertec.at> writes:

> I think I know what Greg is trying to say: I think in this plan stability
> does not mean that the plan has to be completely fixed - usually it is all
> about indexing.

"Usually" problems occur because someone hasn't run analyze at all. That's not
what I'm talking about. I'm talking about a large mature system where the DBA
has everything tuned and adjusted properly and just wants to get a good
night's sleep, confident that the nightly analyze isn't going to suddenly
change the performance of existing queries.

> Adding hints to some comments or to the statement itself is not a  good
> solution as well. This is why I proposed a table or some flag  telling the
> planner what to favour (= always use a certain index). So  the basic idea is
> not to turn index of in general but to have the  chance to do it on a per index
> basis. I guess this would not be to  complex to implement and it solves 90% of
> all problems without having  to hide some information inside comments (which is
> no good at all).

I disagree that this is a reasonable solution.

I want to be sure my existing queries keep using the plans they've been using
until I allow them to change.

I don't want to sit down and type "select count(*) from users" and have it not
work correctly (ie, use a sequential scan) because the system is so single
mindedly tuned for the OLTP application.


-- 
greg



Re: generalizing the planner knobs

From
Csaba Nagy
Date:
[snip]
> I want to be sure my existing queries keep using the plans they've been using
> until I allow them to change.
> 
> I don't want to sit down and type "select count(*) from users" and have it not
> work correctly (ie, use a sequential scan) because the system is so single
> mindedly tuned for the OLTP application.
> 

Now this is exactly what I've had in mind... it would be nice to
"fixate" a plan for some of the queries, and let the planner choose the
best for all the rest. I think some other data bases have something like
an "optimizer plan stability" feature, providing "outlines" of query
plan bundles. This is maybe too much, but specifying that for a certain
query I definitely want to use one index and not the other would be
nice...

On another note, it might be interesting to have some kind of "prepare
analyze", where the planner is allowed to go and get some more detailed
estimation from the actual table data based on the hard-coded parameter
values, and produce some more detailed statistics for the parameterized
values so it can then produce hot-shot plans for the actual parameter
values on each execution... I wonder if this makes any sense. This way
we could have some very detailed statistics directly supporting the
queries we actually use. I would call this kind of prepare for the most
used/problematic queries from time to time, and the planner should
decide what statistics it needs to support it and go and get it...

Cheers,
Csaba.





Re: generalizing the planner knobs

From
Rod Taylor
Date:
On Fri, 2005-12-02 at 15:49 -0500, Greg Stark wrote:
> Rod Taylor <pg@rbt.ca> writes:
> 
> > > In the extreme, no amount of added intelligence in the optimizer is going to
> > > help it come up with any sane selectivity estimate for something like 
> > > 
> > >   WHERE radius_authenticate(user) = 'OK'
> > 
> > Why not?
> > 
> > The missing capability in this case is to be able to provide or generate
> > (self learning?) statistics for a function that describe a typical result
> > and the cost of getting that result.
> 
> Ok, try "WHERE radius_authenticate(user, (select ...), ?)"
> 
> The point is that you can improve the estimates the planner gets. But you can
> never make them omniscient. There will always be cases where the user knows
> his data more than the planner. And those hints are still valid when a new
> optimizer has new plans available.

You missed my point. If the user knows there data there is absolutely no
reason, aside from missing functionality in PostgreSQL, that statistics
cannot be generated to represent what the user knows about their data.

Once the planner knows the statistics it can make the right decision
without any hints.

The missing feature here is the ability to generate or provide
statistics and costs for functions.



-- 



Re: generalizing the planner knobs

From
"Jim C. Nasby"
Date:
On Thu, Dec 01, 2005 at 12:32:12PM -0500, Qingqing Zhou wrote:
> 
> "Neil Conway" <neilc@samurai.com> wrote
> >
> > This would also be useful when diagnosing bad query plans: for example,
> > setting enable_seqscan=false often causes the planner to disregard the
> > use of *any* sequential scan, anywhere in the plan. The ability to
> > slightly bump up the cost of particular operations would allow more
> > alternative plans to be examined.
> >
> 
> This method also has the problem of "enable_seqscan=false" in some 
> situations. I would vote we implement the final general solution like query 
> plan hints directly.

BTW, there's another end to the 'enable_seqscan=false' problem... it
sometimes doesn't work! Last I looked, enable_seqscan=false only added a
fixed overhead cost to a seqscan (1000000 IIRC). The problem is, some
queries will produce estimates for other methodes that are more
expensive than a seqscan even with the added burden. If instead of
adding a fixed amount enable_seqscan=false multiplied by some amount
then this would probably be impossible to occur.

(And before someone asks, no, I don't remember which query was actually
faster...)
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: generalizing the planner knobs

From
Jaime Casanova
Date:
>
> BTW, there's another end to the 'enable_seqscan=false' problem... it
> sometimes doesn't work! Last I looked, enable_seqscan=false only added a
> fixed overhead cost to a seqscan (1000000 IIRC). The problem is, some
> queries will produce estimates for other methodes that are more
> expensive than a seqscan even with the added burden. If instead of
> adding a fixed amount enable_seqscan=false multiplied by some amount
> then this would probably be impossible to occur.
>
> (And before someone asks, no, I don't remember which query was actually
> faster...)
> --
> Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
> Pervasive Software      http://pervasive.com    work: 512-231-6117
> vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
>

I have often considered that this is an indication that seq scan is
actually the better plan... although, i have to admit that is a little
confusing that  enable_seqscan = false actually let you use a seqscan
if the other plans are bad enough

--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)


Re: generalizing the planner knobs

From
Tom Lane
Date:
Jaime Casanova <systemguards@gmail.com> writes:
>> BTW, there's another end to the 'enable_seqscan=false' problem... it
>> sometimes doesn't work!

> I have often considered that this is an indication that seq scan is
> actually the better plan...

There are cases where it is the *only* plan, eg, you have no relevant
indexes.  I am not sure that applies to Jim's complaint though.
        regards, tom lane


Re: generalizing the planner knobs

From
"Jim C. Nasby"
Date:
On Thu, Dec 08, 2005 at 01:07:10PM -0500, Tom Lane wrote:
> Jaime Casanova <systemguards@gmail.com> writes:
> >> BTW, there's another end to the 'enable_seqscan=false' problem... it
> >> sometimes doesn't work!
> 
> > I have often considered that this is an indication that seq scan is
> > actually the better plan...
> 
> There are cases where it is the *only* plan, eg, you have no relevant
> indexes.  I am not sure that applies to Jim's complaint though.

IIRC I ran into this when I was working on generating some numbers about
how well a high correlation improves the performance of an index scan
(since afaict the cost estimator for index scan is rather broken :( ) In
that case, I had defined an index on a ~120M row table on a collumn with
a very low correlation. It's pretty much a given that a seqscan and sort
would be faster than the index scan, but it would have still been good
to be able to verify that. Because of how enable_seqscan works, I
couldn't.

BTW,
http://archives.postgresql.org/pgsql-performance/2005-04/msg00669.php is
where I first mentioned this, including the cost function that I think
is broken.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461