Thread: Cost estimate vs. actual - do I care?

Cost estimate vs. actual - do I care?

From
Jay Levitt
Date:
I gather that a big part of making queries performant is making sure the
planner's estimates reflect reality.

Given a random explain analyze line:

Limit  (cost=0.02..943.11 rows=50 width=16) (actual time=0.603..79.729
rows=50 loops=1)

which is the truer statement?

1. As long as costs go up with actual time, you're fine.

2. You should try to ensure that costs go up linearly with actual time.

3. You should try to ensure that costs are as close as possible to actual time.

4. The number "4".

Jay Levitt

Re: Cost estimate vs. actual - do I care?

From
Mark Mielke
Date:
3), 2), 1).

The planner needs the right information to make the right decision.
However, the planner rarely has perfect information, so the algorithms
need to be able to cope with some amount of imperfection while still
generally making the right decision. There are also a limited
(relatively) set of possible plans and the plans often enough have
different enough characteristics that the truly different plans (i.e.
factor of 10 difference in terms of run time) won't be selected by
accident, even with fairly bad estimates.

It probably depends mostly on your data set. For many data sets, the
estimates might be possible to be off by a factor of 10 and still come
up with the same "right" plan. For many other data sets, for example,
the difference between an index scan or a sequential scan could result
in performance differences by a factor of 10 or more and picking the
"right one" is incredibly important.

If you are worried, you can test your theory with "set enable_seqscan=0"
and other such things to see what alternative plans might be generated
and whether they are indeed better or not. Generally, I only worry about
it for queries that I know to be bottlenecks, and many of the times -
the time investment I make into trying to prove that a better query is
possible ends up only educating me on why I am wrong... :-)

The majority of the time for me, anyways, I don't find that the
estimates are that bad or that the planner is wrong. It's usually the
typical scenario where somebody added a query but forgot to make sure
the query was efficient by ensuring that the indexes properly accelerate
their query.

Good luck.

mark


On 01/01/2012 01:59 PM, Jay Levitt wrote:
> I gather that a big part of making queries performant is making sure
> the planner's estimates reflect reality.
>
> Given a random explain analyze line:
>
> Limit  (cost=0.02..943.11 rows=50 width=16) (actual time=0.603..79.729
> rows=50 loops=1)
>
> which is the truer statement?
>
> 1. As long as costs go up with actual time, you're fine.
>
> 2. You should try to ensure that costs go up linearly with actual time.
>
> 3. You should try to ensure that costs are as close as possible to
> actual time.
>
> 4. The number "4".
>
> Jay Levitt
>


--
Mark Mielke<mark@mielke.cc>