Thread: Inaccurate Explain Cost

Inaccurate Explain Cost

From
Robert Sosinski
Date:
Hey Everyone,

I seem to be getting an inaccurate cost from explain.  Here are two examples for one query with two different query plans:

exchange_prod=# set enable_nestloop = on;
SET
exchange_prod=#
exchange_prod=# explain analyze SELECT COUNT(DISTINCT "exchange_uploads"."id") FROM "exchange_uploads" INNER JOIN "upload_destinations" ON "upload_destinations"."id" = "exchange_uploads"."upload_destination_id" LEFT OUTER JOIN "uploads" ON "uploads"."id" = "exchange_uploads"."upload_id" LEFT OUTER JOIN "import_errors" ON "import_errors"."exchange_upload_id" = "exchange_uploads"."id" LEFT OUTER JOIN "exchanges" ON "exchanges"."id" = "upload_destinations"."exchange_id" WHERE (("exchange_uploads"."created_at" >= '2012-07-27 21:21:57.363944' AND "upload_destinations"."office_id" = 6));
                                                                                            QUERY PLAN                                                                                             
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=190169.54..190169.55 rows=1 width=4) (actual time=199806.806..199806.807 rows=1 loops=1)
   ->  Nested Loop Left Join  (cost=0.00..190162.49 rows=2817 width=4) (actual time=163.293..199753.548 rows=43904 loops=1)
         ->  Nested Loop  (cost=0.00..151986.53 rows=2817 width=4) (actual time=163.275..186869.844 rows=43904 loops=1)
               ->  Index Scan using upload_destinations_office_id_idx on upload_destinations  (cost=0.00..29.95 rows=4 width=8) (actual time=0.060..0.093 rows=6 loops=1)
                     Index Cond: (office_id = 6)
               ->  Index Scan using index_exchange_uploads_on_upload_destination_id on exchange_uploads  (cost=0.00..37978.21 rows=875 width=12) (actual time=27.197..31140.375 rows=7317 loops=6)
                     Index Cond: (upload_destination_id = upload_destinations.id)
                     Filter: (created_at >= '2012-07-27 21:21:57.363944'::timestamp without time zone)
         ->  Index Scan using index_import_errors_on_exchange_upload_id on import_errors  (cost=0.00..8.49 rows=405 width=4) (actual time=0.291..0.291 rows=0 loops=43904)
               Index Cond: (exchange_upload_id = exchange_uploads.id)
 Total runtime: 199806.951 ms
(11 rows)

exchange_prod=# 
exchange_prod=# set enable_nestloop = off;
SET
exchange_prod=# 
exchange_prod=# explain analyze SELECT COUNT(DISTINCT "exchange_uploads"."id") FROM "exchange_uploads" INNER JOIN "upload_destinations" ON "upload_destinations"."id" = "exchange_uploads"."upload_destination_id" LEFT OUTER JOIN "uploads" ON "uploads"."id" = "exchange_uploads"."upload_id" LEFT OUTER JOIN "import_errors" ON "import_errors"."exchange_upload_id" = "exchange_uploads"."id" LEFT OUTER JOIN "exchanges" ON "exchanges"."id" = "upload_destinations"."exchange_id" WHERE (("exchange_uploads"."created_at" >= '2012-07-27 21:21:57.363944' AND "upload_destinations"."office_id" = 6));
                                                                                                QUERY PLAN                                                                                                 
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2535992.33..2535992.34 rows=1 width=4) (actual time=133447.507..133447.507 rows=1 loops=1)
   ->  Hash Right Join  (cost=1816553.69..2535985.56 rows=2708 width=4) (actual time=133405.326..133417.078 rows=43906 loops=1)
         Hash Cond: (import_errors.exchange_upload_id = exchange_uploads.id)
         ->  Seq Scan on import_errors  (cost=0.00..710802.71 rows=2300471 width=4) (actual time=0.006..19199.569 rows=2321888 loops=1)
         ->  Hash  (cost=1816519.84..1816519.84 rows=2708 width=4) (actual time=112938.606..112938.606 rows=43906 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 1544kB
               ->  Hash Join  (cost=28.25..1816519.84 rows=2708 width=4) (actual time=42.957..112892.689 rows=43906 loops=1)
                     Hash Cond: (exchange_uploads.upload_destination_id = upload_destinations.id)
                     ->  Index Scan using index_upload_destinations_on_created_at on exchange_uploads  (cost=0.00..1804094.96 rows=3298545 width=12) (actual time=17.686..111649.272 rows=3303488 loops=1)
                           Index Cond: (created_at >= '2012-07-27 21:21:57.363944'::timestamp without time zone)
                     ->  Hash  (cost=28.20..28.20 rows=4 width=8) (actual time=0.043..0.043 rows=6 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 1kB
                           ->  Bitmap Heap Scan on upload_destinations  (cost=6.28..28.20 rows=4 width=8) (actual time=0.026..0.036 rows=6 loops=1)
                                 Recheck Cond: (office_id = 6)
                                 ->  Bitmap Index Scan on upload_destinations_office_id_idx  (cost=0.00..6.28 rows=4 width=0) (actual time=0.020..0.020 rows=6 loops=1)
                                       Index Cond: (office_id = 6)
 Total runtime: 133447.790 ms
(17 rows)



The first query shows a cost of 190,169.55 and runs in 199,806.951 ms.  When I disable nested loop, I get a cost of 2,535,992.34 which runs in only 133,447.790 ms.  We have run queries on our database with a cost of 200K cost before and they ran less then a few seconds, which makes me wonder if the first query plan is inaccurate.  The other issue is understanding why a query plan with a much higher cost is taking less time to run.

I do not think these queries are cached differently, as we have gotten the same results ran a couple of times at across a few days.  We also analyzed the tables that we are querying before trying the explain analyze again, and were met with the same statistics.  Any help on how Postgres comes up with a query plan like this, and why there is a difference would be very helpful.

Thanks!

-- 
Robert Sosinski

Re: Inaccurate Explain Cost

From
Shaun Thomas
Date:
On 09/26/2012 01:38 PM, Robert Sosinski wrote:

> I seem to be getting an inaccurate cost from explain.  Here are two
> examples for one query with two different query plans:

Well, there's this:

Nested Loop  (cost=0.00..151986.53 rows=2817 width=4) (actual
time=163.275..186869.844 rows=43904 loops=1)

If anything's a smoking gun, that is. I could see why you'd want to turn
off nested loops to get better execution time. But the question is: why
did it think it would match so few rows in the first place? The planner
probably would have thrown away this query plan had it known it would
loop 20x more than it thought.

I think we need to know what your default_statistics_target is set at,
and really... all of your relevant postgresql settings.

Please see this:

http://wiki.postgresql.org/wiki/Slow_Query_Questions

But you also may need to look a lot more into your query itself. The
difference between a 2 or a 3 minute query isn't going to  help you
much. Over here, we tend to spend more of our time turning 2 or 3 minute
queries into 20 or 30ms queries. But judging by your date range, getting
the last 2-months of data from a table that large generally won't be
fast by any means.

That said, looking at your actual query:

SELECT COUNT(DISTINCT eu.id)
   FROM exchange_uploads eu
   JOIN upload_destinations ud ON ud.id = eu.upload_destination_id
   LEFT JOIN uploads u ON u.id = eu.upload_id
   LEFT JOIN import_errors ie ON ie.exchange_upload_id = eu.id
   LEFT JOIN exchanges e ON e.id = ud.exchange_id
  WHERE eu.created_at >= '2012-07-27 21:21:57.363944'
    AND ud.office_id = 6;

Doesn't need half of these joins. They're left joins, and never used in
the query results or where criteria. You could just use this:

SELECT COUNT(DISTINCT eu.id)
   FROM exchange_uploads eu
   JOIN upload_destinations ud ON (ud.id = eu.upload_destination_id)
  WHERE eu.created_at >= '2012-07-27 21:21:57.363944'
    AND ud.office_id = 6;

Though I presume this is just a count precursor to a query that fetches
the actul results and does need the left join. Either way, the index
scan from your second example matches 3.3M rows by using the created_at
index on exchange_uploads. That's not really very restrictive, and so
you have two problems:

1. Your nested loop stats from office_id are somehow wrong. Try
increasing your stats on that column, or just default_statistics_target
in general, and re-analyze.
2. Your created_at criteria above match way too many rows, and will also
take a long time to process.

Those are your two actual problems. We can probably get your query to
run faster, but those are pretty significant hurdles.


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email


Re: Inaccurate Explain Cost

From
hubert depesz lubaczewski
Date:
On Wed, Sep 26, 2012 at 02:38:09PM -0400, Robert Sosinski wrote:
> The first query shows a cost of 190,169.55 and runs in 199,806.951 ms.
> When I disable nested loop, I get a cost of 2,535,992.34 which runs in
> only 133,447.790 ms.  We have run queries on our database with a cost
> of 200K cost before and they ran less then a few seconds, which makes
> me wonder if the first query plan is inaccurate.  The other issue is
> understanding why a query plan with a much higher cost is taking less
> time to run.

Are you under impression that cost should be somehow related to actual
time?
If yes - that's not true, and afaik never was.
the fact that you got similar time and cost is just a coincidence.

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
                                                             http://depesz.com/


Re: Inaccurate Explain Cost

From
Samuel Gendler
Date:


On Wed, Sep 26, 2012 at 1:21 PM, hubert depesz lubaczewski <depesz@depesz.com> wrote:
On Wed, Sep 26, 2012 at 02:38:09PM -0400, Robert Sosinski wrote:
> The first query shows a cost of 190,169.55 and runs in 199,806.951 ms.
> When I disable nested loop, I get a cost of 2,535,992.34 which runs in
> only 133,447.790 ms.  We have run queries on our database with a cost
> of 200K cost before and they ran less then a few seconds, which makes
> me wonder if the first query plan is inaccurate.  The other issue is
> understanding why a query plan with a much higher cost is taking less
> time to run.

Are you under impression that cost should be somehow related to actual
time?
If yes - that's not true, and afaik never was.
the fact that you got similar time and cost is just a coincidence.

Well...only sort of.  In a well-tuned db with accurate statistics, relative cost between 2 plans should be reflected in relative execution time between those 2 queries (assuming the data in memory is similar for both runs, anyway), and that's what he seems to be complaining about.  The plan with higher cost had lower execution time, which resulted in the planner picking the slower query.  But the reason for the execution time discrepancy would appear to be, at least in part, inaccurate statistics resulting in an incorrect estimate of number of rows in a loop iteration.  More info about the db config would help to identify other things contributing to the inaccurate cost estimate - as mentioned earlier, please refer to http://wiki.postgresql.org/wiki/Slow_Query_Questions when asking performance questions

And yes, I know you know all of this, Hubert.  I wrote it for the benefit of the original questioner.

--sam

Re: [GENERAL] Inaccurate Explain Cost

From
Jeff Janes
Date:
On Wed, Sep 26, 2012 at 1:21 PM, hubert depesz lubaczewski
<depesz@depesz.com> wrote:
> On Wed, Sep 26, 2012 at 02:38:09PM -0400, Robert Sosinski wrote:
>> The first query shows a cost of 190,169.55 and runs in 199,806.951 ms.
>> When I disable nested loop, I get a cost of 2,535,992.34 which runs in
>> only 133,447.790 ms.  We have run queries on our database with a cost
>> of 200K cost before and they ran less then a few seconds, which makes
>> me wonder if the first query plan is inaccurate.  The other issue is
>> understanding why a query plan with a much higher cost is taking less
>> time to run.
>
> Are you under impression that cost should be somehow related to actual
> time?

I am certainly under that impression.  If the estimated cost has
nothing to do with run time, then what is it that the cost-based
optimizer is trying to optimize?

The arbitrary numbers of the cost parameters do not formally have any
units, but they had better have some vaguely proportional relationship
with the dimension of time, or else there is no point in having an
optimizer.  For any given piece of hardware (including table-space, if
you have different table-spaces on different storage), configuration
and cachedness, there should be some constant factor to translate cost
into time.  To the extent that there fails to be such a constant
factor, it is either a misconfiguration, or a room for improvement in
the planner.

The only exceptions I can think of is are 1) when there is only one
way to do something, the planner may not bother to cost it (i.e.
assign it a cost of zero) because it will not help make a decision.
However, the only instances of this that I know of are in DML, not in
pure selects, and 2) the costs of setting hint bits and such in
selects is not estimated, except to the extent they are folded into
something else, like the page visiting costs.

Cheers,

Jeff


Re: [GENERAL] Inaccurate Explain Cost

From
Matthias
Date:
Am 27.09.2012, 02:04 Uhr, schrieb Jeff Janes <jeff.janes@gmail.com>:

> On Wed, Sep 26, 2012 at 1:21 PM, hubert depesz lubaczewski
> <depesz@depesz.com> wrote:
>> On Wed, Sep 26, 2012 at 02:38:09PM -0400, Robert Sosinski wrote:
>>> The first query shows a cost of 190,169.55 and runs in 199,806.951 ms.
>>> When I disable nested loop, I get a cost of 2,535,992.34 which runs in
>>> only 133,447.790 ms.  We have run queries on our database with a cost
>>> of 200K cost before and they ran less then a few seconds, which makes
>>> me wonder if the first query plan is inaccurate.  The other issue is
>>> understanding why a query plan with a much higher cost is taking less
>>> time to run.
>>
>> Are you under impression that cost should be somehow related to actual
>> time?
>
> I am certainly under that impression.  If the estimated cost has
> nothing to do with run time, then what is it that the cost-based
> optimizer is trying to optimize?

See http://www.postgresql.org/docs/9.2/static/runtime-config-query.html
section "18.7.2. Planner Cost Constants".

-Matthias