Thread: Inaccurate Explain Cost
Hey Everyone,
I seem to be getting an inaccurate cost from explain. Here are two examples for one query with two different query plans:
exchange_prod=# set enable_nestloop = on;
SET
exchange_prod=#
exchange_prod=# explain analyze SELECT COUNT(DISTINCT "exchange_uploads"."id") FROM "exchange_uploads" INNER JOIN "upload_destinations" ON "upload_destinations"."id" = "exchange_uploads"."upload_destination_id" LEFT OUTER JOIN "uploads" ON "uploads"."id" = "exchange_uploads"."upload_id" LEFT OUTER JOIN "import_errors" ON "import_errors"."exchange_upload_id" = "exchange_uploads"."id" LEFT OUTER JOIN "exchanges" ON "exchanges"."id" = "upload_destinations"."exchange_id" WHERE (("exchange_uploads"."created_at" >= '2012-07-27 21:21:57.363944' AND "upload_destinations"."office_id" = 6));
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=190169.54..190169.55 rows=1 width=4) (actual time=199806.806..199806.807 rows=1 loops=1)
-> Nested Loop Left Join (cost=0.00..190162.49 rows=2817 width=4) (actual time=163.293..199753.548 rows=43904 loops=1)
-> Nested Loop (cost=0.00..151986.53 rows=2817 width=4) (actual time=163.275..186869.844 rows=43904 loops=1)
-> Index Scan using upload_destinations_office_id_idx on upload_destinations (cost=0.00..29.95 rows=4 width=8) (actual time=0.060..0.093 rows=6 loops=1)
Index Cond: (office_id = 6)
-> Index Scan using index_exchange_uploads_on_upload_destination_id on exchange_uploads (cost=0.00..37978.21 rows=875 width=12) (actual time=27.197..31140.375 rows=7317 loops=6)
Index Cond: (upload_destination_id = upload_destinations.id)
Filter: (created_at >= '2012-07-27 21:21:57.363944'::timestamp without time zone)
-> Index Scan using index_import_errors_on_exchange_upload_id on import_errors (cost=0.00..8.49 rows=405 width=4) (actual time=0.291..0.291 rows=0 loops=43904)
Index Cond: (exchange_upload_id = exchange_uploads.id)
Total runtime: 199806.951 ms
(11 rows)
exchange_prod=#
exchange_prod=# set enable_nestloop = off;
SET
exchange_prod=#
exchange_prod=# explain analyze SELECT COUNT(DISTINCT "exchange_uploads"."id") FROM "exchange_uploads" INNER JOIN "upload_destinations" ON "upload_destinations"."id" = "exchange_uploads"."upload_destination_id" LEFT OUTER JOIN "uploads" ON "uploads"."id" = "exchange_uploads"."upload_id" LEFT OUTER JOIN "import_errors" ON "import_errors"."exchange_upload_id" = "exchange_uploads"."id" LEFT OUTER JOIN "exchanges" ON "exchanges"."id" = "upload_destinations"."exchange_id" WHERE (("exchange_uploads"."created_at" >= '2012-07-27 21:21:57.363944' AND "upload_destinations"."office_id" = 6));
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=2535992.33..2535992.34 rows=1 width=4) (actual time=133447.507..133447.507 rows=1 loops=1)
-> Hash Right Join (cost=1816553.69..2535985.56 rows=2708 width=4) (actual time=133405.326..133417.078 rows=43906 loops=1)
Hash Cond: (import_errors.exchange_upload_id = exchange_uploads.id)
-> Seq Scan on import_errors (cost=0.00..710802.71 rows=2300471 width=4) (actual time=0.006..19199.569 rows=2321888 loops=1)
-> Hash (cost=1816519.84..1816519.84 rows=2708 width=4) (actual time=112938.606..112938.606 rows=43906 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1544kB
-> Hash Join (cost=28.25..1816519.84 rows=2708 width=4) (actual time=42.957..112892.689 rows=43906 loops=1)
Hash Cond: (exchange_uploads.upload_destination_id = upload_destinations.id)
-> Index Scan using index_upload_destinations_on_created_at on exchange_uploads (cost=0.00..1804094.96 rows=3298545 width=12) (actual time=17.686..111649.272 rows=3303488 loops=1)
Index Cond: (created_at >= '2012-07-27 21:21:57.363944'::timestamp without time zone)
-> Hash (cost=28.20..28.20 rows=4 width=8) (actual time=0.043..0.043 rows=6 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Bitmap Heap Scan on upload_destinations (cost=6.28..28.20 rows=4 width=8) (actual time=0.026..0.036 rows=6 loops=1)
Recheck Cond: (office_id = 6)
-> Bitmap Index Scan on upload_destinations_office_id_idx (cost=0.00..6.28 rows=4 width=0) (actual time=0.020..0.020 rows=6 loops=1)
Index Cond: (office_id = 6)
Total runtime: 133447.790 ms
(17 rows)
The first query shows a cost of 190,169.55 and runs in 199,806.951 ms. When I disable nested loops, I get a cost of 2,535,992.34, which runs in only 133,447.790 ms. We have run queries on our database with a cost of 200K before, and they ran in less than a few seconds, which makes me wonder if the first query plan is inaccurate. The other issue is understanding why a query plan with a much higher cost takes less time to run.
I do not think these queries are cached differently, as we have gotten the same results across several runs spread over a few days. We also analyzed the tables we are querying before trying the EXPLAIN ANALYZE again, and were met with the same statistics. Any help on how Postgres comes up with a query plan like this, and why there is such a difference, would be very helpful.
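(For reference, a sketch of how those statistics can be inspected directly; pg_stats is the standard statistics view, and the table and column names here are taken from the plans above:)

-- What the planner currently believes about the filter column; these
-- figures drive the rows= estimates seen in the EXPLAIN output.
SELECT attname, n_distinct, most_common_vals, most_common_freqs
  FROM pg_stats
 WHERE tablename = 'upload_destinations'
   AND attname = 'office_id';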
Thanks!
--
Robert Sosinski
On 09/26/2012 01:38 PM, Robert Sosinski wrote:

> I seem to be getting an inaccurate cost from explain. Here are two
> examples for one query with two different query plans:

Well, there's this:

Nested Loop (cost=0.00..151986.53 rows=2817 width=4)
            (actual time=163.275..186869.844 rows=43904 loops=1)

If anything's a smoking gun, that is. I can see why you'd want to turn off nested loops to get better execution time. But the question is: why did the planner think it would match so few rows in the first place? It probably would have thrown away this query plan had it known it would loop 20x more than it thought.

I think we need to know what your default_statistics_target is set to, and really, all of your relevant PostgreSQL settings. Please see this:

http://wiki.postgresql.org/wiki/Slow_Query_Questions

But you may also need to look a lot more closely at the query itself. The difference between a 2-minute and a 3-minute query isn't going to help you much; over here, we tend to spend our time turning 2- or 3-minute queries into 20- or 30-ms queries. Judging by your date range, though, getting the last two months of data from a table that large won't be fast by any means.

That said, looking at your actual query:

SELECT COUNT(DISTINCT eu.id)
  FROM exchange_uploads eu
  JOIN upload_destinations ud ON ud.id = eu.upload_destination_id
  LEFT JOIN uploads u ON u.id = eu.upload_id
  LEFT JOIN import_errors ie ON ie.exchange_upload_id = eu.id
  LEFT JOIN exchanges e ON e.id = ud.exchange_id
 WHERE eu.created_at >= '2012-07-27 21:21:57.363944'
   AND ud.office_id = 6;

It doesn't need half of these joins. They're left joins, never used in the query results or the WHERE criteria. You could just use this:

SELECT COUNT(DISTINCT eu.id)
  FROM exchange_uploads eu
  JOIN upload_destinations ud ON ud.id = eu.upload_destination_id
 WHERE eu.created_at >= '2012-07-27 21:21:57.363944'
   AND ud.office_id = 6;

Though I presume this is just a count precursor to a query that fetches the actual results and does need the left joins. Either way, the index scan in your second example matches 3.3M rows using the created_at index on exchange_uploads. That's not very restrictive, so you have two problems:

1. Your nested-loop statistics coming from office_id are somehow wrong. Try increasing the statistics target on that column, or default_statistics_target in general, and re-analyze.

2. Your created_at criterion matches far too many rows, and those will also take a long time to process.

Those are your two actual problems. We can probably get your query to run faster, but those are pretty significant hurdles.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com

______________________________________________
See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email
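(A sketch of what Shaun's first suggestion could look like in practice; the table and column names come from the plans above, and the target of 1000 is only an illustrative value:)

-- Raise the per-column statistics target so ANALYZE samples more rows
-- for this column, then rebuild the statistics.
ALTER TABLE upload_destinations ALTER COLUMN office_id SET STATISTICS 1000;
ANALYZE upload_destinations;

-- The estimate also depends on the join column of the large table, so
-- raising its target and re-analyzing may help as well:
ALTER TABLE exchange_uploads ALTER COLUMN upload_destination_id SET STATISTICS 1000;
ANALYZE exchange_uploads;

After re-analyzing, re-running the EXPLAIN ANALYZE should show whether the rows=4 and rows=875 estimates move closer to the actual 6 and 7317.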
On 26/09/2012 17:03, Shaun Thomas wrote:

> That said, looking at your actual query:
>
> SELECT COUNT(DISTINCT eu.id)
>   FROM exchange_uploads eu
>   JOIN upload_destinations ud ON ud.id = eu.upload_destination_id
>   LEFT JOIN uploads u ON u.id = eu.upload_id
>   LEFT JOIN import_errors ie ON ie.exchange_upload_id = eu.id
>   LEFT JOIN exchanges e ON e.id = ud.exchange_id
>  WHERE eu.created_at >= '2012-07-27 21:21:57.363944'
>    AND ud.office_id = 6;
>
> It doesn't need half of these joins. They're left joins, never used in
> the query results or the WHERE criteria. You could just use this:

Interesting. I have a similar situation, where the user can choose a set of filters, so the query must carry several left joins "just in case" (in case the user needs them in a filter). I know of another database that can remove unnecessary outer joins from a query when they are not relevant, which makes the query faster. Can't PostgreSQL do the same?

Regards,

Edson
On Wed, Sep 26, 2012 at 02:38:09PM -0400, Robert Sosinski wrote:
> The first query shows a cost of 190,169.55 and runs in 199,806.951 ms.
> When I disable nested loops, I get a cost of 2,535,992.34, which runs
> in only 133,447.790 ms. We have run queries on our database with a
> cost of 200K before, and they ran in less than a few seconds, which
> makes me wonder if the first query plan is inaccurate. The other issue
> is understanding why a query plan with a much higher cost takes less
> time to run.

Are you under the impression that cost should be somehow related to actual time? If yes: that's not true, and AFAIK never was. The fact that you got a similar time and cost is just a coincidence.

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/
On Wed, Sep 26, 2012 at 1:21 PM, hubert depesz lubaczewski <depesz@depesz.com> wrote:

> On Wed, Sep 26, 2012 at 02:38:09PM -0400, Robert Sosinski wrote:
>> The first query shows a cost of 190,169.55 and runs in 199,806.951 ms.
>> When I disable nested loops, I get a cost of 2,535,992.34, which runs
>> in only 133,447.790 ms. We have run queries on our database with a
>> cost of 200K before, and they ran in less than a few seconds, which
>> makes me wonder if the first query plan is inaccurate. The other issue
>> is understanding why a query plan with a much higher cost takes less
>> time to run.
>
> Are you under the impression that cost should be somehow related to
> actual time? If yes: that's not true, and AFAIK never was. The fact
> that you got a similar time and cost is just a coincidence.
Well... only sort of. In a well-tuned database with accurate statistics, the relative cost between two plans should be reflected in the relative execution time between those two queries (assuming the data in memory is similar for both runs, anyway), and that's what he seems to be complaining about: the plan with the higher cost had the lower execution time, which resulted in the planner picking the slower plan.

But the reason for the execution-time discrepancy would appear to be, at least in part, inaccurate statistics resulting in an incorrect estimate of the number of rows in a loop iteration. More info about the database config would help to identify other things contributing to the inaccurate cost estimate. As mentioned earlier, please refer to http://wiki.postgresql.org/wiki/Slow_Query_Questions when asking performance questions.
And yes, I know you know all of this, Hubert. I wrote it for the benefit of the original questioner.
--sam
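(As an aside, one way to check the "data in memory is similar for both runs" assumption above is EXPLAIN's BUFFERS option, available since PostgreSQL 9.0. A sketch, using the trimmed form of the query from earlier in the thread:)

-- "shared hit" counts pages found in PostgreSQL's buffer cache, while
-- "read" counts pages fetched from the OS or disk. Comparing these
-- counts between the two plans shows whether cache state differed.
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(DISTINCT eu.id)
  FROM exchange_uploads eu
  JOIN upload_destinations ud ON ud.id = eu.upload_destination_id
 WHERE eu.created_at >= '2012-07-27 21:21:57.363944'
   AND ud.office_id = 6;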
Edson Richter <edsonrichter@hotmail.com> writes:

> Interesting. I have a similar situation, where the user can choose a
> set of filters, so the query must carry several left joins "just in
> case". I know of another database that can remove unnecessary outer
> joins from a query when they are not relevant, which makes the query
> faster. Can't PostgreSQL do the same?

It does, and did: note that the query plan is only scanning 3 of the 5 tables mentioned in the query. (The remaining left join appears to be to a non-unique column, which makes it not redundant.)

regards, tom lane
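(A minimal, self-contained sketch of the join removal Tom describes, using hypothetical tables; PostgreSQL can drop a left join when it targets a provably unique column and nothing from the joined table is referenced:)

CREATE TABLE parent (id int PRIMARY KEY);
CREATE TABLE child  (id int PRIMARY KEY, parent_id int);

-- The left join targets parent's primary key, and no parent column is
-- referenced, so the join cannot change the result; the planner drops
-- it and the resulting plan scans only "child":
EXPLAIN SELECT c.id FROM child c LEFT JOIN parent p ON p.id = c.parent_id;

-- By contrast, a left join on a non-unique column (like
-- import_errors.exchange_upload_id above) can duplicate rows, so it
-- cannot be removed.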
On Wed, Sep 26, 2012 at 1:21 PM, hubert depesz lubaczewski <depesz@depesz.com> wrote:
> Are you under the impression that cost should be somehow related to
> actual time?

I am certainly under that impression. If the estimated cost has nothing to do with run time, then what is it that the cost-based optimizer is trying to optimize?

The arbitrary numbers of the cost parameters do not formally have any units, but they had better have some vaguely proportional relationship with the dimension of time, or else there is no point in having an optimizer. For any given piece of hardware (including tablespace, if you have different tablespaces on different storage), configuration, and cachedness, there should be some constant factor to translate cost into time. To the extent that there fails to be such a constant factor, it is either a misconfiguration or room for improvement in the planner.

The only exceptions I can think of are: 1) when there is only one way to do something, the planner may not bother to cost it (i.e. it assigns a cost of zero) because costing it would not help make a decision; the only instances of this that I know of are in DML, not in pure selects. And 2) the cost of setting hint bits and such in selects is not estimated, except to the extent it is folded into something else, like the page-visiting costs.

Cheers,

Jeff
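(To put a number on that "constant factor" using the plans at the top of the thread: the two runs imply ms-per-cost-unit factors that disagree by roughly 20x, which by Jeff's argument points at misconfigured cost constants, bad statistics, or both. A quick back-of-the-envelope check:)

SELECT 199806.951 / 190169.55   AS nestloop_ms_per_cost,  -- ~1.05
       133447.790 / 2535992.34  AS hashjoin_ms_per_cost;  -- ~0.053
-- With accurate costing, these two factors would be roughly equal for
-- the same hardware and cache state.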
On 27.09.2012 at 02:04, Jeff Janes <jeff.janes@gmail.com> wrote:
> I am certainly under that impression. If the estimated cost has
> nothing to do with run time, then what is it that the cost-based
> optimizer is trying to optimize?

See http://www.postgresql.org/docs/9.2/static/runtime-config-query.html, section "18.7.2. Planner Cost Constants".

-Matthias
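(For convenience, these are the planner cost constants that section documents, shown with their 9.2 defaults; EXPLAIN's cost figures are expressed in these abstract units, where fetching one sequential page costs 1.0:)

SHOW seq_page_cost;         -- 1.0, the baseline unit
SHOW random_page_cost;      -- 4.0
SHOW cpu_tuple_cost;        -- 0.01
SHOW cpu_index_tuple_cost;  -- 0.005
SHOW cpu_operator_cost;     -- 0.0025
-- effective_cache_size also shapes index-scan costs, by telling the
-- planner how much of the data it may assume is cached:
SHOW effective_cache_size;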