Thread: tricking EXPLAIN?
Hallo, if u compare the 2 queries, then they should be equivalent: -- normal -- EXPLAIN ANALYZE SELECT amproc, amprocnum - average AS difference FROM pg_amproc, (SELECT avg(amprocnum) AS average FROM pg_amproc) AS tmp; -- trying to trick explain with a redundant join -- EXPLAIN ANALYZE SELECT amproc, amprocnum - average AS difference FROM pg_amproc INNER JOIN (SELECT avg(amprocnum) AS average FROM pg_amproc) AS tmp ON pg_amproc.amproc = pg_amproc.amproc; If we look at the output of EXPLAIN ANALYZE, then according to the COST the second query is best one, but according to the ACTUAL TIME the first query is best (which seems logical intuitively). So explain is being tricked, and the reason for this seems the number of rows in de nested loop, which are reduced to 1 for explain because of the join. http://www.postgresql.org/docs/8.4/static/using-explain.html Suggestions, comments are always welcome. mvg, Wim Bertels
On 28 November 2011 12:55, Wim Bertels <wim.bertels@khleuven.be> wrote:
Hallo,
if u compare the 2 queries, then they should be equivalent:
-- normal
-- EXPLAIN ANALYZE
SELECT amproc, amprocnum - average AS difference
FROM pg_amproc,
(SELECT avg(amprocnum) AS average
FROM pg_amproc) AS tmp;
-- trying to trick explain with a redundant join
-- EXPLAIN ANALYZE
SELECT amproc, amprocnum - average AS difference
FROM pg_amproc INNER JOIN
(SELECT avg(amprocnum) AS average
FROM pg_amproc) AS tmp
ON pg_amproc.amproc = pg_amproc.amproc;
If we look at the output of EXPLAIN ANALYZE,
then according to the COST the second query is best one,
but according to the ACTUAL TIME the first query is best
(which seems logical intuitively).
So explain is being tricked,
and the reason for this seems the number of rows in de nested loop,
which are reduced to 1 for explain because of the join.
http://www.postgresql.org/docs/8.4/static/using-explain.html
Suggestions, comments are always welcome.
mvg,
Wim Bertels
Hi,
could you show us the output of explain analyze?
regards
Szymon
On ma, 2011-11-28 at 13:00 +0100, Szymon Guz wrote: > > > On 28 November 2011 12:55, Wim Bertels <wim.bertels@khleuven.be> > wrote: > Hallo, > > if u compare the 2 queries, then they should be equivalent: > > -- normal > -- EXPLAIN ANALYZE > SELECT amproc, amprocnum - average AS difference > FROM pg_amproc, > (SELECT avg(amprocnum) AS average > FROM pg_amproc) AS tmp; "Nested Loop (cost=5.04..13.13 rows=243 width=38) (actual time=0.333..0.953 rows=243 loops=1)" " -> Aggregate (cost=5.04..5.05 rows=1 width=2) (actual time=0.326..0.327 rows=1 loops=1)" " -> Seq Scan on pg_amproc (cost=0.00..4.43 rows=243 width=2) (actual time=0.003..0.157 rows=243 loops=1)" " -> Seq Scan on pg_amproc (cost=0.00..4.43 rows=243 width=6) (actual time=0.002..0.147 rows=243 loops=1)" "Total runtime: 1.117 ms" > > -- trying to trick explain with a redundant join > -- EXPLAIN ANALYZE > SELECT amproc, amprocnum - average AS difference > FROM pg_amproc INNER JOIN > (SELECT avg(amprocnum) AS average > FROM pg_amproc) AS tmp > ON pg_amproc.amproc = pg_amproc.amproc; "Nested Loop (cost=5.04..10.11 rows=1 width=38) (actual time=0.376..80.891 rows=243 loops=1)" " -> Seq Scan on pg_amproc (cost=0.00..5.04 rows=1 width=6) (actual time=0.028..0.249 rows=243 loops=1)" " Filter: ((amproc)::oid = (amproc)::oid)" " -> Aggregate (cost=5.04..5.05 rows=1 width=2) (actual time=0.327..0.328 rows=1 loops=243)" " -> Seq Scan on pg_amproc (cost=0.00..4.43 rows=243 width=2) (actual time=0.002..0.156 rows=243 loops=243)" "Total runtime: 81.101 ms" > > If we look at the output of EXPLAIN ANALYZE, > then according to the COST the second query is best one, > but according to the ACTUAL TIME the first query is best > (which seems logical intuitively). > > So explain is being tricked, > and the reason for this seems the number of rows in de nested > loop, > which are reduced to 1 for explain because of the join. > http://www.postgresql.org/docs/8.4/static/using-explain.html > > Suggestions, comments are always welcome. > > mvg, > Wim Bertels > > > > > > Hi, > could you show us the output of explain analyze? cf supra, Wim
(2011/11/28 20:55), Wim Bertels wrote: > If we look at the output of EXPLAIN ANALYZE, > then according to the COST the second query is best one, > but according to the ACTUAL TIME the first query is best > (which seems logical intuitively). > > So explain is being tricked, > and the reason for this seems the number of rows in de nested loop, > which are reduced to 1 for explain because of the join. > http://www.postgresql.org/docs/8.4/static/using-explain.html > > Suggestions, comments are always welcome. Interesting. I tried a modified version of second query, and got same EXPLAIN output as first query. SELECT amproc, amprocnum - average AS difference FROM pg_amproc INNER JOIN (SELECT avg(amprocnum) AS average FROM pg_amproc) AS tmp ON true; -- semantically same as "amproc = amproc" So, I think that the point of this issue is somehow PG thinks wrongly that "amporc = amproc" filters the result to just one row, though such condition never reduces result. I also tried simplified query, and got another result which shows that PG estimates that same condition reduces to half. postgres=# EXPLAIN ANALYZE SELECT * FROM pg_amproc WHERE (amproc = amproc); QUERY PLAN --------------------------------------------------------------------------------------------------------- Seq Scan on pg_amproc (cost=0.00..67.52 rows=126 width=18) (actual time=0.039..1.356 rows=252 loops=1) Filter: (amproc = amproc) Total runtime: 1.445 ms (3 rows) postgres=# EXPLAIN ANALYZE SELECT * FROM pg_amproc WHERE (true); QUERY PLAN -------------------------------------------------------------------------------------------------------- Seq Scan on pg_amproc (cost=0.00..4.52 rows=252 width=18) (actual time=0.008..0.045 rows=252 loops=1) Total runtime: 0.089 ms (2 rows) IMHO planner should be modified so that it can estimate result rows accurately in this case. -- Shigeru Hanada