Thread: tricking EXPLAIN?

tricking EXPLAIN?

From
Wim Bertels
Date:
Hallo,

if u compare the 2 queries, then they should be equivalent:

-- normal
-- EXPLAIN ANALYZE
SELECT     amproc, amprocnum - average AS difference
FROM     pg_amproc,
    (SELECT    avg(amprocnum) AS average
    FROM    pg_amproc) AS tmp;

-- trying to trick explain with a redundant join
-- EXPLAIN ANALYZE
SELECT     amproc, amprocnum - average AS difference
FROM     pg_amproc INNER JOIN
    (SELECT    avg(amprocnum) AS average
    FROM    pg_amproc) AS tmp
    ON pg_amproc.amproc = pg_amproc.amproc;


If we look at the output of EXPLAIN ANALYZE,
then according to the COST the second query is best one,
but according to the ACTUAL TIME the first query is best
(which seems logical intuitively).

So explain is being tricked,
and the reason for this seems the number of rows in de nested loop,
which are reduced to 1 for explain because of the join.
http://www.postgresql.org/docs/8.4/static/using-explain.html

Suggestions, comments are always welcome.

mvg,
Wim Bertels





Re: tricking EXPLAIN?

From
Szymon Guz
Date:


On 28 November 2011 12:55, Wim Bertels <wim.bertels@khleuven.be> wrote:
Hallo,

if u compare the 2 queries, then they should be equivalent:

-- normal
-- EXPLAIN ANALYZE
SELECT  amproc, amprocnum - average AS difference
FROM    pg_amproc,
       (SELECT avg(amprocnum) AS average
       FROM    pg_amproc) AS tmp;

-- trying to trick explain with a redundant join
-- EXPLAIN ANALYZE
SELECT  amproc, amprocnum - average AS difference
FROM    pg_amproc INNER JOIN
       (SELECT avg(amprocnum) AS average
       FROM    pg_amproc) AS tmp
       ON pg_amproc.amproc = pg_amproc.amproc;


If we look at the output of EXPLAIN ANALYZE,
then according to the COST the second query is best one,
but according to the ACTUAL TIME the first query is best
(which seems logical intuitively).

So explain is being tricked,
and the reason for this seems the number of rows in de nested loop,
which are reduced to 1 for explain because of the join.
http://www.postgresql.org/docs/8.4/static/using-explain.html

Suggestions, comments are always welcome.

mvg,
Wim Bertels




Hi,
could you show us the output of explain analyze? 

regards
Szymon

Re: tricking EXPLAIN?

From
Wim Bertels
Date:
On ma, 2011-11-28 at 13:00 +0100, Szymon Guz wrote:
>
>
> On 28 November 2011 12:55, Wim Bertels <wim.bertels@khleuven.be>
> wrote:
>         Hallo,
>
>         if u compare the 2 queries, then they should be equivalent:
>
>         -- normal
>         -- EXPLAIN ANALYZE
>         SELECT  amproc, amprocnum - average AS difference
>         FROM    pg_amproc,
>                (SELECT avg(amprocnum) AS average
>                FROM    pg_amproc) AS tmp;

"Nested Loop  (cost=5.04..13.13 rows=243 width=38) (actual
time=0.333..0.953 rows=243 loops=1)"
"  ->  Aggregate  (cost=5.04..5.05 rows=1 width=2) (actual
time=0.326..0.327 rows=1 loops=1)"
"        ->  Seq Scan on pg_amproc  (cost=0.00..4.43 rows=243 width=2)
(actual time=0.003..0.157 rows=243 loops=1)"
"  ->  Seq Scan on pg_amproc  (cost=0.00..4.43 rows=243 width=6) (actual
time=0.002..0.147 rows=243 loops=1)"
"Total runtime: 1.117 ms"


>
>         -- trying to trick explain with a redundant join
>         -- EXPLAIN ANALYZE
>         SELECT  amproc, amprocnum - average AS difference
>         FROM    pg_amproc INNER JOIN
>                (SELECT avg(amprocnum) AS average
>                FROM    pg_amproc) AS tmp
>                ON pg_amproc.amproc = pg_amproc.amproc;

"Nested Loop  (cost=5.04..10.11 rows=1 width=38) (actual
time=0.376..80.891 rows=243 loops=1)"
"  ->  Seq Scan on pg_amproc  (cost=0.00..5.04 rows=1 width=6) (actual
time=0.028..0.249 rows=243 loops=1)"
"        Filter: ((amproc)::oid = (amproc)::oid)"
"  ->  Aggregate  (cost=5.04..5.05 rows=1 width=2) (actual
time=0.327..0.328 rows=1 loops=243)"
"        ->  Seq Scan on pg_amproc  (cost=0.00..4.43 rows=243 width=2)
(actual time=0.002..0.156 rows=243 loops=243)"
"Total runtime: 81.101 ms"


>
>         If we look at the output of EXPLAIN ANALYZE,
>         then according to the COST the second query is best one,
>         but according to the ACTUAL TIME the first query is best
>         (which seems logical intuitively).
>
>         So explain is being tricked,
>         and the reason for this seems the number of rows in de nested
>         loop,
>         which are reduced to 1 for explain because of the join.
>         http://www.postgresql.org/docs/8.4/static/using-explain.html
>
>         Suggestions, comments are always welcome.
>
>         mvg,
>         Wim Bertels
>
>
>
>
>
> Hi,
> could you show us the output of explain analyze?


cf supra,
Wim



Re: tricking EXPLAIN?

From
Shigeru Hanada
Date:
(2011/11/28 20:55), Wim Bertels wrote:
> If we look at the output of EXPLAIN ANALYZE,
> then according to the COST the second query is best one,
> but according to the ACTUAL TIME the first query is best
> (which seems logical intuitively).
>
> So explain is being tricked,
> and the reason for this seems the number of rows in de nested loop,
> which are reduced to 1 for explain because of the join.
> http://www.postgresql.org/docs/8.4/static/using-explain.html
>
> Suggestions, comments are always welcome.

Interesting.  I tried a modified version of second query, and got same
EXPLAIN output as first query.

SELECT     amproc, amprocnum - average AS difference
FROM     pg_amproc INNER JOIN
    (SELECT    avg(amprocnum) AS average
    FROM    pg_amproc) AS tmp
    ON true;  -- semantically same as "amproc = amproc"

So, I think that the point of this issue is somehow PG thinks wrongly
that "amporc = amproc" filters the result to just one row, though such
condition never reduces result.  I also tried simplified query, and got
another result which shows that PG estimates that same condition reduces
to half.

postgres=# EXPLAIN ANALYZE SELECT * FROM pg_amproc WHERE (amproc = amproc);
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Seq Scan on pg_amproc  (cost=0.00..67.52 rows=126 width=18) (actual
time=0.039..1.356 rows=252 loops=1)
   Filter: (amproc = amproc)
 Total runtime: 1.445 ms
(3 rows)

postgres=# EXPLAIN ANALYZE SELECT * FROM pg_amproc WHERE (true);
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Seq Scan on pg_amproc  (cost=0.00..4.52 rows=252 width=18) (actual
time=0.008..0.045 rows=252 loops=1)
 Total runtime: 0.089 ms
(2 rows)

IMHO planner should be modified so that it can estimate result rows
accurately in this case.

--
Shigeru Hanada