Re: tricking EXPLAIN? - Mailing list pgsql-general

From Shigeru Hanada
Subject Re: tricking EXPLAIN?
Date
Msg-id 4ED5B1C0.5080705@gmail.com
Whole thread Raw
In response to tricking EXPLAIN?  (Wim Bertels <wim.bertels@khleuven.be>)
List pgsql-general
(2011/11/28 20:55), Wim Bertels wrote:
> If we look at the output of EXPLAIN ANALYZE,
> then according to the COST the second query is best one,
> but according to the ACTUAL TIME the first query is best
> (which seems logical intuitively).
>
> So explain is being tricked,
> and the reason for this seems the number of rows in de nested loop,
> which are reduced to 1 for explain because of the join.
> http://www.postgresql.org/docs/8.4/static/using-explain.html
>
> Suggestions, comments are always welcome.

Interesting.  I tried a modified version of second query, and got same
EXPLAIN output as first query.

SELECT     amproc, amprocnum - average AS difference
FROM     pg_amproc INNER JOIN
    (SELECT    avg(amprocnum) AS average
    FROM    pg_amproc) AS tmp
    ON true;  -- semantically same as "amproc = amproc"

So, I think that the point of this issue is somehow PG thinks wrongly
that "amporc = amproc" filters the result to just one row, though such
condition never reduces result.  I also tried simplified query, and got
another result which shows that PG estimates that same condition reduces
to half.

postgres=# EXPLAIN ANALYZE SELECT * FROM pg_amproc WHERE (amproc = amproc);
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Seq Scan on pg_amproc  (cost=0.00..67.52 rows=126 width=18) (actual
time=0.039..1.356 rows=252 loops=1)
   Filter: (amproc = amproc)
 Total runtime: 1.445 ms
(3 rows)

postgres=# EXPLAIN ANALYZE SELECT * FROM pg_amproc WHERE (true);
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Seq Scan on pg_amproc  (cost=0.00..4.52 rows=252 width=18) (actual
time=0.008..0.045 rows=252 loops=1)
 Total runtime: 0.089 ms
(2 rows)

IMHO planner should be modified so that it can estimate result rows
accurately in this case.

--
Shigeru Hanada

pgsql-general by date:

Previous
From: panam
Date:
Subject: Re: Extending the volume size of the data directory volume
Next
From: Heiko Wundram
Date:
Subject: Re: Limiting number of connections to PostgreSQL per IP (not per DB/user)?