(2011/11/28 20:55), Wim Bertels wrote:
> If we look at the output of EXPLAIN ANALYZE,
> then according to the COST the second query is best one,
> but according to the ACTUAL TIME the first query is best
> (which seems logical intuitively).
>
> So explain is being tricked,
> and the reason for this seems the number of rows in de nested loop,
> which are reduced to 1 for explain because of the join.
> http://www.postgresql.org/docs/8.4/static/using-explain.html
>
> Suggestions, comments are always welcome.
Interesting. I tried a modified version of second query, and got same
EXPLAIN output as first query.
SELECT amproc, amprocnum - average AS difference
FROM pg_amproc INNER JOIN
(SELECT avg(amprocnum) AS average
FROM pg_amproc) AS tmp
ON true; -- semantically same as "amproc = amproc"
So, I think that the point of this issue is somehow PG thinks wrongly
that "amporc = amproc" filters the result to just one row, though such
condition never reduces result. I also tried simplified query, and got
another result which shows that PG estimates that same condition reduces
to half.
postgres=# EXPLAIN ANALYZE SELECT * FROM pg_amproc WHERE (amproc = amproc);
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Seq Scan on pg_amproc (cost=0.00..67.52 rows=126 width=18) (actual
time=0.039..1.356 rows=252 loops=1)
Filter: (amproc = amproc)
Total runtime: 1.445 ms
(3 rows)
postgres=# EXPLAIN ANALYZE SELECT * FROM pg_amproc WHERE (true);
QUERY PLAN
--------------------------------------------------------------------------------------------------------
Seq Scan on pg_amproc (cost=0.00..4.52 rows=252 width=18) (actual
time=0.008..0.045 rows=252 loops=1)
Total runtime: 0.089 ms
(2 rows)
IMHO planner should be modified so that it can estimate result rows
accurately in this case.
--
Shigeru Hanada