Hi,
I've just noticed that the planner in 8.3.3 doesn't seem to realize the
difference in the result of the following:
SELECT col, COUNT(*)
FROM tbl
GROUP BY col;
and
SELECT col IS NULL, COUNT(*)
FROM tbl
GROUP BY col IS NULL;
For a table with several million distinct values in "col" this
makes quite a difference. I'd expect to be getting in memory hash
aggregations, but I'm getting a sort step in there instead. Here's an
example:
SELECT col1 IS NOT NULL, col2 IS NOT NULL, col3 IS NOT NULL,
COUNT(*)
FROM tbl
GROUP BY 1,2,3
ORDER BY 1,2,3;
gives the following plan:
GroupAggregate (cost=5018623.99..5387423.18 rows=4338999 width=12)
-> Sort (cost=5018623.99..5081536.33 rows=25164936 width=12)
Sort Key: ((col1 IS NOT NULL)), ((col2 IS NOT NULL)), ((col3 IS NOT NULL))
-> Seq Scan on tbl (cost=0.00..376989.36 rows=25164936 width=12)
I can't see any way for it to produce more than 8 rows of output and so
I'd expect a hash aggregate to be best. Removing the IS NOT NULLs from
the expression gives basically the same plan and expected number of rows
which then looks reasonable.
Sam