The following bug has been logged on the website:
Bug reference: 8571
Logged by: Timothy Garnett
Email address: tgarnett@panjiva.com
PostgreSQL version: 9.2.4
Operating system: Ubuntu Linux x86_64 12.04.3 LTS
Description:
The planner seems to be badly miss-estimating the selectivity of '<always
null col> IS NOT true/false', it does not seem to do so for equivalent
expressions such as '<col> is null or not <col>' or expressions where the
selectivity is difficult to determine.
Simplified scenario:
create temporary table test_tbl as (select generate_series as id,
null::boolean as val from generate_series(1,1000));
analyze verbose test_tbl;
select count(*) from test_tbl where val is not true;
=> 1000
explain select * from test_tbl where val is null; -- correctly estimates
1000 rows
=> Seq Scan on test_tbl (cost=0.00..15.00 rows=1000 width=5)
Filter: (val IS NULL)
-- problem case --
explain select * from test_tbl where val is not true; -- estimates only 1
row!
=> Seq Scan on test_tbl (cost=0.00..15.00 rows=1 width=5)
Filter: (val IS NOT TRUE)
explain select * from test_tbl where NOT coalesce(val, false); -- estimates
500 rows (actual 1,000) but will still generally result in a reasonable
plan
=> Seq Scan on test_tbl (cost=0.00..15.00 rows=500 width=5)
Filter: (NOT COALESCE(val, false))
explain select * from test_tbl where val is null or not val; -- correctly
estimates 1000 rows
=> Seq Scan on test_tbl (cost=0.00..15.00 rows=1000 width=5)
Filter: ((val IS NULL) OR (NOT val))
Estimating only one row for the selectivity of a where clause with
additional joins involved when many rows match can result in some very bad
plans which is what we ran into.