Hi all,
Jumping straight into the subject, this is what I get:
explain SELECT bigint_col_1, bigint_col_2 FROM big_table
WHERE bigint_col_2 IN (12132131, null, null, null, null);
QUERY PLAN
-------------------------------------------------------------------
Seq Scan on big_table (cost=0.00..2447201.85 rows=448 width=16)
Filter: ((bigint_col_2 = 12132131::bigint) OR NULL::boolean)
(2 rows)
Compared to:
explain SELECT bigint_col_1, bigint_col_2 FROM big_table
WHERE bigint_col_2 IN (12132131, 123781, 1297839032, 123667123);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using dom_idx_member_bigint_col_2,
dom_idx_member_bigint_col_2, dom_idx_member_bigint_col_2,
dom_idx_member_bigint_col_2 on big_table (cost=0.00..6427.28 rows=1789
width=16)
Index Cond: ((bigint_col_2 = 12132131) OR (bigint_col_2 = 123781) OR
(bigint_col_2 = 1297839032) OR (bigint_col_2 = 123667123))
(2 rows)
big_table has ~100 million rows.
Considering that a NULL in a WHERE clause never lets a row through
(true OR NULL is true, while false OR NULL is NULL, which the filter
rejects just like false), the filter is equivalent to plain
(bigint_col_2 = 12132131), and I don't quite understand why the first
query goes for a sequential scan instead of simply ignoring the NULLs
in the IN clause...
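A quick sanity check of the three-valued logic (plain SQL, nothing
specific to our schema; the column aliases are only for illustration):

select (true or null::boolean) as t_or_null,
       (false or null::boolean) as f_or_null;
-- t_or_null comes back as 't', f_or_null as NULL; since WHERE treats
-- NULL the same as false, the OR NULL branch can never add a row.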
We have a lot of queries like this for chunking activities, and they
work fine on Oracle. We've just migrated a fairly big database from
Oracle to Postgres, and this is the first thing to slow our system to
a crawl... We will fix this on our side, but I guess the planner could
be smarter here.
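The fix will probably be to strip the NULLs before building the IN
list; the hand-stripped query, just as a sketch with the example value
from above:

SELECT bigint_col_1, bigint_col_2 FROM big_table
WHERE bigint_col_2 IN (12132131);

should then plan like the four-value query, i.e. with an index scan.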
Cheers,
Csaba.