I'm sure there is a good reason why NOT IN will not use an Anti-Join
plan equivalent to NOT EXISTS due to NULL handling, but in this
particular case the value being compared is in the PRIMARY KEY of both
structures being joined.
The NOT IN plan was killed after 10 minutes. The NOT EXISTS plan
returned data in roughly 10ms.
Is there a reason why the NOT IN plan could not use Anti-Joins when
the column being compared against is guaranteed to be NOT NULL? Too
much planner overhead to determine nullness of the column?
sk=# explain select * from source_reb_listing where listing_id not in
(select listing_id from source_reb_listing_specs) order by file_id
desc limit 5;
QUERY
PLAN
---------------------------------------------------------------------------------------------------------------------------------------------Limit
(cost=729015.39..3420463.83 rows=5 width=28) -> Index Scan Backward using source_reb_listing_fileid_idx on
source_reb_listing (cost=729015.39..169537219655.96 rows=314954
width=28) Filter: (NOT (SubPlan 1)) SubPlan 1 -> Materialize (cost=729015.39..1185280.74
rows=32810035width=8) -> Seq Scan on source_reb_listing_specs
(cost=0.00..568040.35 rows=32810035 width=8)
(6 rows)
sk=# explain select * from source_reb_listing where not exists (select
* from source_reb_listing_specs as t where t.listing_id =
source_reb_listing.listing_id) order by file_id desc limit 5;
QUERY
PLAN
-----------------------------------------------------------------------------------------------------------------------------------------Limit
(cost=0.00..35.31 rows=5 width=28) -> Nested Loop Anti Join (cost=0.00..3880495.87 rows=549496 width=28) ->
IndexScan Backward using source_reb_listing_fileid_idx
on source_reb_listing (cost=0.00..1107142.20 rows=629907 width=28) -> Index Scan using
source_reb_listing_specs_pkeyon
source_reb_listing_specs t (cost=0.00..1592.74 rows=408 width=8) Index Cond: (t.listing_id =
source_reb_listing.listing_id)
(5 rows)