Strangely enough doing an EXPLAIN on the two queries shows that using
EXISTS would be faster than IN ... even though it isn't ..
psql TMP -c "explain select count(distinct invoice_id) from invoice_li
where received='true' AND shipped='false' AND cancelled='false'
AND
(invoice_id not in
(
select distinct invoice_id from invoice_li where received='false'
AND cancelled='false'
)
OR ship_now='true'
) "
NOTICE: QUERY PLAN:
Aggregate (cost=17642485.70..17642485.70 rows=1 width=4)
-> Seq Scan on invoice_li (cost=0.00..17642460.40 rows=10120 width=4)
SubPlan
-> Materialize (cost=871.61..871.61 rows=1 width=4)
-> Unique (cost=871.61..871.61 rows=1 width=4)
-> Sort (cost=871.61..871.61 rows=1 width=4)
-> Seq Scan on invoice_li
(cost=0.00..871.60 rows=1 width=4)
EXPLAIN
$ psql TMP -c "explain select count(distinct invoice_id) from invoice_li
where received='true'
AND shipped='false' AND cancelled='false'
AND
(NOT EXISTS
(
select * from invoice_li AS sq_inv_li where received='false'
AND cancelled='false' AND invoice_li.invoice_id=sq_inv_li.invoice_id
)
OR ship_now='true'
) "
NOTICE: QUERY PLAN:
Aggregate (cost=4955505.10..4955505.10 rows=1 width=4)
-> Seq Scan on invoice_li (cost=0.00..4955479.80 rows=10120 width=4)
SubPlan
-> Index Scan using invoice_li_pkey on invoice_li sq_inv_li
(cost=0.00..244.79 rows=1 width=80)
EXPLAIN
Jc