I think you've perhaps got a caching issue in those numbers.
Note, I haven't been following this thread at all so I don't know if the IN
version is the typical timing for the query. However, I would suggest that you
try reversing the order you test those to see what effect the caching of data
from the EXISTS version has on the IN timing. Or just repeat the each version
and use the second run timings.
On Thu, 19 Sep 2002, Jean-Christian Imbeault wrote:
> Henshall, Stuart - WCP wrote:
>
> [deleted]
>
> I tried your optimized query but it was muh slower. Here are the results:
>
> ##Query using EXISTS
>
> $ !1173
> time psql TMP -c "select count(distinct invoice_id) from invoice_li
> where received='true'
> AND shipped='false' AND cancelled='false'
> AND
> (NOT EXISTS
> (
> select * from invoice_li AS sq_inv_li where received='false'
> AND cancelled='false' AND invoice_li.invoice_id=sq_inv_li.invoice_id
> )
> OR ship_now='true'
> ) "
> count
> -------
> 170
> (1 row)
>
>
> real 0m8.322s
> user 0m0.010s
> sys 0m0.000s
>
> ##Query using IN
>
> $ !1175
> time psql TMP -c "select count(distinct invoice_id) from invoice_li
> where received='true'
> AND shipped='false' AND cancelled='false'
> AND
> (invoice_id not in
> (
> select distinct invoice_id from invoice_li where received='false'
> AND cancelled='false'
> )
> OR ship_now='true'
> ) "
> count
> -------
> 170
> (1 row)
>
>
> real 0m0.234s
> user 0m0.000s
> sys 0m0.010s
>
> Maybe EXISTS is not always faster than IN ?
>
> After a "vacuum analyze" the numbers become:
>
> #using EXISTS
>
> real 0m3.229s
> user 0m0.000s
> sys 0m0.000s
>
> #using IN
>
> real 0m0.141s
> user 0m0.000s
> sys 0m0.000s
>