Thread: why "WHERE uid NOT IN" is so slow, and EXCEPT in the same situtation is fast?
Two queries which do the same thing, first one takes ages to complete (did wait several minutes and cancelled it), while the second one took 9 seconds? Don't they do the same thing? miernik=> EXPLAIN SELECT uid FROM locks WHERE uid NOT IN (SELECT uid FROM locks INNER JOIN wys USING (uid, login)); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------ Seq Scan on locks (cost=38341.39..61365389.71 rows=48446 width=4) Filter: (NOT (subplan)) SubPlan -> Materialize (cost=38341.39..39408.47 rows=79508 width=4) -> Hash Join (cost=3997.27..37989.89 rows=79508 width=4) Hash Cond: (((wys.uid)::integer = (locks.uid)::integer) AND ((wys.login)::text = (locks.login)::text)) -> Seq Scan on wys (cost=0.00..13866.51 rows=633451 width=16) -> Hash (cost=2069.91..2069.91 rows=96891 width=16) -> Seq Scan on locks (cost=0.00..2069.91 rows=96891 width=16) (9 rows) Time: 231,634 ms miernik=> EXPLAIN SELECT uid FROM locks EXCEPT (SELECT uid FROM locks INNER JOIN wys USING (uid, login)); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------- SetOp Except (cost=59306.12..60188.11 rows=17640 width=4) -> Sort (cost=59306.12..59747.12 rows=176399 width=4) Sort Key: "*SELECT* 1".uid -> Append (cost=0.00..41823.79 rows=176399 width=4) -> Subquery Scan "*SELECT* 1" (cost=0.00..3038.82 rows=96891 width=4) -> Seq Scan on locks (cost=0.00..2069.91 rows=96891 width=4) -> Subquery Scan "*SELECT* 2" (cost=3997.27..38784.97 rows=79508 width=4) -> Hash Join (cost=3997.27..37989.89 rows=79508 width=4) Hash Cond: (((wys.uid)::integer = (locks.uid)::integer) AND ((wys.login)::text = (locks.login)::text)) -> Seq Scan on wys (cost=0.00..13866.51 rows=633451 width=16) -> Hash (cost=2069.91..2069.91 rows=96891 width=16) -> Seq Scan on locks (cost=0.00..2069.91 rows=96891 width=16) (12 rows) Time: 1479,238 ms miernik=> -- Miernik http://miernik.name/
Re: why "WHERE uid NOT IN" is so slow, and EXCEPT in the same situtation is fast?
From
Tom Lane
Date:
Miernik <public@public.miernik.name> writes: > Two queries which do the same thing, first one takes ages to complete > (did wait several minutes and cancelled it), while the second one took > 9 seconds? Don't they do the same thing? Hmm, what have you got work_mem set to? The first one would likely have been a lot faster if it had hashed the subplan; which I'd have thought would happen with only 80K rows in the subplan result, except it didn't. The queries are in fact not exactly equivalent, because EXCEPT involves some duplicate-elimination behavior that won't happen in the NOT IN formulation. So I don't apologize for your having gotten different plans. But you should have gotten a plan less awful than that one for the NOT IN. Another issue is that the NOT IN will probably not do what you expected if the subquery yields any NULLs. regards, tom lane
Re: why "WHERE uid NOT IN" is so slow, and EXCEPT in the same situtation is fast?
From
Miernik
Date:
On Wed, Jul 30, 2008 at 11:08:06PM -0400, Tom Lane wrote: > Hmm, what have you got work_mem set to? The first one would likely > have been a lot faster if it had hashed the subplan; which I'd have > thought would happen with only 80K rows in the subplan result, > except it didn't. work_mem = 1024kB The machine has 48 MB total RAM and is a Xen host. > The queries are in fact not exactly equivalent, because EXCEPT > involves some duplicate-elimination behavior that won't happen > in the NOT IN formulation. So I don't apologize for your having > gotten different plans. But if use EXCEPT ALL? > Another issue is that the NOT IN will probably not do what you > expected if the subquery yields any NULLs. In this specific query I think it is not possible for the subquery to have NULLs, because its an INNER JOIN USING (the_only_column_in_the _result, some_other_column_also). If any "uid" column of any row would have been NULL, it wouldn't appear in that INNER JOIN, no? -- Miernik http://miernik.name/
Re: why "WHERE uid NOT IN" is so slow, and EXCEPT in the same situtation is fast?
From
Tom Lane
Date:
Miernik <public@public.miernik.name> writes: > On Wed, Jul 30, 2008 at 11:08:06PM -0400, Tom Lane wrote: >> Hmm, what have you got work_mem set to? The first one would likely >> have been a lot faster if it had hashed the subplan; which I'd have >> thought would happen with only 80K rows in the subplan result, >> except it didn't. > work_mem = 1024kB Try increasing that ... I don't recall the exact per-row overhead but I'm quite sure it's more than 8 bytes. Ten times that would likely get you to a hash subquery plan. > The machine has 48 MB total RAM and is a Xen host. 48MB is really not a sane amount of memory to run a modern database in. Maybe you could make it go with sqlite or some other tiny-footprint DBMS, but Postgres isn't focused on that case. >> The queries are in fact not exactly equivalent, because EXCEPT >> involves some duplicate-elimination behavior that won't happen >> in the NOT IN formulation. So I don't apologize for your having >> gotten different plans. > But if use EXCEPT ALL? Fraid not, EXCEPT ALL has yet other rules for how it deals with duplicates. >> Another issue is that the NOT IN will probably not do what you >> expected if the subquery yields any NULLs. > In this specific query I think it is not possible for the subquery to > have NULLs, Okay, just wanted to point out a common gotcha. regards, tom lane