Thread: IN surpasses NOT EXISTS in 7.4RC2 ??
Hi, NOT EXISTS is taking almost double time than NOT IN . I know IN has been optimised in 7.4 but is anything wrong with the NOT EXISTS? I have vaccumed , analyze and run the query many times still not in is faster than exists :> Regds Mallah. NOT IN PLAN tradein_clients=# explain analyze SELECT count(*) from general.profile_master where profile_id not in (select profile_id from general.account_profiles ) ; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=32238.19..32238.19 rows=1 width=0) (actual time=5329.206..5329.207 rows=1 loops=1) -> Seq Scan on profile_master (cost=4458.25..31340.38 rows=359125 width=0) (actual time=1055.496..4637.908 rows=470386loops=1) Filter: (NOT (hashed subplan)) SubPlan -> Seq Scan on account_profiles (cost=0.00..3817.80 rows=256180 width=4) (actual time=0.061..507.811 rows=256180loops=1) Total runtime: 5337.591 ms (6 rows) tradein_clients=# explain analyze SELECT count(*) from general.profile_master where not exists (select profile_id from general.account_profiles where profile_id=general.profile_master.profile_id ) ; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=1674981.97..1674981.97 rows=1 width=0) (actual time=14600.386..14600.387 rows=1 loops=1) -> Seq Scan on profile_master (cost=0.00..1674084.16 rows=359125 width=0) (actual time=13.687..13815.798 rows=470386loops=1) Filter: (NOT (subplan)) SubPlan -> Index Scan using account_profiles_profile_id on account_profiles (cost=0.00..4.59 rows=2 width=4) (actualtime=0.013..0.013 rows=0 loops=718250) Index Cond: (profile_id = $0) Total runtime: 14600.531 ms
Rajesh Kumar Mallah <mallah@trade-india.com> writes: > NOT EXISTS is taking almost double time than NOT IN . > I know IN has been optimised in 7.4 but is anything > wrong with the NOT EXISTS? That's the expected behavior in 7.4. EXISTS in the style you are using it effectively forces a nestloop-with-inner-indexscan implementation. As of 7.4, IN can do that, but it can do several other things too, including the hash-type plan you have here. So assuming that the planner chooses the right plan choice (not always a given ;-)) IN should be as fast or faster than EXISTS in all cases. regards, tom lane
Rajesh Kumar Mallah wrote: > > Hi, > > NOT EXISTS is taking almost double time than NOT IN . > I know IN has been optimised in 7.4 but is anything > wrong with the NOT EXISTS? > > I have vaccumed , analyze and run the query many times > still not in is faster than exists :> Seems fine. In 7.4, NOT IN will often be faster that NOT EXISTS. NOT EXISTS didn't change --- there are restrictions on how far we can optimize NOT EXISTS. NOT IN has just become much faster in 7.4. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Robert Treat wrote:<br /><blockquote cite="mid1068737502.10946.18338.camel@camel" type="cite"><pre wrap="">It is believedthat the IN optimization can lead to faster IN times than EXIST times on some queries, the extent of which is still a bit of an unknown. (Incidentally is there an FAQ item on this that needs updating?) </pre></blockquote><br /> Thanks every one for clarifying. Its really a nice thing to see IN working<br /> sowell becoz its easier to read the SQL using IN. <br /><br /> looks like NOT IN is indifferent to indexes where is IN usesindexes , is it true?<br /><br /> does indexes affect the new manner in which IN works in 7.4 ?<br /><br /><br /><br/><br /><br /><blockquote cite="mid1068737502.10946.18338.camel@camel" type="cite"><pre wrap=""> Does the not exist query produce worse results in 7.4 than it did in 7.3?</pre></blockquote> Will surely post the overvation sometime.<br /><br /><br /><br /> Regards<br /> Mallah.<br /><br/><br /><blockquote cite="mid1068737502.10946.18338.camel@camel" type="cite"><pre wrap=""> Robert Treat On Thu, 2003-11-13 at 02:53, Rajesh Kumar Mallah wrote: </pre><blockquote type="cite"><pre wrap="">Hi, NOT EXISTS is taking almost double time than NOT IN . I know IN has been optimised in 7.4 but is anything wrong with the NOT EXISTS? I have vaccumed , analyze and run the query many times still not in is faster than exists :> Regds Mallah. NOT IN PLAN tradein_clients=# explain analyze SELECT count(*) from general.profile_master whereprofile_id not in (select profile_idfrom general.account_profiles ) ; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=32238.19..32238.19 rows=1 width=0) (actual time=5329.206..5329.207 rows=1 loops=1) -> Seq Scan on profile_master (cost=4458.25..31340.38 rows=359125 width=0) (actual time=1055.496..4637.908 rows=470386 loops=1) Filter:(NOT (hashed subplan)) SubPlan -> Seq Scan on account_profiles (cost=0.00..3817.80 rows=256180width=4) (actual time=0.061..507.811 rows=256180 loops=1) Total runtime: 5337.591 ms (6 rows) tradein_clients=# explain analyze SELECT count(*) from general.profile_master where not exists (select profile_id from general.account_profiles where profile_id=general.profile_master.profile_id ) ; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=1674981.97..1674981.97 rows=1 width=0) (actual time=14600.386..14600.387 rows=1 loops=1) -> Seq Scanon profile_master (cost=0.00..1674084.16 rows=359125 width=0) (actual time=13.687..13815.798 rows=470386 loops=1) Filter: (NOT (subplan)) SubPlan -> Index Scan using account_profiles_profile_id on account_profiles (cost=0.00..4.59 rows=2 width=4) (actual time=0.013..0.013 rows=0 loops=718250) Index Cond:(profile_id = $0) Total runtime: 14600.531 ms ---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to <a class="moz-txt-link-abbreviated"href="mailto:majordomo@postgresql.org">majordomo@postgresql.org</a>) </pre></blockquote><prewrap=""> </pre></blockquote><br />
It is believed that the IN optimization can lead to faster IN times than EXIST times on some queries, the extent of which is still a bit of an unknown. (Incidentally is there an FAQ item on this that needs updating?) Does the not exist query produce worse results in 7.4 than it did in 7.3? Robert Treat On Thu, 2003-11-13 at 02:53, Rajesh Kumar Mallah wrote: > > Hi, > > NOT EXISTS is taking almost double time than NOT IN . > I know IN has been optimised in 7.4 but is anything > wrong with the NOT EXISTS? > > I have vaccumed , analyze and run the query many times > still not in is faster than exists :> > > > Regds > Mallah. > > NOT IN PLAN > > tradein_clients=# explain analyze SELECT count(*) from general.profile_master where > profile_id not in (select profile_id from general.account_profiles ) ; > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------------- > Aggregate (cost=32238.19..32238.19 rows=1 width=0) (actual time=5329.206..5329.207 rows=1 loops=1) > -> Seq Scan on profile_master (cost=4458.25..31340.38 rows=359125 width=0) (actual time=1055.496..4637.908 rows=470386loops=1) > Filter: (NOT (hashed subplan)) > SubPlan > -> Seq Scan on account_profiles (cost=0.00..3817.80 rows=256180 width=4) (actual time=0.061..507.811 rows=256180loops=1) > Total runtime: 5337.591 ms > (6 rows) > > > tradein_clients=# explain analyze SELECT count(*) from general.profile_master where not exists > (select profile_id from general.account_profiles where profile_id=general.profile_master.profile_id ) ; > > QUERY PLAN > --------------------------------------------------------------------------------------------------------------------------------------------------------------- > Aggregate (cost=1674981.97..1674981.97 rows=1 width=0) (actual time=14600.386..14600.387 rows=1 loops=1) > -> Seq Scan on profile_master (cost=0.00..1674084.16 rows=359125 width=0) (actual time=13.687..13815.798 rows=470386loops=1) > Filter: (NOT (subplan)) > SubPlan > -> Index Scan using account_profiles_profile_id on account_profiles (cost=0.00..4.59 rows=2 width=4) (actualtime=0.013..0.013 rows=0 loops=718250) > Index Cond: (profile_id = $0) > Total runtime: 14600.531 ms > > > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
Robert Treat <xzilla@users.sourceforge.net> writes: > Does the not exist query produce worse results in 7.4 than it did in > 7.3? EXISTS should work the same as before. regards, tom lane
On Thu, 2003-11-13 at 12:00, Tom Lane wrote: > Robert Treat <xzilla@users.sourceforge.net> writes: > > Does the not exist query produce worse results in 7.4 than it did in > > 7.3? > > EXISTS should work the same as before. > right. the original poster is asking if there is "something wrong with exist" based on the comparison to IN, he needs to compare it vs. 7.3 EXISTS to determine if something is wrong. Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
Tom Lane wrote:
Not in this case :) , did i miss something silly?
tradein_clients=# explain SELECT count(*) from user_accounts where email is not null and email not in
(select email from profile_master where email is not null) ;
QUERY PLAN
--------------------------------------------------------------------------------------
Aggregate (cost=9587726326.93..9587726326.93 rows=1 width=0)
-> Seq Scan on user_accounts (cost=0.00..9587725473.40 rows=341412 width=0)
Filter: ((email IS NOT NULL) AND (NOT (subplan)))
SubPlan
-> Seq Scan on profile_master (cost=0.00..25132.24 rows=674633 width=25)
Filter: (email IS NOT NULL)
(6 rows)
The query above does not return
tradein_clients=# explain analyze SELECT count(*) from user_accounts where email is not null and
not exists (select email from profile_master where email=user_accounts.email) ;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=2850847.55..2850847.55 rows=1 width=0) (actual time=34075.100..34075.101 rows=1 loops=1)
-> Seq Scan on user_accounts (cost=0.00..2849994.02 rows=341412 width=0) (actual time=8.066..34066.329 rows=3882 loops=1)
Filter: ((email IS NOT NULL) AND (NOT (subplan)))
SubPlan
-> Index Scan using profile_master_email on profile_master (cost=0.00..35.60 rows=9 width=25) (actual time=0.044..0.044 rows=1 loops=686716)
Index Cond: ((email)::text = ($0)::text)
Total runtime: 34075.213 ms
(7 rows)
tradein_clients=#
Rajesh Kumar Mallah <mallah@trade-india.com> writes:NOT EXISTS is taking almost double time than NOT IN . I know IN has been optimised in 7.4 but is anything wrong with the NOT EXISTS?That's the expected behavior in 7.4. EXISTS in the style you are using it effectively forces a nestloop-with-inner-indexscan implementation. As of 7.4, IN can do that, but it can do several other things too, including the hash-type plan you have here. So assuming that the planner chooses the right plan choice (not always a given ;-))
IN should be as fast or faster than EXISTS in all cases.
Not in this case :) , did i miss something silly?
tradein_clients=# explain SELECT count(*) from user_accounts where email is not null and email not in
(select email from profile_master where email is not null) ;
QUERY PLAN
--------------------------------------------------------------------------------------
Aggregate (cost=9587726326.93..9587726326.93 rows=1 width=0)
-> Seq Scan on user_accounts (cost=0.00..9587725473.40 rows=341412 width=0)
Filter: ((email IS NOT NULL) AND (NOT (subplan)))
SubPlan
-> Seq Scan on profile_master (cost=0.00..25132.24 rows=674633 width=25)
Filter: (email IS NOT NULL)
(6 rows)
The query above does not return
tradein_clients=# explain analyze SELECT count(*) from user_accounts where email is not null and
not exists (select email from profile_master where email=user_accounts.email) ;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=2850847.55..2850847.55 rows=1 width=0) (actual time=34075.100..34075.101 rows=1 loops=1)
-> Seq Scan on user_accounts (cost=0.00..2849994.02 rows=341412 width=0) (actual time=8.066..34066.329 rows=3882 loops=1)
Filter: ((email IS NOT NULL) AND (NOT (subplan)))
SubPlan
-> Index Scan using profile_master_email on profile_master (cost=0.00..35.60 rows=9 width=25) (actual time=0.044..0.044 rows=1 loops=686716)
Index Cond: ((email)::text = ($0)::text)
Total runtime: 34075.213 ms
(7 rows)
tradein_clients=#
regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
-- Rajesh Kumar Mallah, Infocom Network Limited, New Delhi phone: +91(11)6152172 (221) (L) ,9811255597 (M) Visit http://www.trade-india.com , India's Leading B2B eMarketplace.