Thread: IN surpasses NOT EXISTS in 7.4RC2 ??

IN surpasses NOT EXISTS in 7.4RC2 ??

From
Rajesh Kumar Mallah
Date:
Hi,

NOT EXISTS is taking almost double time than NOT IN .
I know IN has been optimised in 7.4 but is anything
wrong with the NOT EXISTS?

I have vaccumed , analyze and run the query many times
still not in is faster than exists :>


Regds
Mallah.

NOT IN PLAN

tradein_clients=# explain analyze SELECT  count(*) from general.profile_master where
 profile_id not in (select  profile_id from general.account_profiles ) ;
                                                             QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------
Aggregate  (cost=32238.19..32238.19 rows=1 width=0) (actual time=5329.206..5329.207 rows=1 loops=1)
  ->  Seq Scan on profile_master  (cost=4458.25..31340.38 rows=359125 width=0) (actual time=1055.496..4637.908
rows=470386loops=1) 
        Filter: (NOT (hashed subplan))
        SubPlan
          ->  Seq Scan on account_profiles  (cost=0.00..3817.80 rows=256180 width=4) (actual time=0.061..507.811
rows=256180loops=1) 
Total runtime: 5337.591 ms
(6 rows)


tradein_clients=# explain analyze SELECT  count(*) from general.profile_master where not exists
(select  profile_id from general.account_profiles where profile_id=general.profile_master.profile_id ) ;

                                                                          QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate  (cost=1674981.97..1674981.97 rows=1 width=0) (actual time=14600.386..14600.387 rows=1 loops=1)
  ->  Seq Scan on profile_master  (cost=0.00..1674084.16 rows=359125 width=0) (actual time=13.687..13815.798
rows=470386loops=1) 
        Filter: (NOT (subplan))
        SubPlan
          ->  Index Scan using account_profiles_profile_id on account_profiles  (cost=0.00..4.59 rows=2 width=4)
(actualtime=0.013..0.013 rows=0 loops=718250) 
                Index Cond: (profile_id = $0)
Total runtime: 14600.531 ms



Re: IN surpasses NOT EXISTS in 7.4RC2 ??

From
Tom Lane
Date:
Rajesh Kumar Mallah <mallah@trade-india.com> writes:
> NOT EXISTS is taking almost double time than NOT IN .
> I know IN has been optimised in 7.4 but is anything
> wrong with the NOT EXISTS?

That's the expected behavior in 7.4.  EXISTS in the style you are using
it effectively forces a nestloop-with-inner-indexscan implementation.
As of 7.4, IN can do that, but it can do several other things too,
including the hash-type plan you have here.  So assuming that the
planner chooses the right plan choice (not always a given ;-))
IN should be as fast or faster than EXISTS in all cases.

            regards, tom lane

Re: IN surpasses NOT EXISTS in 7.4RC2 ??

From
Bruce Momjian
Date:
Rajesh Kumar Mallah wrote:
>
> Hi,
>
> NOT EXISTS is taking almost double time than NOT IN .
> I know IN has been optimised in 7.4 but is anything
> wrong with the NOT EXISTS?
>
> I have vaccumed , analyze and run the query many times
> still not in is faster than exists :>

Seems fine.  In 7.4, NOT IN will often be faster that NOT EXISTS.  NOT
EXISTS didn't change --- there are restrictions on how far we can
optimize NOT EXISTS.  NOT IN has just become much faster in 7.4.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: IN surpasses NOT EXISTS in 7.4RC2 ??

From
Rajesh Kumar Mallah
Date:
Robert Treat wrote:<br /><blockquote cite="mid1068737502.10946.18338.camel@camel" type="cite"><pre wrap="">It is
believedthat the IN optimization can lead to faster IN times than
 
EXIST times on some queries, the extent of which is still a bit of an
unknown. (Incidentally is there an FAQ item on this that needs
updating?) </pre></blockquote><br /> Thanks every one for clarifying. Its really a nice thing to see IN working<br />
sowell becoz its easier to read the SQL using IN. <br /><br /> looks like NOT IN is indifferent to indexes where is IN
usesindexes , is it true?<br /><br /> does indexes affect the new manner in which IN works in 7.4 ?<br /><br /><br
/><br/><br /><br /><blockquote cite="mid1068737502.10946.18338.camel@camel" type="cite"><pre wrap="">
 
Does the not exist query produce worse results in 7.4 than it did in
7.3?</pre></blockquote> Will surely post the overvation sometime.<br /><br /><br /><br /> Regards<br /> Mallah.<br
/><br/><br /><blockquote cite="mid1068737502.10946.18338.camel@camel" type="cite"><pre wrap="">
 

Robert Treat

On Thu, 2003-11-13 at 02:53, Rajesh Kumar Mallah wrote: </pre><blockquote type="cite"><pre wrap="">Hi,

NOT EXISTS is taking almost double time than NOT IN .
I know IN has been optimised in 7.4 but is anything 
wrong with the NOT EXISTS?

I have vaccumed , analyze and run the query many times
still not in is faster than exists :>


Regds
Mallah.

NOT IN PLAN

tradein_clients=# explain analyze SELECT  count(*) from general.profile_master whereprofile_id not in (select
profile_idfrom general.account_profiles ) ;                                                            QUERY PLAN
 

-------------------------------------------------------------------------------------------------------------------------------------
Aggregate  (cost=32238.19..32238.19 rows=1 width=0) (actual time=5329.206..5329.207 rows=1 loops=1) ->  Seq Scan on
profile_master (cost=4458.25..31340.38 rows=359125 width=0) (actual time=1055.496..4637.908 rows=470386 loops=1)
Filter:(NOT (hashed subplan))       SubPlan         ->  Seq Scan on account_profiles  (cost=0.00..3817.80
rows=256180width=4) (actual time=0.061..507.811 rows=256180 loops=1)
 
Total runtime: 5337.591 ms
(6 rows)


tradein_clients=# explain analyze SELECT  count(*) from general.profile_master where not exists 
(select  profile_id from general.account_profiles where profile_id=general.profile_master.profile_id ) ;
                                                                         QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate  (cost=1674981.97..1674981.97 rows=1 width=0) (actual time=14600.386..14600.387 rows=1 loops=1) ->  Seq
Scanon profile_master  (cost=0.00..1674084.16 rows=359125 width=0) (actual time=13.687..13815.798 rows=470386 loops=1)
    Filter: (NOT (subplan))       SubPlan         ->  Index Scan using account_profiles_profile_id on
account_profiles (cost=0.00..4.59 rows=2 width=4) (actual time=0.013..0.013 rows=0 loops=718250)               Index
Cond:(profile_id = $0)
 
Total runtime: 14600.531 ms



---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command   (send "unregister YourEmailAddressHere" to <a
class="moz-txt-link-abbreviated"href="mailto:majordomo@postgresql.org">majordomo@postgresql.org</a>)
</pre></blockquote><prewrap=""> </pre></blockquote><br /> 

Re: IN surpasses NOT EXISTS in 7.4RC2 ??

From
Robert Treat
Date:
It is believed that the IN optimization can lead to faster IN times than
EXIST times on some queries, the extent of which is still a bit of an
unknown. (Incidentally is there an FAQ item on this that needs
updating?)

Does the not exist query produce worse results in 7.4 than it did in
7.3?

Robert Treat

On Thu, 2003-11-13 at 02:53, Rajesh Kumar Mallah wrote:
>
> Hi,
>
> NOT EXISTS is taking almost double time than NOT IN .
> I know IN has been optimised in 7.4 but is anything
> wrong with the NOT EXISTS?
>
> I have vaccumed , analyze and run the query many times
> still not in is faster than exists :>
>
>
> Regds
> Mallah.
>
> NOT IN PLAN
>
> tradein_clients=# explain analyze SELECT  count(*) from general.profile_master where
>  profile_id not in (select  profile_id from general.account_profiles ) ;
>                                                              QUERY PLAN
>
-------------------------------------------------------------------------------------------------------------------------------------
> Aggregate  (cost=32238.19..32238.19 rows=1 width=0) (actual time=5329.206..5329.207 rows=1 loops=1)
>   ->  Seq Scan on profile_master  (cost=4458.25..31340.38 rows=359125 width=0) (actual time=1055.496..4637.908
rows=470386loops=1) 
>         Filter: (NOT (hashed subplan))
>         SubPlan
>           ->  Seq Scan on account_profiles  (cost=0.00..3817.80 rows=256180 width=4) (actual time=0.061..507.811
rows=256180loops=1) 
> Total runtime: 5337.591 ms
> (6 rows)
>
>
> tradein_clients=# explain analyze SELECT  count(*) from general.profile_master where not exists
> (select  profile_id from general.account_profiles where profile_id=general.profile_master.profile_id ) ;
>
>                                                                           QUERY PLAN
>
---------------------------------------------------------------------------------------------------------------------------------------------------------------
> Aggregate  (cost=1674981.97..1674981.97 rows=1 width=0) (actual time=14600.386..14600.387 rows=1 loops=1)
>   ->  Seq Scan on profile_master  (cost=0.00..1674084.16 rows=359125 width=0) (actual time=13.687..13815.798
rows=470386loops=1) 
>         Filter: (NOT (subplan))
>         SubPlan
>           ->  Index Scan using account_profiles_profile_id on account_profiles  (cost=0.00..4.59 rows=2 width=4)
(actualtime=0.013..0.013 rows=0 loops=718250) 
>                 Index Cond: (profile_id = $0)
> Total runtime: 14600.531 ms
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL


Re: IN surpasses NOT EXISTS in 7.4RC2 ??

From
Tom Lane
Date:
Robert Treat <xzilla@users.sourceforge.net> writes:
> Does the not exist query produce worse results in 7.4 than it did in
> 7.3?

EXISTS should work the same as before.

            regards, tom lane

Re: IN surpasses NOT EXISTS in 7.4RC2 ??

From
Robert Treat
Date:
On Thu, 2003-11-13 at 12:00, Tom Lane wrote:
> Robert Treat <xzilla@users.sourceforge.net> writes:
> > Does the not exist query produce worse results in 7.4 than it did in
> > 7.3?
>
> EXISTS should work the same as before.
>

right. the original poster is asking if there is "something wrong with
exist" based on the comparison to IN, he needs to compare it vs. 7.3
EXISTS to determine if something is wrong.

Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL


Re: IN surpasses NOT EXISTS in 7.4RC2 ??

From
Rajesh Kumar Mallah
Date:
Tom Lane wrote:
Rajesh Kumar Mallah <mallah@trade-india.com> writes: 
NOT EXISTS is taking almost double time than NOT IN .
I know IN has been optimised in 7.4 but is anything 
wrong with the NOT EXISTS?   
That's the expected behavior in 7.4.  EXISTS in the style you are using
it effectively forces a nestloop-with-inner-indexscan implementation.
As of 7.4, IN can do that, but it can do several other things too,
including the hash-type plan you have here.  So assuming that the
planner chooses the right plan choice (not always a given ;-))

IN should be as fast or faster than EXISTS in all cases.

Not in this case :) , did i miss something silly?

tradein_clients=# explain  SELECT count(*) from user_accounts where email is not null and email not in
 (select email from profile_master where email is not null) ;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Aggregate  (cost=9587726326.93..9587726326.93 rows=1 width=0)
   ->  Seq Scan on user_accounts  (cost=0.00..9587725473.40 rows=341412 width=0)
         Filter: ((email IS NOT NULL) AND (NOT (subplan)))
         SubPlan
           ->  Seq Scan on profile_master  (cost=0.00..25132.24 rows=674633 width=25)
                 Filter: (email IS NOT NULL)
(6 rows)


The query above does not return

tradein_clients=# explain analyze SELECT count(*) from user_accounts where email is not null and
not exists (select email from profile_master where email=user_accounts.email) ;
                                                                        QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2850847.55..2850847.55 rows=1 width=0) (actual time=34075.100..34075.101 rows=1 loops=1)
   ->  Seq Scan on user_accounts  (cost=0.00..2849994.02 rows=341412 width=0) (actual time=8.066..34066.329 rows=3882 loops=1)
         Filter: ((email IS NOT NULL) AND (NOT (subplan)))
         SubPlan
           ->  Index Scan using profile_master_email on profile_master  (cost=0.00..35.60 rows=9 width=25) (actual time=0.044..0.044 rows=1 loops=686716)
                 Index Cond: ((email)::text = ($0)::text)
 Total runtime: 34075.213 ms
(7 rows)

tradein_clients=#


		regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org 


-- 

Rajesh Kumar Mallah,
Infocom Network Limited, New Delhi
phone: +91(11)6152172 (221) (L) ,9811255597 (M)

Visit http://www.trade-india.com ,
India's Leading B2B eMarketplace.