Re: in() VS exists() - Mailing list pgsql-sql

From Julien
Subject Re: in() VS exists()
Date
Msg-id 1205425663.3240.35.camel@frodon.be-bif.ulb.ac.be
Whole thread Raw
In response to Re: in() VS exists()  ("Bart Degryse" <Bart.Degryse@indicator.be>)
Responses Re: in() VS exists()  ("Bart Degryse" <Bart.Degryse@indicator.be>)
List pgsql-sql
mmh no because it's a one to many relation (a specimen can have more
than one sequence) :

muridae=> select count(sp.id) from specimens sp INNER JOIN sequences s
on s.specimen_id = sp.id;count 
------- 1536
(1 row)

Time: 81.242 ms
muridae=> select count(sp.id) from specimens sp where sp.id in (select
specimen_id from sequences group by specimen_id);count 
------- 1431
(1 row)

Time: 81.736 ms
muridae=> 

(of course this is a bad example, because I could just do: select
count(specimen_id) from sequences group by specimen_id;, but in my
application I have more fields coming from specimens of course)

Julien

On Thu, 2008-03-13 at 15:12 +0100, Bart Degryse wrote:
> I think that just
> select count(sp.id) from specimens sp INNER JOIN sequences s on
> s.specimen_id = sp.id;
> should be enough
> 
> >>> Julien <jcigar@ulb.ac.be> 2008-03-13 17:10 >>>
> If I understood well the query plan, the planner optimize the
> IN(SELECT ...) version with a JOIN (line 19-20 of the first paste) :
> 
> ->  Hash IN Join  (cost=240.95..4011.20 rows=1436 width=4) (actual
> time=93.971..201.908 rows=1431 loops=1)
>          Hash Cond: ("outer".id = "inner".specimen_id)
> 
> so I guess that :
> 
> select count(sp.id) from specimens sp where sp.id in (select
> specimen_id
> from sequences);
> 
> is almost the same as :
> 
> select count(sp.id) from specimens sp INNER JOIN (select specimen_id
> from sequences GROUP BY specimen_id) as foo on foo.specimen_id =
> sp.id;
> 
> ?
> 
> Thanks,
> Julien
> 
> On Thu, 2008-03-13 at 14:46 +0100, Bart Degryse wrote:
> > The chapter on indexes in the manual should give you a pretty good
> > idea on the why.
> > IN and EXISTS are not the only possibilities, you can also use inner
> > or outer joins.
> > Which solution performs best depends on the data, the database
> > version, the available indexes, ...
> > 
> > >>> Julien <jcigar@ulb.ac.be> 2008-03-13 15:47 >>>
> > Hello,
> > 
> > Does anyone has an idea why sometimes:
> > - select ... where ... in (select ...)
> > is faster than :
> > - select ... where ... exists(select ...)
> > and sometimes it's the opposite ?
> > 
> > I had such a situation, I've pasted the queries on:
> > http://rafb.net/p/KXNZ6892.html and http://rafb.net/p/jvo5DO38.html
> > 
> > It's running PostgreSQL 8.1 with an effective_cache_size of 30000.
> > 
> > specimens.id is the primary key and there are indexes on
> > sequences(specimen_id) and specimen_measurements(specimen_id)
> > 
> > Is there a general "rule" to know when to use the in() version and
> > when
> > to use the exists() version ? Is it true to say that the exists()
> > version is more scalable (with many rows) than the in() version
> (from
> > the little tests I made it seems the case) ?
> > 
> > Thanks,
> > Julien
> > 
> > -- 
> > Julien Cigar
> > Belgian Biodiversity Platform
> > http://www.biodiversity.be
> > Université Libre de Bruxelles (ULB)
> > Campus de la Plaine CP 257
> > Bâtiment NO, Bureau 4 N4 115C (Niveau 4)
> > Boulevard du Triomphe, entrée ULB 2
> > B-1050 Bruxelles
> > Mail: jcigar@ulb.ac.be
> > @biobel: http://biobel.biodiversity.be/person/show/471
> > Tel : 02 650 57 52
> > 
> > 
> > -- 
> > Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-sql
> > 
> -- 
> Julien Cigar
> Belgian Biodiversity Platform
> http://www.biodiversity.be
> Université Libre de Bruxelles (ULB)
> Campus de la Plaine CP 257
> Bâtiment NO, Bureau 4 N4 115C (Niveau 4)
> Boulevard du Triomphe, entrée ULB 2
> B-1050 Bruxelles
> Mail: jcigar@ulb.ac.be
> @biobel: http://biobel.biodiversity.be/person/show/471
> Tel : 02 650 57 52
> 
> 
> -- 
> Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql
> 
-- 
Julien Cigar
Belgian Biodiversity Platform
http://www.biodiversity.be
Université Libre de Bruxelles (ULB)
Campus de la Plaine CP 257
Bâtiment NO, Bureau 4 N4 115C (Niveau 4)
Boulevard du Triomphe, entrée ULB 2
B-1050 Bruxelles
Mail: jcigar@ulb.ac.be
@biobel: http://biobel.biodiversity.be/person/show/471
Tel : 02 650 57 52



pgsql-sql by date:

Previous
From: "Bart Degryse"
Date:
Subject: Re: in() VS exists()
Next
From: "Bart Degryse"
Date:
Subject: Re: in() VS exists()