Re: in() VS exists() - Mailing list pgsql-sql
From | Julien |
---|---|
Subject | Re: in() VS exists() |
Date | |
Msg-id | 1205427041.3240.44.camel@frodon.be-bif.ulb.ac.be Whole thread Raw |
In response to | Re: in() VS exists() ("Bart Degryse" <Bart.Degryse@indicator.be>) |
Responses |
Re: in() VS exists()
|
List | pgsql-sql |
>From my experience I tend to avoid SELECT DISTINCT queries because it's usually very slow with many rows ... For my specific case the result is the same: muridae=> select count(distinct s.specimen_id) from specimens sp INNER JOIN sequences s on s.specimen_id = sp.id;count ------- 1431 (1 row) Time: 65.351 ms muridae=> select count(sp.id) from specimens sp where sp.id in (select specimen_id from sequences group by specimen_id);count ------- 1431 (1 row) Time: 66.371 ms But to give an example, I have a table with ~1 000 000 rows where the DISTINCT solution is more than 10 times slower : muridae=> select count(distinct sp.id) from specimens sp INNER JOIN specimen_measurements m ON m.specimen_id = sp.id;count -------75241 (1 row) Time: 15970.668 ms muridae=> select count(sp.id) from specimens sp INNER JOIN (select specimen_id from specimen_measurements GROUP BY specimen_id) as foo on foo.specimen_id = sp.id;count -------75241 (1 row) Time: 1165.487 ms Regards, Julien On Thu, 2008-03-13 at 15:28 +0100, Bart Degryse wrote: > how about > select count(distinct s.specimen_id) from specimens sp INNER JOIN > sequences s > on s.specimen_id = sp.id; > > > >>> Julien <jcigar@ulb.ac.be> 2008-03-13 17:27 >>> > mmh no because it's a one to many relation (a specimen can have more > than one sequence) : > > muridae=> select count(sp.id) from specimens sp INNER JOIN sequences s > on s.specimen_id = sp.id; > count > ------- > 1536 > (1 row) > > Time: 81.242 ms > muridae=> select count(sp.id) from specimens sp where sp.id in (select > specimen_id from sequences group by specimen_id); > count > ------- > 1431 > (1 row) > > Time: 81.736 ms > muridae=> > > (of course this is a bad example, because I could just do: select > count(specimen_id) from sequences group by specimen_id;, but in my > application I have more fields coming from specimens of course) > > Julien > > On Thu, 2008-03-13 at 15:12 +0100, Bart Degryse wrote: > > I think that just > > select count(sp.id) from specimens sp INNER JOIN sequences s on > > s.specimen_id = sp.id; > > should be enough > > > > >>> Julien <jcigar@ulb.ac.be> 2008-03-13 17:10 >>> > > If I understood well the query plan, the planner optimize the > > IN(SELECT ...) version with a JOIN (line 19-20 of the first paste) : > > > > -> Hash IN Join (cost=240.95..4011.20 rows=1436 width=4) (actual > > time=93.971..201.908 rows=1431 loops=1) > > Hash Cond: ("outer".id = "inner".specimen_id) > > > > so I guess that : > > > > select count(sp.id) from specimens sp where sp.id in (select > > specimen_id > > from sequences); > > > > is almost the same as : > > > > select count(sp.id) from specimens sp INNER JOIN (select specimen_id > > from sequences GROUP BY specimen_id) as foo on foo.specimen_id = > > sp.id; > > > > ? > > > > Thanks, > > Julien > > > > On Thu, 2008-03-13 at 14:46 +0100, Bart Degryse wrote: > > > The chapter on indexes in the manual should give you a pretty good > > > idea on the why. > > > IN and EXISTS are not the only possibilities, you can also use > inner > > > or outer joins. > > > Which solution performs best depends on the data, the database > > > version, the available indexes, ... > > > > > > >>> Julien <jcigar@ulb.ac.be> 2008-03-13 15:47 >>> > > > Hello, > > > > > > Does anyone has an idea why sometimes: > > > - select ... where ... in (select ...) > > > is faster than : > > > - select ... where ... exists(select ...) > > > and sometimes it's the opposite ? > > > > > > I had such a situation, I've pasted the queries on: > > > http://rafb.net/p/KXNZ6892.html and > http://rafb.net/p/jvo5DO38.html > > > > > > It's running PostgreSQL 8.1 with an effective_cache_size of 30000. > > > > > > specimens.id is the primary key and there are indexes on > > > sequences(specimen_id) and specimen_measurements(specimen_id) > > > > > > Is there a general "rule" to know when to use the in() version and > > > when > > > to use the exists() version ? Is it true to say that the exists() > > > version is more scalable (with many rows) than the in() version > > (from > > > the little tests I made it seems the case) ? > > > > > > Thanks, > > > Julien > > > > > > -- > > > Julien Cigar > > > Belgian Biodiversity Platform > > > http://www.biodiversity.be > > > Université Libre de Bruxelles (ULB) > > > Campus de la Plaine CP 257 > > > Bâtiment NO, Bureau 4 N4 115C (Niveau 4) > > > Boulevard du Triomphe, entrée ULB 2 > > > B-1050 Bruxelles > > > Mail: jcigar@ulb.ac.be > > > @biobel: http://biobel.biodiversity.be/person/show/471 > > > Tel : 02 650 57 52 > > > > > > > > > -- > > > Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) > > > To make changes to your subscription: > > > http://www.postgresql.org/mailpref/pgsql-sql > > > > > -- > > Julien Cigar > > Belgian Biodiversity Platform > > http://www.biodiversity.be > > Université Libre de Bruxelles (ULB) > > Campus de la Plaine CP 257 > > Bâtiment NO, Bureau 4 N4 115C (Niveau 4) > > Boulevard du Triomphe, entrée ULB 2 > > B-1050 Bruxelles > > Mail: jcigar@ulb.ac.be > > @biobel: http://biobel.biodiversity.be/person/show/471 > > Tel : 02 650 57 52 > > > > > > -- > > Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) > > To make changes to your subscription: > > http://www.postgresql.org/mailpref/pgsql-sql > > > -- > Julien Cigar > Belgian Biodiversity Platform > http://www.biodiversity.be > Université Libre de Bruxelles (ULB) > Campus de la Plaine CP 257 > Bâtiment NO, Bureau 4 N4 115C (Niveau 4) > Boulevard du Triomphe, entrée ULB 2 > B-1050 Bruxelles > Mail: jcigar@ulb.ac.be > @biobel: http://biobel.biodiversity.be/person/show/471 > Tel : 02 650 57 52 > > > -- > Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-sql > -- Julien Cigar Belgian Biodiversity Platform http://www.biodiversity.be Université Libre de Bruxelles (ULB) Campus de la Plaine CP 257 Bâtiment NO, Bureau 4 N4 115C (Niveau 4) Boulevard du Triomphe, entrée ULB 2 B-1050 Bruxelles Mail: jcigar@ulb.ac.be @biobel: http://biobel.biodiversity.be/person/show/471 Tel : 02 650 57 52