Thread: Postgresql sql query - selecting rows outside a join
Hi all, I am trying to do a query that returns all rows that are _not_ part of a join, and so far I cannot seem to find a query that doesn't take 30 minutes or more to run. The basic query is "select * from tableA where tableA_id NOT IN (select tableA_id from tableB)". Is there a more efficient way of doing this? Regards, Graham --
Hello On Mon, 1 Dec 2003, Graham Leggett wrote: > Hi all, > > I am trying to do a query that returns all rows that are _not_ part of > a join, and so far I cannot seem to find a query that doesn't take 30 > minutes or more to run. > > The basic query is "select * from tableA where tableA_id NOT IN (select > tableA_id from tableB)". > > Is there a more efficient way of doing this? > > Regards, > Graham > -- > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faqs/FAQ.html >
> Graham Leggett > > Hi all, > > I am trying to do a query that returns all rows that are > _not_ part of a join, and so far I cannot seem to find a > query that doesn't take 30 minutes or more to run. > > The basic query is "select * from tableA where tableA_id NOT > IN (select tableA_id from tableB)". > > Is there a more efficient way of doing this? Mysql's version to do something like that is, afaik: SELECT * FROM tableA LEFT JOIN tableB USING(tableA_id) WHERE tableB.tableA_id IS NULL; Perhaps that makes more efficient use of your indices. Another version is something like: SELECT *, COUNT(tableB.*) FROM tableA LEFT JOIN tableB USING(tableA_id) GROUP BY columns_of_tableA HAVING count(tableB.*) = 0; And perhaps a rewrite to use EXISTS (although that is claimed to be similar in speed or even slower as of postgres 7.4) is useful: SELECT * FROM tableA WHERE NOT EXISTS(SELECT * FROM tableB WHERE tableB.tableA_id = tableA.tableA_id) There are a few others, but it all depens on your index structure and table sizes whether they work more efficient or not. Best regards, Arjen
Hello If you can, use PostgreSQL version 7.4. There is some optimalisation for this questions. You can change your query from select .. IN (select) to select .. exists (select). More about it you can find in FAQ. regards Pavel On Mon, 1 Dec 2003, Graham Leggett wrote: > Hi all, > > I am trying to do a query that returns all rows that are _not_ part of > a join, and so far I cannot seem to find a query that doesn't take 30 > minutes or more to run. > > The basic query is "select * from tableA where tableA_id NOT IN (select > tableA_id from tableB)". > > Is there a more efficient way of doing this? > > Regards, > Graham > -- > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faqs/FAQ.html >
A suggestion is to use a left outer join and then test for rows with a right-hand null value (ones where the join failed). The following example creates 2 tables, t1 and t2. t1 has related records in t2 and the relation is indicated by t1.rel->t2.id create table t1 ( id integer, rel integer, label varchar(10) ); create table t2 ( id integer, label varchar(10) ); insert into t1 (id, rel, label) values (1, 1, 'bob'); insert into t1 (id, rel, label) values (2, 2, 'sam'); insert into t1 (id, rel, label) values (3, 0, 'alice'); insert into t2 (id, label) values (1, 'martin'); insert into t2 (id, label) values (2, 'gwen'); -- this shows you all the records select t1.*, t2.* from t1 left outer join t2 on (t1.rel = t2.id); -- this shows you those where the inner join fails (your where X NOT IN (y) stuff) -- the result should be where t1.id=3 because it has a t1.rel of 0 therefore no related -- record in t2 select t1.*, t2.* from t1 left outer join t2 on (t1.rel = t2.id) where t2.id is null Toby -----Original Message----- From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Graham Leggett Sent: 01 December 2003 17:16 To: pgsql-general@postgresql.org Subject: [GENERAL] Postgresql sql query - selecting rows outside a join Hi all, I am trying to do a query that returns all rows that are _not_ part of a join, and so far I cannot seem to find a query that doesn't take 30 minutes or more to run. The basic query is "select * from tableA where tableA_id NOT IN (select tableA_id from tableB)". Is there a more efficient way of doing this? Regards, Graham -- ---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
On Dec 1, 2003, at 10:15 AM, Graham Leggett wrote: > Hi all, > > I am trying to do a query that returns all rows that are _not_ part of > a join, and so far I cannot seem to find a query that doesn't take 30 > minutes or more to run. > > The basic query is "select * from tableA where tableA_id NOT IN (select > tableA_id from tableB)". > > Is there a more efficient way of doing this? > > Regards, > Graham > -- > This should be more efficient: select * from tableA where not exists (select null from tableB where tableA_id = tableA.tableA_id); Version 7.4 fixes the slow IN - NOT IN, but I haven't tested it myself yet to see how it compares to this. Adam Ruth