Thread: Postgresql sql query - selecting rows outside a join

Postgresql sql query - selecting rows outside a join

From
Graham Leggett
Date:
Hi all,

I am trying to do a query that returns  all rows that are _not_ part of
a join, and so far I cannot seem to find a query that doesn't take 30
minutes or more to run.

The basic query is "select * from tableA where tableA_id NOT IN (select
tableA_id from tableB)".

Is there a more efficient way of doing this?

Regards,
Graham
--




Re: Postgresql sql query - selecting rows outside a join

From
Pavel Stehule
Date:
Hello



On Mon, 1 Dec 2003, Graham Leggett wrote:

> Hi all,
>
> I am trying to do a query that returns  all rows that are _not_ part of
> a join, and so far I cannot seem to find a query that doesn't take 30
> minutes or more to run.
>
> The basic query is "select * from tableA where tableA_id NOT IN (select
> tableA_id from tableB)".
>
> Is there a more efficient way of doing this?
>
> Regards,
> Graham
> --
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faqs/FAQ.html
>


Re: Postgresql sql query - selecting rows outside a join

From
"Arjen van der Meijden"
Date:
> Graham Leggett
>
> Hi all,
>
> I am trying to do a query that returns  all rows that are
> _not_ part of a join, and so far I cannot seem to find a
> query that doesn't take 30 minutes or more to run.
>
> The basic query is "select * from tableA where tableA_id NOT
> IN (select tableA_id from tableB)".
>
> Is there a more efficient way of doing this?
Mysql's version to do something like that is, afaik:

SELECT * FROM tableA LEFT JOIN tableB USING(tableA_id) WHERE
tableB.tableA_id IS NULL;

Perhaps that makes more efficient use of your indices.
Another version is something like:
SELECT *, COUNT(tableB.*) FROM tableA LEFT JOIN tableB USING(tableA_id)
GROUP BY columns_of_tableA HAVING count(tableB.*) = 0;

And perhaps a rewrite to use EXISTS (although that is claimed to be
similar in speed or even slower as of postgres 7.4) is useful:
SELECT * FROM tableA WHERE NOT EXISTS(SELECT * FROM tableB WHERE
tableB.tableA_id = tableA.tableA_id)

There are a few others, but it all depens on your index structure and
table sizes whether they work more efficient or not.

Best regards,

Arjen




Re: Postgresql sql query - selecting rows outside a join

From
Pavel Stehule
Date:
Hello

If you can, use PostgreSQL version 7.4. There is some optimalisation for
this questions. You can change your query from select .. IN (select) to
select .. exists (select). More about it you can find in FAQ.

regards
Pavel

On Mon, 1 Dec 2003, Graham Leggett wrote:

> Hi all,
>
> I am trying to do a query that returns  all rows that are _not_ part of
> a join, and so far I cannot seem to find a query that doesn't take 30
> minutes or more to run.
>
> The basic query is "select * from tableA where tableA_id NOT IN (select
> tableA_id from tableB)".
>
> Is there a more efficient way of doing this?
>
> Regards,
> Graham
> --
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faqs/FAQ.html
>


Re: Postgresql sql query - selecting rows outside a join

From
"Toby Doig"
Date:
A suggestion is to use a left outer join and then test for rows with a
right-hand null value (ones where the join failed).

The following example creates 2 tables, t1 and t2.
t1 has related records in t2 and the relation is indicated by
t1.rel->t2.id

create table t1 (
id integer,
rel integer,
label varchar(10)
);

create table t2 (
id integer,
label varchar(10)
);

insert into t1 (id, rel, label) values (1, 1, 'bob');
insert into t1 (id, rel, label) values (2, 2, 'sam');
insert into t1 (id, rel, label) values (3, 0, 'alice');

insert into t2 (id, label) values (1, 'martin');
insert into t2 (id, label) values (2, 'gwen');

-- this shows you all the records
select t1.*, t2.* from t1 left outer join t2 on (t1.rel = t2.id);

-- this shows you those where the inner join fails (your where X NOT IN
(y) stuff)
-- the result should be where t1.id=3 because it has a t1.rel of 0
therefore no related
-- record in t2
select t1.*, t2.* from t1 left outer join t2 on (t1.rel = t2.id) where
t2.id is null

Toby
-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Graham Leggett
Sent: 01 December 2003 17:16
To: pgsql-general@postgresql.org
Subject: [GENERAL] Postgresql sql query - selecting rows outside a join

Hi all,

I am trying to do a query that returns  all rows that are _not_ part of
a join, and so far I cannot seem to find a query that doesn't take 30
minutes or more to run.

The basic query is "select * from tableA where tableA_id NOT IN (select
tableA_id from tableB)".

Is there a more efficient way of doing this?

Regards,
Graham
--




---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faqs/FAQ.html

Re: Postgresql sql query - selecting rows outside a join

From
Adam Ruth
Date:
On Dec 1, 2003, at 10:15 AM, Graham Leggett wrote:

> Hi all,
>
> I am trying to do a query that returns  all rows that are _not_ part of
> a join, and so far I cannot seem to find a query that doesn't take 30
> minutes or more to run.
>
> The basic query is "select * from tableA where tableA_id NOT IN (select
> tableA_id from tableB)".
>
> Is there a more efficient way of doing this?
>
> Regards,
> Graham
> --
>
This should be more efficient:

select * from tableA where not exists (select null from tableB where
tableA_id = tableA.tableA_id);

Version 7.4 fixes the slow IN - NOT IN, but I haven't tested it myself
yet to see how it compares to this.

Adam Ruth