Re: dumb question - Mailing list pgsql-general

From David G. Johnston
Subject Re: dumb question
Date
Msg-id CAKFQuwYgCb6+v8T82yunGiAmwstSDgwDYTdajXFB-9VHannPFw@mail.gmail.com
Whole thread Raw
In response to Re: dumb question  (Kevin Grittner <kgrittn@gmail.com>)
List pgsql-general
On Thu, Jun 2, 2016 at 5:44 PM, Kevin Grittner <kgrittn@gmail.com> wrote:
On Thu, Jun 2, 2016 at 3:23 PM, David G. Johnston
<david.g.johnston@gmail.com> wrote:
> On Thu, Jun 2, 2016 at 4:11 PM, John R Pierce <pierce@hogranch.com> wrote:
>> Thanks all the below seem to do the trick.

I doubt it -- using NOT IN requires (per the SQL specification)
handling NULLs in a way that probably does not give you the answer
you want.  (NOT IN also is often much slower than the NOT EXISTS
test which will actually give you the answer you want.)

test=# create table t (id int not null primary key, ref_id int, sts
int not null default 0);
CREATE TABLE
test=# insert into t values
(1,null,0),(2,1,1),(3,null,0),(4,null,0),(5,4,1),(6,null,0),(7,6,1);
INSERT 0 7
test=# select max(id) from t where sts=0 and id not in (select ref_id from t);
 max
-----

(1 row)

test=# select max(id) from t t1 where sts = 0 and not exists (select *
from t t2 where t2.ref_id = t1.id);
 max
-----
   3
(1 row)

Note that providing minimal setup (like the above) helps in getting
good answers quickly.

>> do note, this is whats known as an 'anti-join', and these can be pretty
>> expensive on large tables.
>
> +1

*Can* be.  Proper indexing can make them very reasonable.

​Doh (me)...

Indeed, NOT IN (...) doesn't qualify as an anti-join since (for one) it cannot (I don't think) be optimized in this way as the entire contents of the IN() need to be determined.  IOW, its not really a join but just another predicate condition whose one side is a subquery.

That said, writing out a full anti-join NOT EXISTS (or, similarly, a semi-join EXISTS) clause can be a bit tedious for ad-hoc stuff while the IN() variation is a bit more succinct and, I'd venture to say, unfortunately familiar.  In can be made to work in this situation by writing the expression as IN (SELECT ref_id FROM t WHERE ref_id IS NOT NULL).

David J.

pgsql-general by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: PL/PGSQL + inserts+updates+limit - Postgres 9.3
Next
From: Merlin Moncure
Date:
Subject: Re: RowDescription via the SQL?