Thread: Select .... where id not in (....) returns 0 incorrectly

Select .... where id not in (....) returns 0 incorrectly

From

"J. Roeleveld"

Date:

04 April 2022, 12:08:27

Hi all,

While trying to write some queries to clean up records I no longer need, I 
come up with a very strange situation where numbers literally don't add up as 
expected.

If anyone can help me trace where this strange behaviour is coming from and 
how to resolve it, please let me know.
In case this is already resolved in a recent version, I am currently using 
version 11.14.

Please see the following log from psql:

joost=> create temporary table q ( id integer );
CREATE TABLE
joost=> insert into q ( id ) select snapshotlistid from backupitem;
INSERT 0 765
joost=> insert into q ( id ) select snapshotlistid from queue;
INSERT 0 3183

joost=> select count(1) from q;
 count 
-------
  3948
(1 row)

joost=> select count(1) from snapshotlist where id in (select id from q);
 count 
-------
  1810
(1 row)

joost=> select count(1) from snapshotlist where id not in (select id from q);
 count 
-------
     0
(1 row)

joost=> select count(1) from snapshotlist where id not in (select id from 
snapshotlist where id in (select id from q));
  count  
---------
 2293923
(1 row)


The tables are defined like: (Note, I did remove some fields from the tables 
which have no impact. Most are foreign keys to further tables or varchar data 
fields)

CREATE TABLE snapshotlist (
  id SERIAL PRIMARY KEY,
  active boolean,
  created TIMESTAMP DEFAULT clock_timestamp(),
  modified TIMESTAMP
);

CREATE TABLE queue (
  id SERIAL PRIMARY KEY,
  queuetask VARCHAR(500) NOT NULL,
  snapshotlistid INTEGER REFERENCES snapshotlist(id) ON DELETE RESTRICT,
  uuid uuid NOT NULL,
  UNIQUE(uuid)
);

CREATE TABLE backupitem (
  id SERIAL PRIMARY KEY,
  snapshotlistid INTEGER REFERENCES snapshotlist(id) ON DELETE RESTRICT NOT 
NULL,
  UNIQUE(snapshotlistid)
);

Re: Select .... where id not in (....) returns 0 incorrectly

From

Ravi Krishna

Date:

04 April 2022, 12:44:15

> select count(1) from snapshotlist where id not in (select id from q);
> count
> -------
>     0
> (1 row)

Doesn't this usually happen if q.id contains NULL.  That is as per ANSI 
standard.

Re: Select .... where id not in (....) returns 0 incorrectly

From

Jeremy Smith

Date:

04 April 2022, 12:50:44

Doesn't this usually happen if q.id contains NULL. That is as per ANSI
standard.

Yes, there's a good description of this here:

https://wiki.postgresql.org/wiki/Don't_Do_This#Don.27t_use_NOT_IN

It would be better to use NOT EXISTS:

select count(*) from snapshotlist where NOT EXISTS (SELECT FROM q WHERE q.id = snapshotlist.id);

-Jeremy

Re: Select .... where id not in (....) returns 0 incorrectly

From

"J. Roeleveld"

Date:

04 April 2022, 13:21:07

On Monday, April 4, 2022 2:50:44 PM CEST Jeremy Smith wrote:
> > Doesn't this usually happen if q.id contains NULL.  That is as per ANSI
> > standard.
> 
> Yes, there's a good description of this here:
> https://wiki.postgresql.org/wiki/Don't_Do_This#Don.27t_use_NOT_IN
> 
> It would be better to use NOT EXISTS:
> select count(*) from snapshotlist where NOT EXISTS (SELECT FROM q WHERE q.id
> = snapshotlist.id);
> 
> 
> -Jeremy

Thank you Jeremy and Ravi.

This was actually the case, I forgot there is 1 NULL-value in that list.
Personally, I think NULL should be treated as a seperate value and not lead to 
strange behaviour.

I will need to look into my queries and get rid of "NOT IN" constructions when 
the list comes from a different query.

--
Joost

Re: Select .... where id not in (....) returns 0 incorrectly

From

David Rowley

Date:

04 April 2022, 20:47:51

On Tue, 5 Apr 2022 at 01:21, J. Roeleveld <joost@antarean.org> wrote:
> Personally, I think NULL should be treated as a seperate value and not lead to
> strange behaviour.

I think the rationale behind IN and NOT IN are that c IN(1,2,3) is
equivalent of writing: c = 1 OR c = 2 OR c = 3, whereas NOT IN(1,2,3)
would be the same as c <> 1 AND c <> 2 AND c <> 3.  You can imagine
what would happen in the latter case if you replaced 3 with NULL. "c
<> NULL" is NULL therefore, due to the quals being ANDed, will cause
the WHERE clause not to match anything.

In any case, it's what the SQL standard says, so that's the way we do it.

David

Re: Select .... where id not in (....) returns 0 incorrectly

From

"J. Roeleveld"

Date:

05 April 2022, 05:37:23

On Monday, April 4, 2022 10:47:51 PM CEST David Rowley wrote:
> On Tue, 5 Apr 2022 at 01:21, J. Roeleveld <joost@antarean.org> wrote:
> > Personally, I think NULL should be treated as a seperate value and not
> > lead to strange behaviour.
> 
> I think the rationale behind IN and NOT IN are that c IN(1,2,3) is
> equivalent of writing: c = 1 OR c = 2 OR c = 3, whereas NOT IN(1,2,3)
> would be the same as c <> 1 AND c <> 2 AND c <> 3.  You can imagine
> what would happen in the latter case if you replaced 3 with NULL. "c
> <> NULL" is NULL therefore, due to the quals being ANDed, will cause
> the WHERE clause not to match anything.
> 
> In any case, it's what the SQL standard says, so that's the way we do it.

I agree with following the standard.

If I would feel really strongly about this (I don't), it would be up to me to 
try and convince others.
And I have got better things to do with my time. :)

--
Joost

Re: Select .... where id not in (....) returns 0 incorrectly

From

Mladen Gogala

Date:

05 April 2022, 23:25:24

On 4/4/22 09:21, J. Roeleveld wrote:

This was actually the case, I forgot there is 1 NULL-value in that list.
Personally, I think NULL should be treated as a seperate value and not lead to 
strange behaviour.

NULL is strange. Relational databases use ternary, not binary logic. In the woke vernacular, one could say that Postgres is non-binary. NULL literally means "no value". It is a part of the standard, so we have to deal with it, Codd help us. However, based on my lifelong experience with Oracle, NULL values are bad and are best avoided. Postgres is more forgiving than Oracle because in Postgres, the condition "is not null" can be resolved by index. In Oracle, it can not.

-- 
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com

Re: Select .... where id not in (....) returns 0 incorrectly

From

"Peter J. Holzer"

Date:

06 April 2022, 06:19:57

On 2022-04-05 19:25:24 -0400, Mladen Gogala wrote:
> NULL is strange. Relational databases use ternary, not binary logic.
> In the woke vernacular, one could say that Postgres is non-binary.
> NULL literally means "no value".

I prefer to think of NULL as "unknown value". That way the ternary logic
makes intuitive sense:

NULL = NULL? If you have two unknown values you don't know whether they
are the same or not, so the result is also unknown, i.e. NULL.

> It is a part of the standard, so we have to deal with it,
> Codd help us.

:-)

> However, based on my lifelong experience with Oracle, NULL values are
> bad and are best avoided.

Oracle's handling of NULL values has a few extra warts, yes. I still
wouldn't go as far as recommending to avoid NULL values (where they make
sense semantically).

> Postgres is more forgiving than Oracle because in Postgres, the
> condition "is not null" can be resolved by index. In Oracle, it can
> not.

Actually it can (although it's a full index index scan, so the optimizer
may prefer not to). It's "is null" which cannot use an index, because
btree indexes in Oracle don't store NULL values (bitmap indexes do store
NULL values, though - are they still an enterprise feature?).

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Attachment

signature.asc