Thread: Select .... where id not in (....) returns 0 incorrectly
Hi all, While trying to write some queries to clean up records I no longer need, I come up with a very strange situation where numbers literally don't add up as expected. If anyone can help me trace where this strange behaviour is coming from and how to resolve it, please let me know. In case this is already resolved in a recent version, I am currently using version 11.14. Please see the following log from psql: joost=> create temporary table q ( id integer ); CREATE TABLE joost=> insert into q ( id ) select snapshotlistid from backupitem; INSERT 0 765 joost=> insert into q ( id ) select snapshotlistid from queue; INSERT 0 3183 joost=> select count(1) from q; count ------- 3948 (1 row) joost=> select count(1) from snapshotlist where id in (select id from q); count ------- 1810 (1 row) joost=> select count(1) from snapshotlist where id not in (select id from q); count ------- 0 (1 row) joost=> select count(1) from snapshotlist where id not in (select id from snapshotlist where id in (select id from q)); count --------- 2293923 (1 row) The tables are defined like: (Note, I did remove some fields from the tables which have no impact. Most are foreign keys to further tables or varchar data fields) CREATE TABLE snapshotlist ( id SERIAL PRIMARY KEY, active boolean, created TIMESTAMP DEFAULT clock_timestamp(), modified TIMESTAMP ); CREATE TABLE queue ( id SERIAL PRIMARY KEY, queuetask VARCHAR(500) NOT NULL, snapshotlistid INTEGER REFERENCES snapshotlist(id) ON DELETE RESTRICT, uuid uuid NOT NULL, UNIQUE(uuid) ); CREATE TABLE backupitem ( id SERIAL PRIMARY KEY, snapshotlistid INTEGER REFERENCES snapshotlist(id) ON DELETE RESTRICT NOT NULL, UNIQUE(snapshotlistid) );
> select count(1) from snapshotlist where id not in (select id from q); > count > ------- > 0 > (1 row) Doesn't this usually happen if q.id contains NULL. That is as per ANSI standard.
Doesn't this usually happen if q.id contains NULL. That is as per ANSI
standard.
Yes, there's a good description of this here:
It would be better to use NOT EXISTS:
-Jeremy
On Monday, April 4, 2022 2:50:44 PM CEST Jeremy Smith wrote: > > Doesn't this usually happen if q.id contains NULL. That is as per ANSI > > standard. > > Yes, there's a good description of this here: > https://wiki.postgresql.org/wiki/Don't_Do_This#Don.27t_use_NOT_IN > > It would be better to use NOT EXISTS: > select count(*) from snapshotlist where NOT EXISTS (SELECT FROM q WHERE q.id > = snapshotlist.id); > > > -Jeremy Thank you Jeremy and Ravi. This was actually the case, I forgot there is 1 NULL-value in that list. Personally, I think NULL should be treated as a seperate value and not lead to strange behaviour. I will need to look into my queries and get rid of "NOT IN" constructions when the list comes from a different query. -- Joost
On Tue, 5 Apr 2022 at 01:21, J. Roeleveld <joost@antarean.org> wrote: > Personally, I think NULL should be treated as a seperate value and not lead to > strange behaviour. I think the rationale behind IN and NOT IN are that c IN(1,2,3) is equivalent of writing: c = 1 OR c = 2 OR c = 3, whereas NOT IN(1,2,3) would be the same as c <> 1 AND c <> 2 AND c <> 3. You can imagine what would happen in the latter case if you replaced 3 with NULL. "c <> NULL" is NULL therefore, due to the quals being ANDed, will cause the WHERE clause not to match anything. In any case, it's what the SQL standard says, so that's the way we do it. David
On Monday, April 4, 2022 10:47:51 PM CEST David Rowley wrote: > On Tue, 5 Apr 2022 at 01:21, J. Roeleveld <joost@antarean.org> wrote: > > Personally, I think NULL should be treated as a seperate value and not > > lead to strange behaviour. > > I think the rationale behind IN and NOT IN are that c IN(1,2,3) is > equivalent of writing: c = 1 OR c = 2 OR c = 3, whereas NOT IN(1,2,3) > would be the same as c <> 1 AND c <> 2 AND c <> 3. You can imagine > what would happen in the latter case if you replaced 3 with NULL. "c > <> NULL" is NULL therefore, due to the quals being ANDed, will cause > the WHERE clause not to match anything. > > In any case, it's what the SQL standard says, so that's the way we do it. I agree with following the standard. If I would feel really strongly about this (I don't), it would be up to me to try and convince others. And I have got better things to do with my time. :) -- Joost
On 4/4/22 09:21, J. Roeleveld wrote:
This was actually the case, I forgot there is 1 NULL-value in that list. Personally, I think NULL should be treated as a seperate value and not lead to strange behaviour.
NULL is strange. Relational databases use ternary, not binary logic. In the woke vernacular, one could say that Postgres is non-binary. NULL literally means "no value". It is a part of the standard, so we have to deal with it, Codd help us. However, based on my lifelong experience with Oracle, NULL values are bad and are best avoided. Postgres is more forgiving than Oracle because in Postgres, the condition "is not null" can be resolved by index. In Oracle, it can not.
-- Mladen Gogala Database Consultant Tel: (347) 321-1217 https://dbwhisperer.wordpress.com
On 2022-04-05 19:25:24 -0400, Mladen Gogala wrote: > NULL is strange. Relational databases use ternary, not binary logic. > In the woke vernacular, one could say that Postgres is non-binary. > NULL literally means "no value". I prefer to think of NULL as "unknown value". That way the ternary logic makes intuitive sense: NULL = NULL? If you have two unknown values you don't know whether they are the same or not, so the result is also unknown, i.e. NULL. > It is a part of the standard, so we have to deal with it, > Codd help us. :-) > However, based on my lifelong experience with Oracle, NULL values are > bad and are best avoided. Oracle's handling of NULL values has a few extra warts, yes. I still wouldn't go as far as recommending to avoid NULL values (where they make sense semantically). > Postgres is more forgiving than Oracle because in Postgres, the > condition "is not null" can be resolved by index. In Oracle, it can > not. Actually it can (although it's a full index index scan, so the optimizer may prefer not to). It's "is null" which cannot use an index, because btree indexes in Oracle don't store NULL values (bitmap indexes do store NULL values, though - are they still an enterprise feature?). hp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"