The following bug has been logged on the website:
Bug reference: 12556
Logged by: Kevin PERAIS
Email address: kevin.perais@trivia-marketing.com
PostgreSQL version: 9.3.5
Operating system: Ubuntu 12.04
Description:
Hi everybody.
I've been noticing several times that clauses IN and NOT IN are often buggy.
I'll show you 2 concrete cases to illustrate that.
---------------------------------------------
- 1 - IN and NOT IN not coherent with JOIN -
---------------------------------------------
I have 2 tables here:
Account(id BIGINT PRIMARY KEY NOT NULL, ...)
Contact(contact_id VARCHAR(11) PRIMARY KEY NOT NULL, account_id BIGINT,
...)
I the contacts are linked to an account. However, I know I do not have all
the account, so I want to know how many contacts are linked to an account
and how many are not.
Here is a set of queries:
SELECT count(*)
FROM Account
WHERE id IS NULL;
--> 0
SELECT count(*)
FROM Contact
WHERE account_id IS NULL;
--> 0
SELECT count(*)
FROM Contact;
--> 257726
SELECT count(*)
FROM Contact
WHERE account_id IN (
SELECT id
FROM Account
);
--> 257726 (result very very suspect)
SELECT count(*)
FROM Contact
WHERE account_id NOT IN (
SELECT id
FROM Account
);
--> 0 (coherent with previous query, but result very very suspect)
SELECT count(*)
FROM Contact
JOIN Account ON Account.id = Contact.account_id;
--> 135664
SELECT count(Account.id)
FROM Contact
LEFT JOIN Account ON Account.id = Contact.account_id;
--> 135664
So I decided to take a random account_id in Contact table and run the
following queries:
SELECT count(*)
FROM Contact
WHERE account_id = 3074054072;
--> 6
SELECT count(*)
FROM Account
WHERE id = 3074054072;
--> 0
So that proves that IN and NOT IN queries give me wrong results.
-------------------------------------------------
- 2 - NOT IN clause not coherent with IN clause -
-------------------------------------------------
I have 2 tables:
Ref(num INTEGER UNIQUE, ...)
Sample(num VARCHAR(9), ...)
Here is a set of queries:
SELECT count(*)
FROM Sample;
--> 692
SELECT count(DISTINCT num)
FROM Sample;
--> 673
SELECT count(*)
FROM Sample
WHERE num IS NULL;
--> 19
--> 673 + 19 = 692
--> So we now know there are 673 distinct num and 19 NULL in the Sample
table.
SELECT count(*)
FROM Ref;
--> 8232
SELECT count(*)
FROM Ref
WHERE num IS NULL;
--> 151
SELECT count(*)
FROM Sample
WHERE num::INTEGER IN (
SELECT num
FROM Ref
);
--> 2
SELECT count(*)
FROM Sample
WHERE num::INTEGER NOT IN (
SELECT num
FROM Ref
);
--> 0 (Well that is very strange and not coherent at all with the
previous query).
I might be able to provide a data set to test case 2 (but not case 1 as data
are confidential).
Hope I gave enough info on the bug.
Kevin