Thread: BUG #12556: Clause IN and NOT IN buggy

BUG #12556: Clause IN and NOT IN buggy

From
kevin.perais@trivia-marketing.com
Date:
The following bug has been logged on the website:

Bug reference:      12556
Logged by:          Kevin PERAIS
Email address:      kevin.perais@trivia-marketing.com
PostgreSQL version: 9.3.5
Operating system:   Ubuntu 12.04
Description:

Hi everybody.

I've been noticing several times that clauses IN and NOT IN are often buggy.
I'll show you 2 concrete cases to illustrate that.


---------------------------------------------
- 1 - IN and NOT IN not coherent with JOIN  -
---------------------------------------------

I have 2 tables here:

Account(id BIGINT PRIMARY KEY NOT NULL, ...)
Contact(contact_id VARCHAR(11) PRIMARY KEY NOT NULL, account_id BIGINT,
...)

I the contacts are linked to an account. However, I know I do not have all
the account, so I want to know how many contacts are linked to an account
and how many are not.

Here is a set of queries:

SELECT count(*)
FROM Account
WHERE id IS NULL;
--> 0

SELECT count(*)
FROM Contact
WHERE account_id IS NULL;
--> 0

SELECT count(*)
FROM Contact;
--> 257726

SELECT count(*)
FROM Contact
WHERE account_id IN (
  SELECT id
  FROM Account
);
--> 257726    (result very very suspect)

SELECT count(*)
FROM Contact
WHERE account_id NOT IN (
  SELECT id
  FROM Account
);
--> 0        (coherent with previous query, but result very very suspect)

SELECT count(*)
FROM Contact
JOIN Account ON Account.id = Contact.account_id;
--> 135664

SELECT count(Account.id)
FROM Contact
LEFT JOIN Account ON Account.id = Contact.account_id;
--> 135664

So I decided to take a random account_id in Contact table and run the
following queries:

SELECT count(*)
FROM Contact
WHERE account_id = 3074054072;
--> 6

SELECT count(*)
FROM Account
WHERE id = 3074054072;
--> 0

So that proves that IN and NOT IN queries give me wrong results.



-------------------------------------------------
- 2 - NOT IN clause not coherent with IN clause -
-------------------------------------------------

I have 2 tables:

Ref(num INTEGER UNIQUE, ...)
Sample(num VARCHAR(9), ...)

Here is a set of queries:

SELECT count(*)
FROM Sample;
--> 692

SELECT count(DISTINCT num)
FROM Sample;
--> 673

SELECT count(*)
FROM Sample
WHERE num IS NULL;
--> 19

--> 673 + 19 = 692
--> So we now know there are 673 distinct num and 19 NULL in the Sample
table.

SELECT count(*)
FROM Ref;
--> 8232

SELECT count(*)
FROM Ref
WHERE num IS NULL;
--> 151

SELECT count(*)
FROM Sample
WHERE num::INTEGER IN (
  SELECT num
  FROM Ref
);
--> 2

SELECT count(*)
FROM Sample
WHERE num::INTEGER NOT IN (
  SELECT num
  FROM Ref
);
--> 0      (Well that is very strange and not coherent at all with the
previous query).


I might be able to provide a data set to test case 2 (but not case 1 as data
are confidential).

Hope I gave enough info on the bug.
Kevin

Re: BUG #12556: Clause IN and NOT IN buggy

From
Tom Lane
Date:
kevin.perais@trivia-marketing.com writes:
> I've been noticing several times that clauses IN and NOT IN are often buggy.

None of your examples demonstrate any such thing.  What's much more likely
is that you've forgotten about NOT IN's weird (but spec-mandated) behavior
with NULLs, and/or misspelled a field name so that the output of the
sub-SELECT is actually an outer reference.

> I'll show you 2 concrete cases to illustrate that.

My idea of a "concrete case" would be something that someone else could
reproduce from the given information, which would certainly require test
data.

            regards, tom lane

Re: BUG #12556: Clause IN and NOT IN buggy

From
Kevin Perais
Date:
Hi,

I must disagree with you.

There is enough info to understand what goes wrong. I've run enough queries=
 to inspect data.

The fact that the query with IN does not returns the same result as the JOI=
N knowing that there are NULL values proves it. Please check the results of=
 all the queries together. You'll see that you do not need data and that th=
e results are incoherent! For the 1st case, I even take a value randomly an=
d I didn't find it in the Account table. So the NOT IN query just can't ret=
urn 0 as there is at least one value that is not present in the Account tab=
le ;)

-----Message d'origine-----
De=A0: Tom Lane [mailto:tgl@sss.pgh.pa.us]=20
Envoy=E9=A0: jeudi 15 janvier 2015 18:05
=C0=A0: Kevin Perais
Cc=A0: pgsql-bugs@postgresql.org
Objet=A0: Re: [BUGS] BUG #12556: Clause IN and NOT IN buggy

kevin.perais@trivia-marketing.com writes:
> I've been noticing several times that clauses IN and NOT IN are often bug=
gy.

None of your examples demonstrate any such thing.  What's much more likely
is that you've forgotten about NOT IN's weird (but spec-mandated) behavior
with NULLs, and/or misspelled a field name so that the output of the
sub-SELECT is actually an outer reference.

> I'll show you 2 concrete cases to illustrate that.

My idea of a "concrete case" would be something that someone else could
reproduce from the given information, which would certainly require test
data.

            regards, tom lane

Re: BUG #12556: Clause IN and NOT IN buggy

From
Andres Freund
Date:
On 2015-01-16 09:17:43 +0000, Kevin Perais wrote:
> There is enough info to understand what goes wrong. I've run enough queries to inspect data.

*You* want something. The likelihood of getting something fixed is far
larger if you present an example that we can actually run. We obviously
haven't seen the problem ourselves so far, so a testcase is crucial.

We don't even have the actual table definitions, so we really can't say
much. We really need a SQL script that allows us to reproduce these
cases.

> The fact that the query with IN does not returns the same result as
> the JOIN knowing that there are NULL values proves it.

I guess you mean 'no NULL values'? The second problem is perfectly
explained by Tom's remark about NOT IN(...) returning NULL if *any* of
the contained values are NULL. E.g. SELECT 1 WHERE 1 NOT IN (1, NULL);
won't return any rows.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: BUG #12556: Clause IN and NOT IN buggy

From
David G Johnston
Date:
kevin.perais wrote
> The following bug has been logged on the website:
>
> Bug reference:      12556
> Logged by:          Kevin PERAIS
> Email address:

> kevin.perais@

> PostgreSQL version: 9.3.5
> Operating system:   Ubuntu 12.04
> Description:
>
> Hi everybody.
>
> I've been noticing several times that clauses IN and NOT IN are often
> buggy.
> I'll show you 2 concrete cases to illustrate that.
>
>
> ---------------------------------------------
> - 1 - IN and NOT IN not coherent with JOIN  -
> ---------------------------------------------
>
> I have 2 tables here:
>
> Account(id BIGINT PRIMARY KEY NOT NULL, ...)
> Contact(contact_id VARCHAR(11) PRIMARY KEY NOT NULL, account_id BIGINT,
> ...)
>
> I the contacts are linked to an account. However, I know I do not have all
> the account, so I want to know how many contacts are linked to an account
> and how many are not.
>
> Here is a set of queries:
>
> SELECT count(*)
> FROM Account
> WHERE id IS NULL;
> --> 0
>
> SELECT count(*)
> FROM Contact
> WHERE account_id IS NULL;
> --> 0

So, supposedly no NULL values on the relevant ID fields...this is what is
suspect to us.  If these are incorrect then everything below makes sense.


> SELECT count(*)
> FROM Contact;
> --> 257726
>
> SELECT count(*)
> FROM Contact
> WHERE account_id IN (
>   SELECT id
>   FROM Account
> );
> --> 257726    (result very very suspect)

You really need to explain your reasoning for suspecting the result.  It
doesn't seem unusual that every contact would have an associated account so
this seems very very expected to me.

In the presence of NULL this would indeed be equal to the number of rows
since there would be, at minimum, a NULL match for every contact row.


> SELECT count(*)
> FROM Contact
> WHERE account_id NOT IN (
>   SELECT id
>   FROM Account
> );
> --> 0        (coherent with previous query, but result very very suspect)

Again, why is not having account-less contacts suspect?  Furthermore, in the
presence of NULL (which you seeming precluded with the first two queries)
this would be the correct answer.


> SELECT count(*)
> FROM Contact
> JOIN Account ON Account.id = Contact.account_id;
> --> 135664
>
> SELECT count(Account.id)
> FROM Contact
> LEFT JOIN Account ON Account.id = Contact.account_id;
> --> 135664

So, yeah, this confuses me.  The fact that the inner and left joins match is
not suspect given the previous two queries but that there are considerably
fewer result rows compared to input rows.

As Tom mentioned the rules for JOIN and NULL are different than the
rules/behavior of IN/NOT IN and NULL.


> So I decided to take a random account_id in Contact table and run the
> following queries:
>
> SELECT count(*)
> FROM Contact
> WHERE account_id = 3074054072;
> --> 6
>
> SELECT count(*)
> FROM Account
> WHERE id = 3074054072;
> --> 0
>
> So that proves that IN and NOT IN queries give me wrong results.

In the absence of any NULL values in those two fields I would have to
concur.


> -------------------------------------------------
> - 2 - NOT IN clause not coherent with IN clause -
> -------------------------------------------------
>
> [...]
>
> SELECT count(*)
> FROM Sample
> WHERE num::INTEGER IN (
>   SELECT num
>   FROM Ref
> );
> --> 2

Not sure on this one and have spent too much time on this already...sorry


> SELECT count(*)
> FROM Sample
> WHERE num::INTEGER NOT IN (
>   SELECT num
>   FROM Ref
> );
> --> 0      (Well that is very strange and not coherent at all with the
> previous query).

Since any number can match NULL (i.e., the second "1" in the above IN) there
is not a single value of num that will not potentially be found in Ref


> I might be able to provide a data set to test case 2 (but not case 1 as
> data
> are confidential).

Then make some data up that still exhibits your behavior.

At this point we suspect human error since this stuff has been working and
stables for years.  Supplying a self-contained test case will either help
you figure out your own mistake, allow us to find it, or allow us to realize
it is not human error and then go search for the bug.

The specific error is that your data contains NULL even though you are
reporting that it does not.

David J.



--
View this message in context: http://postgresql.nabble.com/BUG-12556-Clause-IN-and-NOT-IN-buggy-tp5834102p5834304.html
Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.

Re: BUG #12556: Clause IN and NOT IN buggy

From
Mike Porter
Date:
On Fri, 16 Jan 2015, Andres Freund wrote:

> On 2015-01-16 09:17:43 +0000, Kevin Perais wrote:
>> There is enough info to understand what goes wrong. I've run enough queries to inspect data.
>
> *You* want something. The likelihood of getting something fixed is far
> larger if you present an example that we can actually run. We obviously
> haven't seen the problem ourselves so far, so a testcase is crucial.
>
> We don't even have the actual table definitions, so we really can't say
> much. We really need a SQL script that allows us to reproduce these
> cases.
>
>> The fact that the query with IN does not returns the same result as
>> the JOIN knowing that there are NULL values proves it.
>
> I guess you mean 'no NULL values'? The second problem is perfectly
> explained by Tom's remark about NOT IN(...) returning NULL if *any* of
> the contained values are NULL. E.g. SELECT 1 WHERE 1 NOT IN (1, NULL);
> won't return any rows.

To the original poster:

Perhaps this example makes the correct behavior of postgres more
obvious:

net=# SELECT 1 WHERE 1 NOT IN (2);
  ?column?
----------
         1
(1 row)

net=# SELECT 1 WHERE 1 NOT IN (2, NULL);
  ?column?
----------
(0 rows)

(We can't say 1 is NOT IN (2, NULL) because the NULL value could be a
1.  We don't know what a NULL value is.  That's what NULL means.

Mike

>
> Greetings,
>
> Andres Freund
>
> --
> Andres Freund                       http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>
>
> --
> Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-bugs
>

-
Mike Porter
PGP Fingerprint: F4 AE E1 9F 67 F7 DA EA  2F D2 37 F3 99 ED D1 C2