Re: BUG #12556: Clause IN and NOT IN buggy - Mailing list pgsql-bugs

From David G Johnston
Subject Re: BUG #12556: Clause IN and NOT IN buggy
Date
Msg-id 1421426554363-5834304.post@n5.nabble.com
Whole thread Raw
In response to BUG #12556: Clause IN and NOT IN buggy  (kevin.perais@trivia-marketing.com)
List pgsql-bugs
kevin.perais wrote
> The following bug has been logged on the website:
>
> Bug reference:      12556
> Logged by:          Kevin PERAIS
> Email address:

> kevin.perais@

> PostgreSQL version: 9.3.5
> Operating system:   Ubuntu 12.04
> Description:
>
> Hi everybody.
>
> I've been noticing several times that clauses IN and NOT IN are often
> buggy.
> I'll show you 2 concrete cases to illustrate that.
>
>
> ---------------------------------------------
> - 1 - IN and NOT IN not coherent with JOIN  -
> ---------------------------------------------
>
> I have 2 tables here:
>
> Account(id BIGINT PRIMARY KEY NOT NULL, ...)
> Contact(contact_id VARCHAR(11) PRIMARY KEY NOT NULL, account_id BIGINT,
> ...)
>
> I the contacts are linked to an account. However, I know I do not have all
> the account, so I want to know how many contacts are linked to an account
> and how many are not.
>
> Here is a set of queries:
>
> SELECT count(*)
> FROM Account
> WHERE id IS NULL;
> --> 0
>
> SELECT count(*)
> FROM Contact
> WHERE account_id IS NULL;
> --> 0

So, supposedly no NULL values on the relevant ID fields...this is what is
suspect to us.  If these are incorrect then everything below makes sense.


> SELECT count(*)
> FROM Contact;
> --> 257726
>
> SELECT count(*)
> FROM Contact
> WHERE account_id IN (
>   SELECT id
>   FROM Account
> );
> --> 257726    (result very very suspect)

You really need to explain your reasoning for suspecting the result.  It
doesn't seem unusual that every contact would have an associated account so
this seems very very expected to me.

In the presence of NULL this would indeed be equal to the number of rows
since there would be, at minimum, a NULL match for every contact row.


> SELECT count(*)
> FROM Contact
> WHERE account_id NOT IN (
>   SELECT id
>   FROM Account
> );
> --> 0        (coherent with previous query, but result very very suspect)

Again, why is not having account-less contacts suspect?  Furthermore, in the
presence of NULL (which you seeming precluded with the first two queries)
this would be the correct answer.


> SELECT count(*)
> FROM Contact
> JOIN Account ON Account.id = Contact.account_id;
> --> 135664
>
> SELECT count(Account.id)
> FROM Contact
> LEFT JOIN Account ON Account.id = Contact.account_id;
> --> 135664

So, yeah, this confuses me.  The fact that the inner and left joins match is
not suspect given the previous two queries but that there are considerably
fewer result rows compared to input rows.

As Tom mentioned the rules for JOIN and NULL are different than the
rules/behavior of IN/NOT IN and NULL.


> So I decided to take a random account_id in Contact table and run the
> following queries:
>
> SELECT count(*)
> FROM Contact
> WHERE account_id = 3074054072;
> --> 6
>
> SELECT count(*)
> FROM Account
> WHERE id = 3074054072;
> --> 0
>
> So that proves that IN and NOT IN queries give me wrong results.

In the absence of any NULL values in those two fields I would have to
concur.


> -------------------------------------------------
> - 2 - NOT IN clause not coherent with IN clause -
> -------------------------------------------------
>
> [...]
>
> SELECT count(*)
> FROM Sample
> WHERE num::INTEGER IN (
>   SELECT num
>   FROM Ref
> );
> --> 2

Not sure on this one and have spent too much time on this already...sorry


> SELECT count(*)
> FROM Sample
> WHERE num::INTEGER NOT IN (
>   SELECT num
>   FROM Ref
> );
> --> 0      (Well that is very strange and not coherent at all with the
> previous query).

Since any number can match NULL (i.e., the second "1" in the above IN) there
is not a single value of num that will not potentially be found in Ref


> I might be able to provide a data set to test case 2 (but not case 1 as
> data
> are confidential).

Then make some data up that still exhibits your behavior.

At this point we suspect human error since this stuff has been working and
stables for years.  Supplying a self-contained test case will either help
you figure out your own mistake, allow us to find it, or allow us to realize
it is not human error and then go search for the bug.

The specific error is that your data contains NULL even though you are
reporting that it does not.

David J.



--
View this message in context: http://postgresql.nabble.com/BUG-12556-Clause-IN-and-NOT-IN-buggy-tp5834102p5834304.html
Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #12568: upper of int4range unexpected value
Next
From: Mike Porter
Date:
Subject: Re: BUG #12556: Clause IN and NOT IN buggy