Re: BUG #12556: Clause IN and NOT IN buggy - Mailing list pgsql-bugs
From | David G Johnston |
---|---|
Subject | Re: BUG #12556: Clause IN and NOT IN buggy |
Date | |
Msg-id | 1421426554363-5834304.post@n5.nabble.com Whole thread Raw |
In response to | BUG #12556: Clause IN and NOT IN buggy (kevin.perais@trivia-marketing.com) |
List | pgsql-bugs |
kevin.perais wrote > The following bug has been logged on the website: > > Bug reference: 12556 > Logged by: Kevin PERAIS > Email address: > kevin.perais@ > PostgreSQL version: 9.3.5 > Operating system: Ubuntu 12.04 > Description: > > Hi everybody. > > I've been noticing several times that clauses IN and NOT IN are often > buggy. > I'll show you 2 concrete cases to illustrate that. > > > --------------------------------------------- > - 1 - IN and NOT IN not coherent with JOIN - > --------------------------------------------- > > I have 2 tables here: > > Account(id BIGINT PRIMARY KEY NOT NULL, ...) > Contact(contact_id VARCHAR(11) PRIMARY KEY NOT NULL, account_id BIGINT, > ...) > > I the contacts are linked to an account. However, I know I do not have all > the account, so I want to know how many contacts are linked to an account > and how many are not. > > Here is a set of queries: > > SELECT count(*) > FROM Account > WHERE id IS NULL; > --> 0 > > SELECT count(*) > FROM Contact > WHERE account_id IS NULL; > --> 0 So, supposedly no NULL values on the relevant ID fields...this is what is suspect to us. If these are incorrect then everything below makes sense. > SELECT count(*) > FROM Contact; > --> 257726 > > SELECT count(*) > FROM Contact > WHERE account_id IN ( > SELECT id > FROM Account > ); > --> 257726 (result very very suspect) You really need to explain your reasoning for suspecting the result. It doesn't seem unusual that every contact would have an associated account so this seems very very expected to me. In the presence of NULL this would indeed be equal to the number of rows since there would be, at minimum, a NULL match for every contact row. > SELECT count(*) > FROM Contact > WHERE account_id NOT IN ( > SELECT id > FROM Account > ); > --> 0 (coherent with previous query, but result very very suspect) Again, why is not having account-less contacts suspect? Furthermore, in the presence of NULL (which you seeming precluded with the first two queries) this would be the correct answer. > SELECT count(*) > FROM Contact > JOIN Account ON Account.id = Contact.account_id; > --> 135664 > > SELECT count(Account.id) > FROM Contact > LEFT JOIN Account ON Account.id = Contact.account_id; > --> 135664 So, yeah, this confuses me. The fact that the inner and left joins match is not suspect given the previous two queries but that there are considerably fewer result rows compared to input rows. As Tom mentioned the rules for JOIN and NULL are different than the rules/behavior of IN/NOT IN and NULL. > So I decided to take a random account_id in Contact table and run the > following queries: > > SELECT count(*) > FROM Contact > WHERE account_id = 3074054072; > --> 6 > > SELECT count(*) > FROM Account > WHERE id = 3074054072; > --> 0 > > So that proves that IN and NOT IN queries give me wrong results. In the absence of any NULL values in those two fields I would have to concur. > ------------------------------------------------- > - 2 - NOT IN clause not coherent with IN clause - > ------------------------------------------------- > > [...] > > SELECT count(*) > FROM Sample > WHERE num::INTEGER IN ( > SELECT num > FROM Ref > ); > --> 2 Not sure on this one and have spent too much time on this already...sorry > SELECT count(*) > FROM Sample > WHERE num::INTEGER NOT IN ( > SELECT num > FROM Ref > ); > --> 0 (Well that is very strange and not coherent at all with the > previous query). Since any number can match NULL (i.e., the second "1" in the above IN) there is not a single value of num that will not potentially be found in Ref > I might be able to provide a data set to test case 2 (but not case 1 as > data > are confidential). Then make some data up that still exhibits your behavior. At this point we suspect human error since this stuff has been working and stables for years. Supplying a self-contained test case will either help you figure out your own mistake, allow us to find it, or allow us to realize it is not human error and then go search for the bug. The specific error is that your data contains NULL even though you are reporting that it does not. David J. -- View this message in context: http://postgresql.nabble.com/BUG-12556-Clause-IN-and-NOT-IN-buggy-tp5834102p5834304.html Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.
pgsql-bugs by date: