Thread: BUG #12556: Clause IN and NOT IN buggy
The following bug has been logged on the website: Bug reference: 12556 Logged by: Kevin PERAIS Email address: kevin.perais@trivia-marketing.com PostgreSQL version: 9.3.5 Operating system: Ubuntu 12.04 Description: Hi everybody. I've been noticing several times that clauses IN and NOT IN are often buggy. I'll show you 2 concrete cases to illustrate that. --------------------------------------------- - 1 - IN and NOT IN not coherent with JOIN - --------------------------------------------- I have 2 tables here: Account(id BIGINT PRIMARY KEY NOT NULL, ...) Contact(contact_id VARCHAR(11) PRIMARY KEY NOT NULL, account_id BIGINT, ...) I the contacts are linked to an account. However, I know I do not have all the account, so I want to know how many contacts are linked to an account and how many are not. Here is a set of queries: SELECT count(*) FROM Account WHERE id IS NULL; --> 0 SELECT count(*) FROM Contact WHERE account_id IS NULL; --> 0 SELECT count(*) FROM Contact; --> 257726 SELECT count(*) FROM Contact WHERE account_id IN ( SELECT id FROM Account ); --> 257726 (result very very suspect) SELECT count(*) FROM Contact WHERE account_id NOT IN ( SELECT id FROM Account ); --> 0 (coherent with previous query, but result very very suspect) SELECT count(*) FROM Contact JOIN Account ON Account.id = Contact.account_id; --> 135664 SELECT count(Account.id) FROM Contact LEFT JOIN Account ON Account.id = Contact.account_id; --> 135664 So I decided to take a random account_id in Contact table and run the following queries: SELECT count(*) FROM Contact WHERE account_id = 3074054072; --> 6 SELECT count(*) FROM Account WHERE id = 3074054072; --> 0 So that proves that IN and NOT IN queries give me wrong results. ------------------------------------------------- - 2 - NOT IN clause not coherent with IN clause - ------------------------------------------------- I have 2 tables: Ref(num INTEGER UNIQUE, ...) Sample(num VARCHAR(9), ...) Here is a set of queries: SELECT count(*) FROM Sample; --> 692 SELECT count(DISTINCT num) FROM Sample; --> 673 SELECT count(*) FROM Sample WHERE num IS NULL; --> 19 --> 673 + 19 = 692 --> So we now know there are 673 distinct num and 19 NULL in the Sample table. SELECT count(*) FROM Ref; --> 8232 SELECT count(*) FROM Ref WHERE num IS NULL; --> 151 SELECT count(*) FROM Sample WHERE num::INTEGER IN ( SELECT num FROM Ref ); --> 2 SELECT count(*) FROM Sample WHERE num::INTEGER NOT IN ( SELECT num FROM Ref ); --> 0 (Well that is very strange and not coherent at all with the previous query). I might be able to provide a data set to test case 2 (but not case 1 as data are confidential). Hope I gave enough info on the bug. Kevin
kevin.perais@trivia-marketing.com writes: > I've been noticing several times that clauses IN and NOT IN are often buggy. None of your examples demonstrate any such thing. What's much more likely is that you've forgotten about NOT IN's weird (but spec-mandated) behavior with NULLs, and/or misspelled a field name so that the output of the sub-SELECT is actually an outer reference. > I'll show you 2 concrete cases to illustrate that. My idea of a "concrete case" would be something that someone else could reproduce from the given information, which would certainly require test data. regards, tom lane
Hi, I must disagree with you. There is enough info to understand what goes wrong. I've run enough queries= to inspect data. The fact that the query with IN does not returns the same result as the JOI= N knowing that there are NULL values proves it. Please check the results of= all the queries together. You'll see that you do not need data and that th= e results are incoherent! For the 1st case, I even take a value randomly an= d I didn't find it in the Account table. So the NOT IN query just can't ret= urn 0 as there is at least one value that is not present in the Account tab= le ;) -----Message d'origine----- De=A0: Tom Lane [mailto:tgl@sss.pgh.pa.us]=20 Envoy=E9=A0: jeudi 15 janvier 2015 18:05 =C0=A0: Kevin Perais Cc=A0: pgsql-bugs@postgresql.org Objet=A0: Re: [BUGS] BUG #12556: Clause IN and NOT IN buggy kevin.perais@trivia-marketing.com writes: > I've been noticing several times that clauses IN and NOT IN are often bug= gy. None of your examples demonstrate any such thing. What's much more likely is that you've forgotten about NOT IN's weird (but spec-mandated) behavior with NULLs, and/or misspelled a field name so that the output of the sub-SELECT is actually an outer reference. > I'll show you 2 concrete cases to illustrate that. My idea of a "concrete case" would be something that someone else could reproduce from the given information, which would certainly require test data. regards, tom lane
On 2015-01-16 09:17:43 +0000, Kevin Perais wrote: > There is enough info to understand what goes wrong. I've run enough queries to inspect data. *You* want something. The likelihood of getting something fixed is far larger if you present an example that we can actually run. We obviously haven't seen the problem ourselves so far, so a testcase is crucial. We don't even have the actual table definitions, so we really can't say much. We really need a SQL script that allows us to reproduce these cases. > The fact that the query with IN does not returns the same result as > the JOIN knowing that there are NULL values proves it. I guess you mean 'no NULL values'? The second problem is perfectly explained by Tom's remark about NOT IN(...) returning NULL if *any* of the contained values are NULL. E.g. SELECT 1 WHERE 1 NOT IN (1, NULL); won't return any rows. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
kevin.perais wrote > The following bug has been logged on the website: > > Bug reference: 12556 > Logged by: Kevin PERAIS > Email address: > kevin.perais@ > PostgreSQL version: 9.3.5 > Operating system: Ubuntu 12.04 > Description: > > Hi everybody. > > I've been noticing several times that clauses IN and NOT IN are often > buggy. > I'll show you 2 concrete cases to illustrate that. > > > --------------------------------------------- > - 1 - IN and NOT IN not coherent with JOIN - > --------------------------------------------- > > I have 2 tables here: > > Account(id BIGINT PRIMARY KEY NOT NULL, ...) > Contact(contact_id VARCHAR(11) PRIMARY KEY NOT NULL, account_id BIGINT, > ...) > > I the contacts are linked to an account. However, I know I do not have all > the account, so I want to know how many contacts are linked to an account > and how many are not. > > Here is a set of queries: > > SELECT count(*) > FROM Account > WHERE id IS NULL; > --> 0 > > SELECT count(*) > FROM Contact > WHERE account_id IS NULL; > --> 0 So, supposedly no NULL values on the relevant ID fields...this is what is suspect to us. If these are incorrect then everything below makes sense. > SELECT count(*) > FROM Contact; > --> 257726 > > SELECT count(*) > FROM Contact > WHERE account_id IN ( > SELECT id > FROM Account > ); > --> 257726 (result very very suspect) You really need to explain your reasoning for suspecting the result. It doesn't seem unusual that every contact would have an associated account so this seems very very expected to me. In the presence of NULL this would indeed be equal to the number of rows since there would be, at minimum, a NULL match for every contact row. > SELECT count(*) > FROM Contact > WHERE account_id NOT IN ( > SELECT id > FROM Account > ); > --> 0 (coherent with previous query, but result very very suspect) Again, why is not having account-less contacts suspect? Furthermore, in the presence of NULL (which you seeming precluded with the first two queries) this would be the correct answer. > SELECT count(*) > FROM Contact > JOIN Account ON Account.id = Contact.account_id; > --> 135664 > > SELECT count(Account.id) > FROM Contact > LEFT JOIN Account ON Account.id = Contact.account_id; > --> 135664 So, yeah, this confuses me. The fact that the inner and left joins match is not suspect given the previous two queries but that there are considerably fewer result rows compared to input rows. As Tom mentioned the rules for JOIN and NULL are different than the rules/behavior of IN/NOT IN and NULL. > So I decided to take a random account_id in Contact table and run the > following queries: > > SELECT count(*) > FROM Contact > WHERE account_id = 3074054072; > --> 6 > > SELECT count(*) > FROM Account > WHERE id = 3074054072; > --> 0 > > So that proves that IN and NOT IN queries give me wrong results. In the absence of any NULL values in those two fields I would have to concur. > ------------------------------------------------- > - 2 - NOT IN clause not coherent with IN clause - > ------------------------------------------------- > > [...] > > SELECT count(*) > FROM Sample > WHERE num::INTEGER IN ( > SELECT num > FROM Ref > ); > --> 2 Not sure on this one and have spent too much time on this already...sorry > SELECT count(*) > FROM Sample > WHERE num::INTEGER NOT IN ( > SELECT num > FROM Ref > ); > --> 0 (Well that is very strange and not coherent at all with the > previous query). Since any number can match NULL (i.e., the second "1" in the above IN) there is not a single value of num that will not potentially be found in Ref > I might be able to provide a data set to test case 2 (but not case 1 as > data > are confidential). Then make some data up that still exhibits your behavior. At this point we suspect human error since this stuff has been working and stables for years. Supplying a self-contained test case will either help you figure out your own mistake, allow us to find it, or allow us to realize it is not human error and then go search for the bug. The specific error is that your data contains NULL even though you are reporting that it does not. David J. -- View this message in context: http://postgresql.nabble.com/BUG-12556-Clause-IN-and-NOT-IN-buggy-tp5834102p5834304.html Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.
On Fri, 16 Jan 2015, Andres Freund wrote: > On 2015-01-16 09:17:43 +0000, Kevin Perais wrote: >> There is enough info to understand what goes wrong. I've run enough queries to inspect data. > > *You* want something. The likelihood of getting something fixed is far > larger if you present an example that we can actually run. We obviously > haven't seen the problem ourselves so far, so a testcase is crucial. > > We don't even have the actual table definitions, so we really can't say > much. We really need a SQL script that allows us to reproduce these > cases. > >> The fact that the query with IN does not returns the same result as >> the JOIN knowing that there are NULL values proves it. > > I guess you mean 'no NULL values'? The second problem is perfectly > explained by Tom's remark about NOT IN(...) returning NULL if *any* of > the contained values are NULL. E.g. SELECT 1 WHERE 1 NOT IN (1, NULL); > won't return any rows. To the original poster: Perhaps this example makes the correct behavior of postgres more obvious: net=# SELECT 1 WHERE 1 NOT IN (2); ?column? ---------- 1 (1 row) net=# SELECT 1 WHERE 1 NOT IN (2, NULL); ?column? ---------- (0 rows) (We can't say 1 is NOT IN (2, NULL) because the NULL value could be a 1. We don't know what a NULL value is. That's what NULL means. Mike > > Greetings, > > Andres Freund > > -- > Andres Freund http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services > > > -- > Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-bugs > - Mike Porter PGP Fingerprint: F4 AE E1 9F 67 F7 DA EA 2F D2 37 F3 99 ED D1 C2