Thread: BUG #15604: NOT IN condition incorrectly returns False
The following bug has been logged on the website: Bug reference: 15604 Logged by: Sergey Romanovsky Email address: sergey@romanovsky.org PostgreSQL version: 10.6 Operating system: linux Red Hat 4.8.3-9 Description: Zhijiang Li <zl256@cornell.edu> and I found the following bug described here: https://github.com/romanovsky/postgres/blob/master/README.md # Postgres bug: NOT IN condition incorrectly returns False ## Short description In case the dataset doesn't fit into available memory and Postgres has decided to use hash lookup for `NOT IN` we observe that both queries `1 IN SubSelect` and `1 NOT IN SubSelect` return empty result set. If you upgrade AWS RDS instance from `db.r4.large` to `db.r4.xlarge` you won't be able to reproduce the problem. ## Assumptions made * Examples below assume that there's no request_id=1: ```sql db=> SELECT request_id FROM postgres_not_in_bug WHERE request_id=1; request_id ------------ (0 rows) ``` * Execution plan looks like: ```sql db=> EXPLAIN SELECT 'there is no 1' AS foo WHERE 1 NOT IN (SELECT request_id FROM postgres_not_in_bug); QUERY PLAN ----------------------------------------------------------------------------------- Result (cost=2101.68..2101.69 rows=1 width=32) One-Time Filter: (NOT (hashed SubPlan 1)) SubPlan 1 -> Seq Scan on postgres_not_in_bug (cost=0.00..1791.34 rows=124134 width=6) (4 rows) ``` ## Observed behavior ```sql db=> SELECT 'there is no 1' AS foo WHERE 1 NOT IN (SELECT request_id FROM postgres_not_in_bug); foo ----- (0 rows) db=> SELECT 'there is no 1' as foo WHERE 1 IN (SELECT request_id FROM postgres_not_in_bug); foo ----- (0 rows) ``` ## Expected behavior ```sql db=> SELECT 'there is no 1' AS foo WHERE 1 NOT IN (SELECT request_id FROM postgres_not_in_bug); foo --------------- there is no 1 (1 row) db=> SELECT 'there is no 1' AS foo WHERE 1 IN (SELECT request_id FROM postgres_not_in_bug); foo ----------------- (0 rows) ``` ## How to reproduce 1. Download `postgres_not_in_bug.pg_dump` from this repo locally to `/tmp/postgres_not_in_bug.pg_dump` ```bash curl -H 'Accept: application/vnd.github.v3.raw' -o /tmp/postgres_not_in_bug.pg_dump -O -L https://raw.githubusercontent.com/romanovsky/postgres/master/postgres_not_in_bug.pg_dump ``` 2. Load the dataset to postgres ```bash $ cat /tmp/postgres_not_in_bug.pg_dump|psql -h host -U user -d db --port=5432 SET SET SET SET SET set_config ------------ (1 row) SET SET SET SET SET CREATE TABLE COPY 124134 ``` 3. Run queries: ```bash $ psql -h host -U user -d db --port=5432 --command="SELECT 'there is no 1' AS foo WHERE 1 NOT IN (SELECT request_id FROM postgres_not_in_bug)" foo ----- (0 rows) $ psql -h host -U user -d db --port=5432 --command="SELECT 'there is no 1' AS foo WHERE 1 IN (SELECT request_id FROM postgres_not_in_bug)" foo ----- (0 rows) ``` ### Specifications * Postgres 10.6 ```bash $ psql -h host -U user -d db --port=5432 --command="SELECT version()" version -------------------------------------------------------------------------------------------------------- PostgreSQL 10.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit (1 row) ``` * Amazon RDS db.r4.large, i.e. 2 vCPU/15.25GB RAM (see more details here: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.DBInstanceClass.html#Concepts.DBInstanceClass.Summary)
>>>>> "PG" == PG Bug reporting form <noreply@postgresql.org> writes: PG> The following bug has been logged on the website: PG> Bug reference: 15604 PG> Logged by: Sergey Romanovsky PG> Email address: sergey@romanovsky.org PG> PostgreSQL version: 10.6 PG> Operating system: linux Red Hat 4.8.3-9 PG> Description: PG> Zhijiang Li <zl256@cornell.edu> and I found the following bug described PG> here: https://github.com/romanovsky/postgres/blob/master/README.md PG> # Postgres bug: NOT IN condition incorrectly returns False Not a bug. This is actually how NOT IN is supposed to work, and it has nothing to do with hash lookups or instance size (the output is the same whether a hashed or non-hashed plan is used). Here is why: select count(*) from postgres_not_in_bug where request_id is null; count ------- 1 (1 row) The condition 1 NOT IN (2,NULL) is equivalent to (1=2) OR (1=NULL), which evaluates to (false) OR (NULL) which in turn evaluates to NULL. Since this is not TRUE, the WHERE clause does not accept the row. When you do NOT IN (select col ...) then the null handling is the same; if there is _any_ null value in the selected data, then the NOT IN will never return TRUE (only FALSE or NULL according to whether the value is found or not). See also https://wiki.postgresql.org/wiki/Don't_Do_This#Don.27t_use_NOT_IN -- Andrew (irc:RhodiumToad)
>>>>> "Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes: Andrew> The condition 1 NOT IN (2,NULL) is equivalent to (1=2) OR Andrew> (1=NULL), I mean NOT ((1=2) OR (1=NULL)) of course, but NOT (NULL) is still NULL. -- Andrew (irc:RhodiumToad)
Hi Andrew,
Thanks a lot for fast and informative response!
You're totally right.
We both feel embarrassed and amused at the same time. This is so nice of you to educate us.
On Tue, Jan 22, 2019 at 10:57 PM Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
>>>>> "PG" == PG Bug reporting form <noreply@postgresql.org> writes:
PG> The following bug has been logged on the website:
PG> Bug reference: 15604
PG> Logged by: Sergey Romanovsky
PG> Email address: sergey@romanovsky.org
PG> PostgreSQL version: 10.6
PG> Operating system: linux Red Hat 4.8.3-9
PG> Description:
PG> Zhijiang Li <zl256@cornell.edu> and I found the following bug described
PG> here: https://github.com/romanovsky/postgres/blob/master/README.md
PG> # Postgres bug: NOT IN condition incorrectly returns False
Not a bug. This is actually how NOT IN is supposed to work, and it has
nothing to do with hash lookups or instance size (the output is the same
whether a hashed or non-hashed plan is used).
Here is why:
select count(*) from postgres_not_in_bug where request_id is null;
count
-------
1
(1 row)
The condition 1 NOT IN (2,NULL) is equivalent to (1=2) OR (1=NULL),
which evaluates to (false) OR (NULL) which in turn evaluates to NULL.
Since this is not TRUE, the WHERE clause does not accept the row.
When you do NOT IN (select col ...) then the null handling is the same;
if there is _any_ null value in the selected data, then the NOT IN will
never return TRUE (only FALSE or NULL according to whether the value is
found or not).
See also https://wiki.postgresql.org/wiki/Don't_Do_This#Don.27t_use_NOT_IN
--
Andrew (irc:RhodiumToad)