Hi everyone,
Consider the two queries below:
```sql
CREATE TABLE t0(c0 INT);
EXPLAIN SELECT * FROM t0 WHERE t0.c0 IN (t0.c0); --2537
QUERY PLAN
------------------------------------------------------
Seq Scan on t0 (cost=0.00..35.50 rows=2537 width=4)
Filter: (c0 IS NOT NULL)
(2 rows)
EXPLAIN SELECT * FROM t0 WHERE (t0.c0 IN (t0.c0))OR(t0.c0 > 0); --858
QUERY PLAN
-----------------------------------------------------
Seq Scan on t0 (cost=0.00..48.25 rows=858 width=4)
Filter: ((c0 = c0) OR (c0 > 0))
(2 rows)
```
Based on my understanding, the second query should logically return a superset of the results obtained from the first query. However, when examining the query plans, I noticed that the estimated number of rows returned in the second query is significantly lower than that in the first query (2537 vs. 858, respectively). This issue can also be reproduced on a table with data after executing `ANALYZE`, but with a smaller estimate. I think this is because in the first query, `t0.c0 IN (t0.c0)` is optimized to a `NULL` check, while the optimization is not performed in the subexpression of the subquery. Is this expected, or is this a missed optimization that could be addressed?
I used this version of PostgreSQL:
Commit: 503b0556d96f2c8df6ed91c5a8cf11b23f37ce6d
Best regards,
Jinsheng Ba