Suspicious Estimated Number of Returned Rows - Mailing list pgsql-bugs

From Ba Jinsheng
Subject Suspicious Estimated Number of Returned Rows
Date
Msg-id SG2PR06MB2810355511BEA44B808B32178A4B9@SG2PR06MB2810.apcprd06.prod.outlook.com
Whole thread Raw
List pgsql-bugs

Hi everyone,

 

Consider the two queries below:

 

```sql

CREATE TABLE t0(c0 INT);

EXPLAIN SELECT * FROM t0 WHERE t0.c0 IN (t0.c0); --2537

                      QUERY PLAN                     

------------------------------------------------------

Seq Scan on t0  (cost=0.00..35.50 rows=2537 width=4)

   Filter: (c0 IS NOT NULL)

(2 rows)

 

EXPLAIN SELECT * FROM t0 WHERE (t0.c0 IN (t0.c0))OR(t0.c0 > 0); --858

                     QUERY PLAN                     

-----------------------------------------------------

Seq Scan on t0  (cost=0.00..48.25 rows=858 width=4)

   Filter: ((c0 = c0) OR (c0 > 0))

(2 rows)

```

 

Based on my understanding, the second query should logically return a superset of the results obtained from the first query. However, when examining the query plans, I noticed that the estimated number of rows returned in the second query is significantly lower than that in the first query (2537 vs. 858, respectively). This issue can also be reproduced on a table with data after executing `ANALYZE`, but with a smaller estimate. I think this is because in the first query, `t0.c0 IN (t0.c0)` is optimized to a `NULL` check, while the optimization is not performed in the subexpression of the subquery. Is this expected, or is this a missed optimization that could be addressed?

 

I used this version of PostgreSQL:

Commit: 503b0556d96f2c8df6ed91c5a8cf11b23f37ce6d

 

 

 

 

 

Best regards,

Jinsheng Ba

 

pgsql-bugs by date:

Previous
From: Thomas Munro
Date:
Subject: Re: BUG #17949: Adding an index introduces serialisation anomalies.
Next
From: Jaime Casanova
Date:
Subject: Re: Server crash with parallel workers with Postgres 14.7