Home > mailing lists

Suspicious Estimated Number of Returned Rows - Mailing list pgsql-bugs

From	Ba Jinsheng
Subject	Suspicious Estimated Number of Returned Rows
Date	May 30, 2023 03:19:28
Msg-id	SG2PR06MB2810355511BEA44B808B32178A4B9@SG2PR06MB2810.apcprd06.prod.outlook.com Whole thread Raw
List	pgsql-bugs

Tree view

Hi everyone,

Consider the two queries below:

```sql

CREATE TABLE t0(c0 INT);

EXPLAIN SELECT * FROM t0 WHERE t0.c0 IN (t0.c0); --2537

QUERY PLAN

------------------------------------------------------

Seq Scan on t0 (cost=0.00..35.50 rows=2537 width=4)

Filter: (c0 IS NOT NULL)

(2 rows)

EXPLAIN SELECT * FROM t0 WHERE (t0.c0 IN (t0.c0))OR(t0.c0 > 0); --858

QUERY PLAN

-----------------------------------------------------

Seq Scan on t0 (cost=0.00..48.25 rows=858 width=4)

Filter: ((c0 = c0) OR (c0 > 0))

(2 rows)

```

Based on my understanding, the second query should logically return a superset of the results obtained from the first query. However, when examining the query plans, I noticed that the estimated number of rows returned in the second query is significantly lower than that in the first query (2537 vs. 858, respectively). This issue can also be reproduced on a table with data after executing `ANALYZE`, but with a smaller estimate. I think this is because in the first query, `t0.c0 IN (t0.c0)` is optimized to a `NULL` check, while the optimization is not performed in the subexpression of the subquery. Is this expected, or is this a missed optimization that could be addressed?

I used this version of PostgreSQL:

Commit: 503b0556d96f2c8df6ed91c5a8cf11b23f37ce6d

Best regards,

Jinsheng Ba

pgsql-bugs by date:

From: Thomas Munro
Date: 30 May 2023, 01:29:09
Subject: Re: BUG #17949: Adding an index introduces serialisation anomalies.

From: Jaime Casanova
Date: 30 May 2023, 04:56:09
Subject: Re: Server crash with parallel workers with Postgres 14.7

Suspicious Estimated Number of Returned Rows - Mailing list pgsql-bugs

Previous

Next