The reason I submitted it as a bug is, I feel it's a wrong output by the planner: the nested loop node picks up the incorrect row estimate from the HashAggregate node as the multiplier. It's a very minor issue though, which causes slowness at worst.
On Wed, 15 May 2024 at 21:07, PG Bug reporting form <noreply@postgresql.org> wrote: > Hello everyone, I notice the row estimate for a nested loop is wrong if a > cte is involved.
The yet-to-be-released PostgreSQL 17 should have some code which improves this [1].
With your test case on that version, I see:
postgres=# explain analyze with my_cte as materialized (select b from t2) select * postgres-# from t1 where t1.a in (select b from my_cte); QUERY PLAN -------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=37.92..469.00 rows=296 width=4) (actual time=0.574..0.893 rows=300 loops=1) CTE my_cte -> Seq Scan on t2 (cost=0.00..15.00 rows=1000 width=4) (actual time=0.015..0.079 rows=1000 loops=1) -> HashAggregate (cost=22.50..23.50 rows=100 width=4) (actual time=0.394..0.406 rows=100 loops=1) Group Key: my_cte.b Batches: 1 Memory Usage: 24kB -> CTE Scan on my_cte (cost=0.00..20.00 rows=1000 width=4) (actual time=0.025..0.229 rows=1000 loops=1) -> Index Only Scan using my_index on t1 (cost=0.42..4.27 rows=3 width=4) (actual time=0.004..0.004 rows=3 loops=100) Index Cond: (a = my_cte.b) Heap Fetches: 0
There are no plans to backpatched this improvement into PostgreSQL 16. It's not really classed as a bug, just something that could have been improved... which is now is.