Re: Optimize cardinality estimation when unique keys are fully covered - Mailing list pgsql-hackers
| From | feichanghong |
|---|---|
| Subject | Re: Optimize cardinality estimation when unique keys are fully covered |
| Date | |
| Msg-id | tencent_E033CE3A613FEE4CC0384E11F494B4DE7D09@qq.com Whole thread Raw |
| In response to | Optimize cardinality estimation when unique keys are fully covered ("feichanghong" <feichanghong@qq.com>) |
| List | pgsql-hackers |
On Nov 22, 2025, at 00:54, feichanghong <feichanghong@qq.com> wrote:Hi,Recently, I ran into a cardinality estimation problem where the querypredicates fully cover a primary key or unique index. The issue can bereproduced with the following case:```sqlcreate table t1(a int, b int, primary key(a, b));DO $$DECLAREi integer;BEGINFOR i IN 1..100000 LOOPINSERT INTO t1 VALUES (i, 0);END LOOP;END;$$ LANGUAGE plpgsql;DO $$DECLAREi integer;BEGINFOR j IN 1..10 LOOPFOR i IN 1..100000 LOOPINSERT INTO t1 VALUES (i%10+10*(j-1), i);END LOOP;END LOOP;END;$$ LANGUAGE plpgsql;create table t2(a int);insert into t2 select 1;analyze t1, t2;postgres=# explain analyze select * from t1 where a = 1 and b = 0;QUERY PLAN---------------------------------------------------------------------------------------------------------------------Bitmap Heap Scan on t1 (cost=4.65..2174.00 rows=833 width=8) (actual time=0.060..0.061 rows=1.00 loops=1)Recheck Cond: ((a = 1) AND (b = 0))Heap Blocks: exact=1Buffers: shared hit=4-> Bitmap Index Scan on t1_pkey (cost=0.00..4.44 rows=833 width=0) (actual time=0.042..0.043 rows=1.00 loops=1)Index Cond: ((a = 1) AND (b = 0))Index Searches: 1Buffers: shared hit=3Planning Time: 0.146 msExecution Time: 0.105 ms(10 rows)postgres=# explain analyze select * from t1, t2 where t1.a = t2.a and t1.b = 0;QUERY PLAN-----------------------------------------------------------------------------------------------------------------------------Nested Loop (cost=0.43..4.81 rows=32 width=12) (actual time=0.067..0.069 rows=1.00 loops=1)Buffers: shared hit=5-> Seq Scan on t2 (cost=0.00..1.01 rows=1 width=4) (actual time=0.024..0.025 rows=1.00 loops=1)Buffers: shared hit=1-> Index Only Scan using t1_pkey on t1 (cost=0.43..108.23 rows=32 width=8) (actual time=0.036..0.036 rows=1.00 loops=1)Index Cond: ((a = t2.a) AND (b = 0))Heap Fetches: 0Index Searches: 1Buffers: shared hit=4Planning Time: 0.257 msExecution Time: 0.114 ms(11 rows)```It can be observed that the Index Cond fully covers the primary key index.Naturally, the correct rows estimate should be 1, but the optimizerestimates the two conditions independently, resulting in an overestimatedrow count. This overestimation has a greater impact in scenarios withpartitioned tables and multi-table joins, making the planner more inclinedto choose hash or merge joins.We may consider checking in cardinality estimation whether the restrictlistfully covers a unique index. If so, we can directly set the estimated rowsto 1. The attached patch provides a very early demo of this approach.
Apologies for the garbled text due to my email client. Please find the
reproduction SQL below again:
```sql
create table t1(a int, b int, primary key(a, b));
DO $$
DECLARE
i integer;
BEGIN
FOR i IN 1..100000 LOOP
INSERT INTO t1 VALUES (i, 0);
END LOOP;
END;
$$ LANGUAGE plpgsql;
DO $$
DECLARE
i integer;
BEGIN
FOR j IN 1..10 LOOP
FOR i IN 1..100000 LOOP
INSERT INTO t1 VALUES (i%10+10*(j-1), i);
END LOOP;
END LOOP;
END;
$$ LANGUAGE plpgsql;
create table t2(a int);
insert into t2 select 1;
analyze t1, t2;
postgres=# explain analyze select * from t1 where a = 1 and b = 0;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on t1 (cost=4.65..2174.00 rows=833 width=8) (actual time=0.060..0.061 rows=1.00 loops=1)
Recheck Cond: ((a = 1) AND (b = 0))
Heap Blocks: exact=1
Buffers: shared hit=4
-> Bitmap Index Scan on t1_pkey (cost=0.00..4.44 rows=833 width=0) (actual time=0.042..0.043 rows=1.00 loops=1)
Index Cond: ((a = 1) AND (b = 0))
Index Searches: 1
Buffers: shared hit=3
Planning Time: 0.146 ms
Execution Time: 0.105 ms
(10 rows)
postgres=# explain analyze select * from t1, t2 where t1.a = t2.a and t1.b = 0;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.43..4.81 rows=32 width=12) (actual time=0.067..0.069 rows=1.00 loops=1)
Buffers: shared hit=5
-> Seq Scan on t2 (cost=0.00..1.01 rows=1 width=4) (actual time=0.024..0.025 rows=1.00 loops=1)
Buffers: shared hit=1
-> Index Only Scan using t1_pkey on t1 (cost=0.43..108.23 rows=32 width=8) (actual time=0.036..0.036 rows=1.00 loops=1)
Index Cond: ((a = t2.a) AND (b = 0))
Heap Fetches: 0
Index Searches: 1
Buffers: shared hit=4
Planning Time: 0.257 ms
Execution Time: 0.114 ms
(11 rows)
```
Best Regards,
Fei Changhong
Fei Changhong
pgsql-hackers by date: