Re: Bad estimate with partial index - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Bad estimate with partial index |
Date | |
Msg-id | 2568735.1650391306@sss.pgh.pa.us Whole thread Raw |
In response to | Bad estimate with partial index (André Hänsel <andre@webkr.de>) |
Responses |
Re: Bad estimate with partial index
|
List | pgsql-hackers |
=?iso-8859-1?Q?Andr=E9_H=E4nsel?= <andre@webkr.de> writes: > I have a case where Postgres chooses the wrong index and I'm not sure what > to do about it: The core problem here seems to be a poor estimate for the selectivity of "WHERE cropped AND NOT resized": regression=# EXPLAIN ANALYZE SELECT count(*) FROM t WHERE cropped AND NOT resized ; ... -> Bitmap Heap Scan on t (cost=35.26..6352.26 rows=91100 width=0) (actual time=0.121..0.190 rows=1000 loops=1) Recheck Cond: (cropped AND (NOT resized)) ... I think this is because the planner expects those two columns to be independent, which they are completely not in your test data. Perhaps that assumption is more true in your real-world data, but since you're here complaining, I suppose not :-(. What you can do about that, in recent Postgres versions, is to create extended statistics on the combination of the columns: regression=# create statistics t_stats on cropped, resized from t; CREATE STATISTICS regression=# analyze t; ANALYZE regression=# EXPLAIN ANALYZE SELECT count(*) FROM t WHERE cropped AND NOT resized AND create_date < CURRENT_DATE; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------ Aggregate (cost=3145.15..3145.16 rows=1 width=8) (actual time=9.765..9.766 rows=1 loops=1) -> Index Scan using idx_resized on t (cost=0.29..3142.65 rows=1000 width=0) (actual time=9.608..9.735 rows=1000 loops=1) Filter: (cropped AND (create_date < CURRENT_DATE)) Rows Removed by Filter: 100000 Planning Time: 0.115 ms Execution Time: 9.779 ms Better estimate, but it's still using the wrong index :-(. If we force use of the other one: regression=# drop index idx_resized; DROP INDEX regression=# EXPLAIN ANALYZE regression-# SELECT count(*) FROM t regression-# WHERE cropped AND NOT resized AND create_date < CURRENT_DATE; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=6795.38..6795.39 rows=1 width=8) (actual time=0.189..0.191 rows=1 loops=1) -> Bitmap Heap Scan on t (cost=13.40..6792.88 rows=1000 width=0) (actual time=0.047..0.147 rows=1000 loops=1) Recheck Cond: (cropped AND (NOT resized)) Filter: (create_date < CURRENT_DATE) Heap Blocks: exact=6 -> Bitmap Index Scan on specific (cost=0.00..13.15 rows=91565 width=0) (actual time=0.035..0.035 rows=1000 loops=1) ^^^^^^^^^^ Planning Time: 0.154 ms Execution Time: 0.241 ms it looks like the problem is that the extended stats haven't been used while forming the estimate of the number of index entries retrieved, so we overestimate the cost of using this index. That seems like a bug. Tomas? In the meantime, maybe you could dodge the problem by combining "cropped" and "resized" into one multivalued column, so that there's not a need to depend on extended stats to arrive at a decent estimate. regards, tom lane
pgsql-hackers by date: