pg_trgm indexes giving bad estimations? - Mailing list pgsql-performance
From | Ben |
---|---|
Subject | pg_trgm indexes giving bad estimations? |
Date | |
Msg-id | FFBBFAF9-7BA8-42BE-8958-5F6702AB2663@silentmedia.com Whole thread Raw |
Responses |
Re: pg_trgm indexes giving bad estimations?
|
List | pgsql-performance |
I've got another query I'm trying to optimize: select aj.album from public.track t join public.albumjoin aj on (aj.track = t.id) join (select id from public.albummeta am where tracks between 10 and 14) lam on (lam.id = aj.album) where (t.name % '01New OrderEvil Dust' or t.name % '04OrbitalOpen Mind') group by aj.album having count(*) >= 9.6; This gives an expensive (but still reasonable) plan of: HashAggregate (cost=76523.64..76602.25 rows=4492 width=4) Filter: ((count(*))::numeric >= 9.6) -> Hash Join (cost=63109.73..76501.18 rows=4492 width=4) Hash Cond: ("outer".id = "inner".album) -> Bitmap Heap Scan on albummeta am (cost=1810.10..9995.34 rows=187683 width=4) Recheck Cond: ((tracks >= 10) AND (tracks <= 14)) -> Bitmap Index Scan on albummeta_tracks_index (cost=0.00..1810.10 rows=187683 width=0) Index Cond: ((tracks >= 10) AND (tracks <= 14)) -> Hash (cost=61274.03..61274.03 rows=10243 width=4) -> Nested Loop (cost=163.87..61274.03 rows=10243 width=4) -> Bitmap Heap Scan on track t (cost=163.87..28551.33 rows=10243 width=4) Recheck Cond: (((name)::text % '01New OrderEvil Dust'::text) OR ((name)::text % '04OrbitalOpen Mind'::text)) -> BitmapOr (cost=163.87..163.87 rows=10248 width=0) -> Bitmap Index Scan on track_name_trgm_idx (cost=0.00..81.93 rows=5124 width=0) Index Cond: ((name)::text % '01New OrderEvil Dust'::text) -> Bitmap Index Scan on track_name_trgm_idx (cost=0.00..81.93 rows=5124 width=0) Index Cond: ((name)::text % '04OrbitalOpen Mind'::text) -> Index Scan using albumjoin_trackindex on albumjoin aj (cost=0.00..3.18 rows=1 width=8) Index Cond: (aj.track = "outer".id) (19 rows) Unfortunately, when I modify this example to use a more typical number of trigram searches or'd together (anywhere from 10 to 20), the planner thinks the bitmap heap scan on track t will return a lot of rows, and so reverts to doing a sequential scan of albumjoin for the next table join. That would make sense.... IF there were a lot of rows returned by the bitmap index scans. But here is where the planner gets it really wrong, if I'm reading it right. It seems to think both my index scans will return 5124 rows, when, in reality, it's a lot less: select count(*) from public.track where name % '01New OrderEvil Dust'; count ------- 20 (1 row) select count(*) from public.track where name % '04OrbitalOpen Mind'; count ------- 123 (1 row) How can I get the planner to not expect so many rows to be returned? A possibly related question is: because pg_tgrm lets me set the matching threshold of the % operator, how does that affect the planner?
pgsql-performance by date: