I've got another query I'm trying to optimize:
select aj.album from
public.track t
join public.albumjoin aj
on (aj.track = t.id)
join (select id from public.albummeta am where tracks between 10 and
14) lam
on (lam.id = aj.album)
where (t.name % '01New OrderEvil Dust' or t.name % '04OrbitalOpen Mind')
group by aj.album having count(*) >= 9.6;
This gives an expensive (but still reasonable) plan of:
HashAggregate (cost=76523.64..76602.25 rows=4492 width=4)
Filter: ((count(*))::numeric >= 9.6)
-> Hash Join (cost=63109.73..76501.18 rows=4492 width=4)
Hash Cond: ("outer".id = "inner".album)
-> Bitmap Heap Scan on albummeta am
(cost=1810.10..9995.34 rows=187683 width=4)
Recheck Cond: ((tracks >= 10) AND (tracks <= 14))
-> Bitmap Index Scan on albummeta_tracks_index
(cost=0.00..1810.10 rows=187683 width=0)
Index Cond: ((tracks >= 10) AND (tracks <= 14))
-> Hash (cost=61274.03..61274.03 rows=10243 width=4)
-> Nested Loop (cost=163.87..61274.03 rows=10243
width=4)
-> Bitmap Heap Scan on track t
(cost=163.87..28551.33 rows=10243 width=4)
Recheck Cond: (((name)::text % '01New
OrderEvil Dust'::text) OR ((name)::text % '04OrbitalOpen Mind'::text))
-> BitmapOr (cost=163.87..163.87
rows=10248 width=0)
-> Bitmap Index Scan on
track_name_trgm_idx (cost=0.00..81.93 rows=5124 width=0)
Index Cond: ((name)::text %
'01New OrderEvil Dust'::text)
-> Bitmap Index Scan on
track_name_trgm_idx (cost=0.00..81.93 rows=5124 width=0)
Index Cond: ((name)::text %
'04OrbitalOpen Mind'::text)
-> Index Scan using albumjoin_trackindex on
albumjoin aj (cost=0.00..3.18 rows=1 width=8)
Index Cond: (aj.track = "outer".id)
(19 rows)
Unfortunately, when I modify this example to use a more typical
number of trigram searches or'd together (anywhere from 10 to 20),
the planner thinks the bitmap heap scan on track t will return a lot
of rows, and so reverts to doing a sequential scan of albumjoin for
the next table join. That would make sense.... IF there were a lot of
rows returned by the bitmap index scans. But here is where the
planner gets it really wrong, if I'm reading it right.
It seems to think both my index scans will return 5124 rows, when, in
reality, it's a lot less:
select count(*) from public.track where name % '01New OrderEvil Dust';
count
-------
20
(1 row)
select count(*) from public.track where name % '04OrbitalOpen Mind';
count
-------
123
(1 row)
How can I get the planner to not expect so many rows to be returned?
A possibly related question is: because pg_tgrm lets me set the
matching threshold of the % operator, how does that affect the planner?