I'm encountering an issue where PG 9.2.4 (we also see this with 9.2.3)
is picking a plan involving a bitmap heap scan that turns out to be much
slower than a nested-loop plan using indexes.
The planner picks the hashjoin plan by default (see attached files)
Bitmap Heap Scan on public.table_b_2 b (cost=172635.99..9800225.75
rows=8435754 width=10) (actual t
ime=9132.194..1785196.352 rows=9749680 loops=1)
Recheck Cond: ((b.organization_id = 3) AND
(b.year = 2013) AND (b.month = 3))
Rows Removed by Index Recheck: 313195667
Filter: (b.product_id = 2)
Is the part that seems be causing the problem (or at least taking most
of the time, other than the final aggregation)
If I set enable_hashjoin=false and enable_mergejoin=false I get the
nestedloop join plan.
table_b is 137 GB plus indexes each on is around 43 GB
table_a is 20 GB
random_page_cost = 2.0
effective_cache_size = 3500MB
cpu_tuple_cost = 0.01
cpu_index_tuple_cost = 0.005
cpu_operator_cost = 0.0025
work_mem = 64MB
shared_buffers = 300MB (for this output, I've also had it at 2GB)
If I bump cpu_tuple_cost to the 10-20 range it will pick the nested loop
join for some date ranges but not all. cpu_tuple_cost of 20 doesn't
sound like an sane value.
This database used to run 8.3 where it picked the nested-loop join. We
used pg_upgrade to migrate to 9.2
Any ideas why the bitmap heap scan is much slower than the planner expects?
Steve