On Wed, Apr 10, 2013 at 09:49:55AM -0400, Steve Singer wrote:
> I'm encountering an issue where PG 9.2.4 (we also see this with
> 9.2.3) is picking a plan involving a bitmap heap scan that turns out
> to be much slower than a nested-loop plan using indexes.
>
> The planner picks the hashjoin plan by default (see attached files)
>
> Bitmap Heap Scan on public.table_b_2 b (cost=172635.99..9800225.75
> rows=8435754 width=10) (actual t
> ime=9132.194..1785196.352 rows=9749680 loops=1)
> Recheck Cond: ((b.organization_id = 3)
> AND (b.year = 2013) AND (b.month = 3))
> Rows Removed by Index Recheck: 313195667
> Filter: (b.product_id = 2)
>
> Is the part that seems be causing the problem (or at least taking
> most of the time, other than the final aggregation)
>
> If I set enable_hashjoin=false and enable_mergejoin=false I get the
> nestedloop join plan.
>
> table_b is 137 GB plus indexes each on is around 43 GB
> table_a is 20 GB
>
> random_page_cost = 2.0
> effective_cache_size = 3500MB
> cpu_tuple_cost = 0.01
> cpu_index_tuple_cost = 0.005
> cpu_operator_cost = 0.0025
> work_mem = 64MB
> shared_buffers = 300MB (for this output, I've also had it at 2GB)
>
> If I bump cpu_tuple_cost to the 10-20 range it will pick the nested
> loop join for some date ranges but not all. cpu_tuple_cost of 20
> doesn't sound like an sane value.
>
> This database used to run 8.3 where it picked the nested-loop join.
> We used pg_upgrade to migrate to 9.2
>
> Any ideas why the bitmap heap scan is much slower than the planner expects?
>
> Steve
Hi Steve,
The one thing that stands out to me is that you are working with 200GB of
data on a machine with 4-8GB of ram and you have the random_page_cost set
to 2.0. That is almost completely uncached and I would expect a value of
10 or more to be closer to reality.
Regards,
Ken