Re: slow bitmap heap scans on pg 9.2 - Mailing list pgsql-performance

From Steve Singer
Subject Re: slow bitmap heap scans on pg 9.2
Date
Msg-id 5166C69B.1010505@ca.afilias.info
Whole thread Raw
In response to Re: slow bitmap heap scans on pg 9.2  (Steve Singer <ssinger@ca.afilias.info>)
Responses Re: slow bitmap heap scans on pg 9.2
List pgsql-performance
On 13-04-10 07:54 PM, Steve Singer wrote:
> On 13-04-10 02:06 PM, Jeff Janes wrote:
>> On Wed, Apr 10, 2013 at 6:49 AM, Steve Singer <ssinger@ca.afilias.info
>> <mailto:ssinger@ca.afilias.info>> wrote:
>>
>
>> I think the index recheck means your bitmap is overflowing (i.e. needing
>> more space than work_mem) and so keeping only the pages which have at
>> least one match, which means all rows in those pages need to be
>> rechecked.  How many rows does the table have?  You might be essentially
>> doing a seq scan, but with the additional overhead of the bitmap
>> machinery.  Could you do "explain (analyze,buffers)", preferably with
>> track_io_timing set to on?
>>
>
> table_b has 1,530,710,469 rows
>
> Attached is the output with track_io_timings and buffers.
>

I've done some more testing with a random_page_cost=20.

This gives me the nested-loop plan for the various date ranges I've tried.

However table_a_2 and table_b_2 are actually partition tables. This
query only needs to look at a single partition.  When I run this same
query against a different partition (a smaller partition, but still
bigger than cache) it picks hash join plan involving a seq scan of
table_b but no bitmap index scan.   On this partition the hash-join
plans tend to take 15 minutes versus 2 minutes when I disable hashjoin
plans. Bumping random_page_cost higher doesn't fix this.

I think the reason why it is picking the hash join based plans is
because of

Index Scan using table_b_1_ptid_orgid_ym_unq on table_b_1 b
(cost=0.00..503.86 rows=1 width=10) (actual time=0.016..0.017 rows=1
loops=414249)
                     Index Cond: ((a.id = a_id) AND (organization_id =
2) AND (year = 2013) AND (month = 3))
                     Filter: (product_id = 1)

I think we are over-estimating the cost of the index scans in the inner
loop.  This seems similar to what was discussed a few months ago
http://www.postgresql.org/message-id/092a01cdd230$ff6143c0$fe23cb40$@foo.me.uk

This version of PG should have 3e9960e9d935e7e applied.  I am trying to
get the database copied to a machine where I can easily switch PG
versions and test this against something prior to that commit and also
against a 9.3 build.

Steve


>
>
>>   Cheers,
>>
>> Jeff
>


Attachment

pgsql-performance by date:

Previous
From: Luigi Saggese
Date:
Subject: Performance ts_vector fulltext search
Next
From: Tom Lane
Date:
Subject: Re: Performance ts_vector fulltext search