Re: Planner selects slow "Bitmap Heap Scan" when "Index Scan" is faster - Mailing list pgsql-performance

From Kim Hansen
Subject Re: Planner selects slow "Bitmap Heap Scan" when "Index Scan" is faster
Date
Msg-id CAEGYRW5ZGZ_WipDBXV6K-61FwfOuPmY5_07tApnU=-BcFc0Ujg@mail.gmail.com
Whole thread Raw
In response to Re: Planner selects slow "Bitmap Heap Scan" when "Index Scan" is faster  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-performance
On Tue, Apr 10, 2012 at 04:59, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Fri, Apr 6, 2012 at 3:09 PM, Kim Hansen <kim@rthansen.dk> wrote:
>
>> I have run the queries a few times in order to warm up the caches, the
>> queries stabilise on 20ms and 180ms.
>
> My first curiosity is not why the estimate is too good for Bitmap
> Index Scan, but rather why the actual execution is too poor.  As far
> as I can see the only explanation for the poor execution is that the
> bitmap scan has gone lossy, so that every tuple in every touched block
> needs to be rechecked against the where clause.  If that is the case,
> it suggests that your work_mem is quite small.
>
> In 9.2, explain analyze will report the number of tuples filtered out
> by rechecking, but that isn't reported in your version.
>
> It looks like the planner makes no attempt to predict when a bitmap
> scan will go lossy and then penalize it for the extra rechecks it will
> do.  Since it doesn't know it will be carrying out those extra checks,
> you can't just increase the tuple or operator costs factors.

You are right, when I increase the work_mem from 1MB to 2MB the time
decreases from 180ms to 30ms for the slow query. I have now configured
the server to 10MB work_mem.

> So that may explain why the bitmap is not getting penalized for its
> extra CPU time.  But that doesn't explain why the estimated cost is
> substantially lower than the index scan.  That is probably because the
> bitmap scan expects it is doing more sequential IO and less random IO.
>  You could cancel that advantage be setting random_page_cost to about
> the same as seq_page_cost (which since you indicated most data will be
> cached, would be an appropriate thing to do regardless of this
> specific issue).

I have set seq_page_cost and random_page_cost to 0.1 in order to
indicate that data is cached, the system now selects the faster index
scan.

Thanks for your help,
--
Kim Rydhof Thor Hansen
Vadgårdsvej 3, 2. tv.
2860 Søborg
Phone: +45 3091 2437

pgsql-performance by date:

Previous
From: Istvan Endredy
Date:
Subject: Re: bad planning with 75% effective_cache_size
Next
From: Eyal Wilde
Date:
Subject: scale up (postgresql vs mssql)