Re: Bitmap scan is undercosted? - Mailing list pgsql-performance
From | Jeff Janes |
---|---|
Subject | Re: Bitmap scan is undercosted? |
Date | |
Msg-id | CAMkU=1yKK0M4W+bNc31PZhH15dzPakSmpr6TH9exT8OUZvmg_w@mail.gmail.com Whole thread Raw |
In response to | Re: Bitmap scan is undercosted? (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Bitmap scan is undercosted? - boolean correlation
Re: Bitmap scan is undercosted? |
List | pgsql-performance |
On Sat, Dec 2, 2017 at 3:44 PM, Tom Lane wrote:
> Jeff Janes writes:
> > On Fri, Dec 1, 2017 at 11:08 PM, Vitaliy Garnashevich <
> > vgarnashevich@gmail.com> wrote:
> >> # x4 tuple/operator costs - bitmap scan still a bit cheaper
> >> set seq_page_cost = 1.0;
> >> set random_page_cost = 1.0;
> >> set cpu_tuple_cost = 0.04;
> >> set cpu_index_tuple_cost = 0.02;
> >> set cpu_operator_cost = 0.01;
>
> > If you really want to target the plan with the BitmapAnd, you should
> > increase cpu_index_tuple_cost and/or cpu_operator_cost but not increase
> > cpu_tuple_cost. That is because the unselective bitmap index scan does
> > not incur any cpu_tuple_cost, but does incur index_tuple and operator
> > costs. Unfortunately all other index scans in the system will also be
> > skewed by such a change if you make the change system-wide.
>
> I think it'd be a serious error to screw around with your cost settings
> on the basis of a single case in which the rowcount estimates are so
> far off. It's really those estimates that are the problem AFAICS.
>
> The core issue in this example is that, the way the test data is set up,
> the "flag = true" condition actually adds no selectivity at all, because
> every row with "num = 1" is certain to have "flag = true". If the planner
> realized that, it would certainly not bother with BitmapAnd'ing the flag
> index onto the results of the num index. But it doesn't know that those
> columns are correlated, so it supposes that adding the extra index will
> give a 10x reduction in the number of heap rows that have to be visited
> (since it knows that only 1/10th of the rows have "flag = true").
> *That* is what causes the overly optimistic cost estimate for the
> two-index bitmapscan, and no amount of fiddling with the cost parameters
> will make that better.
>
But he also tested with num=2 and num=39, which reverses the situation so
the bitmap is 100% selective rather than the 90% the planner thinks it will
be.
But it is still slower for him (I am having trouble replicating that exact
behavior), so building the bitmap to rule out 100% of the rows is
empirically not worth it, I don't see how building it to rule out 90%, as
the planner things, would be any better.
> I tried creating multiple-column statistics using the v10 facility for
> that:
>
> regression=# create statistics s1 on num, flag from aaa;
> CREATE STATISTICS
> regression=# analyze aaa;
> ANALYZE
>
> but that changed the estimate not at all, which surprised me because
> dependency statistics are supposed to fix exactly this type of problem.
> I suspect there may be something in the extended-stats code that causes it
> not to work right for boolean columns --- this wouldn't be excessively
> surprising because of the way the planner messes around with converting
> "flag = true" to just "flag" and sometimes back again. But I've not
> looked closer yet.
>
I think the non-extended stats code also has trouble with booleans.
pg_stats gives me a correlation of 0.8 or higher for the flag column.
Due to that, when I disable bitmapscans and seqscans, I start getting slow
index scans on the wrong index, i2 rather than i1. I don't know why he
doesn't see that in his example.
Cheers,
Jeff
pgsql-performance by date: