Re: Bitmap scan is undercosted? - Mailing list pgsql-performance

From: Jeff Janes
Subject: Re: Bitmap scan is undercosted?
Date:
Msg-id: CAMkU=1yKK0M4W+bNc31PZhH15dzPakSmpr6TH9exT8OUZvmg_w@mail.gmail.com
In response to: Re: Bitmap scan is undercosted?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Bitmap scan is undercosted? - boolean correlation
           Re: Bitmap scan is undercosted?
List: pgsql-performance
On Sat, Dec 2, 2017 at 3:44 PM, Tom Lane wrote:

> Jeff Janes writes:
> > On Fri, Dec 1, 2017 at 11:08 PM, Vitaliy Garnashevich <
> > vgarnashevich@gmail.com> wrote:
> >> # x4 tuple/operator costs - bitmap scan still a bit cheaper
> >> set seq_page_cost = 1.0;
> >> set random_page_cost = 1.0;
> >> set cpu_tuple_cost = 0.04;
> >> set cpu_index_tuple_cost = 0.02;
> >> set cpu_operator_cost = 0.01;
>
> > If you really want to target the plan with the BitmapAnd, you should
> > increase cpu_index_tuple_cost and/or cpu_operator_cost but not increase
> > cpu_tuple_cost. That is because the unselective bitmap index scan does
> > not incur any cpu_tuple_cost, but does incur index_tuple and operator
> > costs. Unfortunately all other index scans in the system will also be
> > skewed by such a change if you make the change system-wide.
>
> I think it'd be a serious error to screw around with your cost settings
> on the basis of a single case in which the rowcount estimates are so
> far off. It's really those estimates that are the problem AFAICS.
>
> The core issue in this example is that, the way the test data is set up,
> the "flag = true" condition actually adds no selectivity at all, because
> every row with "num = 1" is certain to have "flag = true". If the planner
> realized that, it would certainly not bother with BitmapAnd'ing the flag
> index onto the results of the num index. But it doesn't know that those
> columns are correlated, so it supposes that adding the extra index will
> give a 10x reduction in the number of heap rows that have to be visited
> (since it knows that only 1/10th of the rows have "flag = true").
> *That* is what causes the overly optimistic cost estimate for the
> two-index bitmapscan, and no amount of fiddling with the cost parameters
> will make that better.

But he also tested with num=2 and num=39, which reverses the situation, so the bitmap is 100% selective rather than the 90% the planner thinks it will be. But it is still slower for him (I am having trouble replicating that exact behavior), so building the bitmap to rule out 100% of the rows is empirically not worth it; I don't see how building it to rule out 90%, as the planner thinks it will, would be any better.

> I tried creating multiple-column statistics using the v10 facility for
> that:
>
> regression=# create statistics s1 on num, flag from aaa;
> CREATE STATISTICS
> regression=# analyze aaa;
> ANALYZE
>
> but that changed the estimate not at all, which surprised me because
> dependency statistics are supposed to fix exactly this type of problem.
> I suspect there may be something in the extended-stats code that causes it
> not to work right for boolean columns --- this wouldn't be excessively
> surprising because of the way the planner messes around with converting
> "flag = true" to just "flag" and sometimes back again. But I've not
> looked closer yet.

I think the non-extended stats code also has trouble with booleans. pg_stats gives me a correlation of 0.8 or higher for the flag column.

Due to that, when I disable bitmapscans and seqscans, I start getting slow index scans on the wrong index, i2 rather than i1. I don't know why he doesn't see that in his example.

Cheers,

Jeff
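
[Editor's note: a minimal sketch of how to inspect the two kinds of statistics discussed above, assuming the aaa table, the num/flag columns, and the s1 statistics object from the test case and from Tom's v10 example; on v10 the computed dependencies are stored in pg_statistic_ext.]

-- Per-column planner statistics, including the correlation value Jeff cites:
SELECT attname, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'aaa' AND attname IN ('num', 'flag');

-- Extended (dependency) statistics created by "create statistics s1 on num, flag from aaa":
SELECT stxname, stxdependencies
FROM pg_statistic_ext
WHERE stxname = 's1';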
