Re: [PERFORM] Slow query: bitmap scan troubles - Mailing list pgsql-hackers
From: Simon Riggs
Subject: Re: [PERFORM] Slow query: bitmap scan troubles
Msg-id: CA+U5nMKvRhKviNYagHj+X-A__5uFFJEEGzBJe=f40ho2DbVk-Q@mail.gmail.com
In response to: Re: [PERFORM] Slow query: bitmap scan troubles (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On 6 January 2013 23:03, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> On 6 January 2013 18:58, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> IIRC, one of my very first attempts to deal with this was to charge
>>> random_page_cost per level of index descended. This was such a horrid
>>> overestimate that it never went anywhere. I think that reflects that in
>>> practical applications, the upper levels of the index tend to stay in
>>> cache. We could ignore I/O on that assumption and still try to model
>>> CPU costs of the descent, which is basically what Jeff is proposing.
>>> My objection to his formula is mainly that it ignores physical index
>>> size, which I think is important to include somehow for the reasons
>>> I explained in my other message.
>
>> Having a well principled approach will help bring us towards a
>> realistic estimate.
>
> I thought about this some more and came up with what might be a
> reasonably principled compromise. Assume that we know there are N
> leaf entries in the index (from VACUUM stats) and that we know the
> root page height is H (from looking at the btree metapage). (Note:
> H starts at zero for a single-page index.) If we assume that the
> number of tuples per page, P, is more or less uniform across leaf
> and upper pages (ie P is the fanout for upper pages), then we have
>
> N/P = number of leaf pages
> N/P^2 = number of level 1 pages
> N/P^3 = number of level 2 pages
> ...
> N/P^(h+1) = number of level h pages
>
> Solving for the minimum P that makes N/P^(H+1) <= 1, we get
>
> P = ceil(exp(ln(N)/(H+1)))
>
> as an estimate of P given the known N and H values.
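[The fanout estimate quoted above is easy to sanity-check. A minimal
Python sketch, purely illustrative and not the planner's code:]

```python
import math

def estimate_fanout(N, H):
    """Estimate tuples per page P from N leaf entries and root height H
    (H = 0 for a single-page index), by solving for the minimum P with
    N / P**(H+1) <= 1, i.e. P = ceil(exp(ln(N)/(H+1)))."""
    return math.ceil(math.exp(math.log(N) / (H + 1)))

# A two-level index (H=1) holding 1000 entries implies a fanout of
# ceil(sqrt(1000)) = 32; a bloated three-level index (H=2) holding the
# same 500 entries implies a much smaller effective fanout.
print(estimate_fanout(1000, 1))  # 32
print(estimate_fanout(500, 2))   # 8
```

[Note how a larger H for the same N yields a smaller estimated P, which
is what lets the model notice bloat.]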
> Now, if we consider only CPU costs of index descent, we expect
> about log2(P) comparisons to be needed on each of the H upper pages
> to be descended through, that is we have total descent cost
>
> cpu_index_tuple_cost * H * log2(P)
>
> If we ignore the ceil() step as being a second-order correction, this
> can be simplified to
>
> cpu_index_tuple_cost * H * log2(N)/(H+1)
>
> I propose this, rather than Jeff's formula of cpu_index_tuple_cost *
> log2(N), as our fudge factor. The reason I like this better is that
> the additional factor of H/(H+1) provides the correction I want for
> bloated indexes: if an index is bloated, the way that reflects into
> the cost of any particular search is that the number of pages to be
> descended through is larger than otherwise. The correction is fairly
> small, particularly for large indexes, but that seems to be what's
> expected given the rest of our discussion.

Seems good to have something with both N and H in it. This cost model
favours smaller indexes over larger ones, whether that's because
they're partial and so have a smaller N, or because the key values are
narrower and so give a lower H.

> We could further extend this by adding some I/O charge when the index is
> sufficiently large, as per Simon's comments, but frankly I think that's
> unnecessary. Unless the fan-out factor is really awful, practical-sized
> indexes probably have all their upper pages in memory. What's more, per
> my earlier comment, when you start to think about tables so huge that
> that's not true it really doesn't matter if we charge another
> random_page_cost or two for an indexscan --- it'll still be peanuts
> compared to the seqscan alternative.

Considering that we're trying to decide between various indexes on one
table, we don't have enough information to say which index the cache
favours, and the other aspects of caching are the same for all indexes
of any given size.
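[The proposed fudge factor and its H/(H+1) bloat correction can be
sketched in a few lines of Python; this is illustrative only, assuming
PostgreSQL's default cpu_index_tuple_cost of 0.005, and is not the
planner's implementation:]

```python
import math

def descent_fudge_factor(N, H, cpu_index_tuple_cost=0.005):
    """Tom's proposed CPU cost of index descent:
    cpu_index_tuple_cost * H * log2(N) / (H + 1).
    The H/(H+1) factor makes a bloated index (larger H for the
    same N leaf entries) slightly more expensive to descend."""
    return cpu_index_tuple_cost * H * math.log2(N) / (H + 1)

# Same N, one extra level of bloat: the bloated index costs more,
# but the correction is small, as noted in the discussion.
lean = descent_fudge_factor(10**6, H=2)     # factor H/(H+1) = 2/3
bloated = descent_fudge_factor(10**6, H=3)  # factor H/(H+1) = 3/4
```

[Since H/(H+1) < 1, this is always a little below Jeff's
cpu_index_tuple_cost * log2(N), converging toward it as H grows.]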
So we can assume those effects cancel out for comparison purposes, even
if they're non-zero. And as you say, they're negligible in comparison
with bitmap index scans etc. The only time I'd question that would be
in the case of a nested loop join, but that's not important here.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services