Thread: choose_bitmap_and again (was Re: [PERFORM] Strangely Variable Query Performance)
choose_bitmap_and again (was Re: [PERFORM] Strangely Variable Query Performance)
From:
Tom Lane
Date:
Steve <cheetah@tanabi.org> writes:
> [ strange planner misbehavior in 8.2.3 ]

After some off-list investigation (thanks, Steve, for letting me poke at your machine), the short answer is that the heuristics used by choose_bitmap_and() suck. The problem query is like

select ... from ds
where ds.receipt >= '1998-12-30 0:0:0'
  and ds.encounter_id in ( ... 100 distinct values ... );

and the table has a truly remarkable variety of indexes on encounter_id, receipt, and combinations of them with other columns. The receipt condition is actually in effect a no-op, because all receipt dates are later than that, but because ineq_histogram_selectivity doesn't trust histogram data unreservedly, we compute a selectivity of about 0.99997 for it. That means that the indexes that cover both receipt and encounter_id are given a selectivity score just fractionally better than those involving encounter_id alone, and therefore they sort first in choose_bitmap_and's sort step, and the way that that routine is coded, only combinations of the very first index with other ones will be considered for a bitmap heap scan. So the possibility of using just the index on encounter_id alone is never considered, even though that alternative is vastly cheaper than the alternatives that are considered.

(It happens that encounter_id is a low-order column in all the indexes that include receipt, and so these scans end up covering the whole index ... multiple times even. The cost estimation is fine --- the thing knows these are expensive --- what's falling down is the heuristic for which combinations of indexes to consider using in a bitmap scan.)

The original coding of choose_bitmap_and involved a "fuzzy" comparison of selectivities, which would have avoided this problem, but we got rid of that later because it had its own problems. In fact, choose_bitmap_and has caused us enough problems that I'm thinking we need a fundamental rethink of how it works, rather than just marginal tweaks. If you haven't looked at this code before, the comments explain the idea well enough:

/*
 * choose_bitmap_and
 *      Given a nonempty list of bitmap paths, AND them into one path.
 *
 * This is a nontrivial decision since we can legally use any subset of the
 * given path set.  We want to choose a good tradeoff between selectivity
 * and cost of computing the bitmap.
 *
 * The result is either a single one of the inputs, or a BitmapAndPath
 * combining multiple inputs.
 */
...
/*
 * In theory we should consider every nonempty subset of the given paths.
 * In practice that seems like overkill, given the crude nature of the
 * estimates, not to mention the possible effects of higher-level AND and
 * OR clauses.  As a compromise, we sort the paths by selectivity.  We
 * always take the first, and sequentially add on paths that result in a
 * lower estimated cost.
 *
 * We also make some effort to detect directly redundant input paths, as
 * can happen if there are multiple possibly usable indexes.  (Another way
 * it can happen is that best_inner_indexscan will find the same OR join
 * clauses that create_or_index_quals has pulled OR restriction clauses
 * out of, and then both versions show up as duplicate paths.)  We
 * consider an index redundant if any of its index conditions were already
 * used by earlier indexes.  (We could use predicate_implied_by to have a
 * more intelligent, but much more expensive, check --- but in most cases
 * simple pointer equality should suffice, since after all the index
 * conditions are all coming from the same RestrictInfo lists.)
 *
 * You might think the condition for redundancy should be "all index
 * conditions already used", not "any", but this turns out to be wrong.
 * For example, if we use an index on A, and then come to an index with
 * conditions on A and B, the only way that the second index can be later
 * in the selectivity-order sort is if the condition on B is completely
 * non-selective.  In any case, we'd surely be drastically misestimating
 * the selectivity if we count the same condition twice.
 *
 * We include index predicate conditions in the redundancy test.  Because
 * the test is just for pointer equality and not equal(), the effect is
 * that use of the same partial index in two different AND elements is
 * considered redundant.  (XXX is this too strong?)
 *
 * Note: outputting the selected sub-paths in selectivity order is a good
 * thing even if we weren't using that as part of the selection method,
 * because it makes the short-circuit case in MultiExecBitmapAnd() more
 * likely to apply.
 */

One idea I thought about was to sort by index scan cost, using selectivity only as a tiebreaker for cost, rather than the other way around as is currently done. This seems fairly plausible because indexscans that are cheaper than other indexscans likely return fewer rows too, and so selectivity is already accounted for to some extent --- at least you can't have an enormously worse selectivity at lower cost, whereas Steve's example proves it doesn't work the other way. But I'm worried about breaking the reasoning about redundant indexes that's mentioned in the comments.

Another alternative that would respond to the immediate problem is to maintain the current sort order, but as we come to each index, consider using that one alone, and throw away whatever AND we might have built up if that one alone beats the AND-so-far. This seems more conservative, as it's unlikely to break any cases that work well now, but on the other hand it feels like plastering another wart atop a structure that's already rather rickety.

Has anyone got any thoughts about the best way to do this?

regards, tom lane
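[Editor's note: a minimal, self-contained sketch of the second alternative described above — keep the selectivity-ordered scan, but let a later index standing alone displace the AND built so far. The IndexPath struct, clause bitmasks, and toy cost model here are illustrative stand-ins, not the actual PostgreSQL planner structures.]

#include <stdio.h>

#define HEAP_COST 10000.0       /* stand-in cost per unit of selectivity
                                 * for visiting qualifying heap pages */

typedef struct
{
    const char *name;
    double      scan_cost;      /* estimated cost of scanning the index */
    double      selectivity;    /* estimated fraction of rows returned */
    unsigned    clauses;        /* bitmask of WHERE-clauses the index uses */
} IndexPath;

/* Total cost of a bitmap AND: index scans plus heap visits, assuming
 * (naively) independent selectivities, as the current code does. */
static double
and_cost(double scan_sum, double sel_product)
{
    return scan_sum + HEAP_COST * sel_product;
}

/* paths[] is assumed already sorted by selectivity, best first. */
static double
choose_with_standalone_check(const IndexPath *paths, int n)
{
    double      scan = paths[0].scan_cost;
    double      sel = paths[0].selectivity;
    unsigned    used = paths[0].clauses;
    double      best = and_cost(scan, sel);
    int         i;

    for (i = 1; i < n; i++)
    {
        double  alone = and_cost(paths[i].scan_cost, paths[i].selectivity);

        if (alone < best)
        {
            /* The new step: this index by itself beats the AND-so-far,
             * so throw the AND away and start over from this index. */
            scan = paths[i].scan_cost;
            sel = paths[i].selectivity;
            used = paths[i].clauses;
            best = alone;
            continue;
        }
        if (paths[i].clauses & used)
            continue;           /* existing rule: never reuse a clause */
        if (and_cost(scan + paths[i].scan_cost,
                     sel * paths[i].selectivity) < best)
        {
            /* existing rule: extend the AND only if it lowers the cost */
            scan += paths[i].scan_cost;
            sel *= paths[i].selectivity;
            used |= paths[i].clauses;
            best = and_cost(scan, sel);
        }
    }
    return best;
}

int
main(void)
{
    /* Steve-like numbers: the wide composite index sorts first on a
     * marginally better selectivity but costs far more to scan. */
    IndexPath paths[] = {
        {"receipt_encounter_idx", 5000.0, 0.00099, 0x3},
        {"encounter_id_idx", 40.0, 0.00100, 0x2},
    };

    printf("chosen cost: %g\n", choose_with_standalone_check(paths, 2));
    return 0;
}

With these made-up numbers the standalone check rescues the cheap encounter_id scan (cost 50 instead of 5009.9), which the current first-index-always logic never considers.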
Re: choose_bitmap_and again (was Re: [PERFORM] Strangely Variable Query Performance)
From:
Alvaro Herrera
Date:
Tom Lane wrote:

> One idea I thought about was to sort by index scan cost, using
> selectivity only as a tiebreaker for cost, rather than the other way
> around as is currently done.  This seems fairly plausible because
> indexscans that are cheaper than other indexscans likely return fewer
> rows too, and so selectivity is already accounted for to some extent ---
> at least you can't have an enormously worse selectivity at lower cost,
> whereas Steve's example proves it doesn't work the other way.  But I'm
> worried about breaking the reasoning about redundant indexes that's
> mentioned in the comments.
>
> Another alternative that would respond to the immediate problem is to
> maintain the current sort order, but as we come to each index, consider
> using that one alone, and throw away whatever AND we might have built up
> if that one alone beats the AND-so-far.  This seems more conservative,
> as it's unlikely to break any cases that work well now, but on the other
> hand it feels like plastering another wart atop a structure that's
> already rather rickety.
>
> Has anyone got any thoughts about the best way to do this?

How about doing both: sort the indexes by index scan cost; then pick the first index on the list and start adding indexes when they lower the cost. When adding each index, consider it by itself against the already stacked indexes. If the cost is lower, put this index at the top of the list, and restart the algorithm (after the sorting step, of course).

I think the concern about condition redundancy should be attacked separately. How about just comparing whether they have common prefixes of conditions? I admit I don't understand what would happen with indexes defined like (lower(A), B, C) versus (A, B), for example.

--
Alvaro Herrera                         http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
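[Editor's note: a rough sketch of this combined approach, with the same kind of simplified stand-in types as above. Since it is not obvious that the restart loop must terminate (a concern raised downthread), a restart cap is added here purely as a safety net; it is not part of the proposal. The clause-redundancy check is deliberately omitted, since the proposal is to attack that separately.]

#include <stdio.h>
#include <string.h>

#define HEAP_COST 10000.0       /* stand-in heap-visit cost factor */
#define MAX_RESTARTS 16         /* safety net only; not in the proposal */

typedef struct
{
    const char *name;
    double      scan_cost;
    double      selectivity;
} Idx;

static double
total_cost(double scan_sum, double sel_product)
{
    return scan_sum + HEAP_COST * sel_product;
}

/* paths[] is assumed sorted by scan cost, cheapest first. */
static double
stack_indexes(Idx *paths, int n)
{
    int         restarts = 0;

restart:
    {
        double      scan = paths[0].scan_cost;
        double      sel = paths[0].selectivity;
        double      best = total_cost(scan, sel);
        int         i;

        for (i = 1; i < n; i++)
        {
            double  alone = total_cost(paths[i].scan_cost,
                                       paths[i].selectivity);
            double  added = total_cost(scan + paths[i].scan_cost,
                                       sel * paths[i].selectivity);

            if (alone < best && restarts++ < MAX_RESTARTS)
            {
                /* This index alone beats the stacked AND: move it to
                 * the top (the rest stay in cost order) and restart. */
                Idx     promoted = paths[i];

                memmove(&paths[1], &paths[0], i * sizeof(Idx));
                paths[0] = promoted;
                goto restart;
            }
            if (added < best)
            {
                /* Stacking this index lowers the estimated cost. */
                scan += paths[i].scan_cost;
                sel *= paths[i].selectivity;
                best = added;
            }
        }
        return best;
    }
}

int
main(void)
{
    Idx         paths[] = {
        {"encounter_id_idx", 40.0, 0.00100},
        {"receipt_idx", 90.0, 0.99000},
        {"receipt_encounter_idx", 5000.0, 0.00099},
    };

    printf("stacked cost: %g\n", stack_indexes(paths, 3));
    return 0;
}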
Re: choose_bitmap_and again (was Re: [PERFORM] Strangely Variable Query Performance)
From:
"Dann Corbit"
Date:
> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
> owner@postgresql.org] On Behalf Of Alvaro Herrera
> Sent: Friday, April 13, 2007 4:24 PM
> To: Tom Lane
> Cc: pgsql-hackers@postgreSQL.org; PostgreSQL Performance; Steve
> Subject: Re: [HACKERS] choose_bitmap_and again (was Re: [PERFORM]
> Strangely Variable Query Performance)
>
> Tom Lane wrote:
>
> > One idea I thought about was to sort by index scan cost, using
> > selectivity only as a tiebreaker for cost, rather than the other way
> > around as is currently done.  This seems fairly plausible because
> > indexscans that are cheaper than other indexscans likely return fewer
> > rows too, and so selectivity is already accounted for to some extent ---
> > at least you can't have an enormously worse selectivity at lower cost,
> > whereas Steve's example proves it doesn't work the other way.  But I'm
> > worried about breaking the reasoning about redundant indexes that's
> > mentioned in the comments.
> >
> > Another alternative that would respond to the immediate problem is to
> > maintain the current sort order, but as we come to each index, consider
> > using that one alone, and throw away whatever AND we might have built up
> > if that one alone beats the AND-so-far.  This seems more conservative,
> > as it's unlikely to break any cases that work well now, but on the other
> > hand it feels like plastering another wart atop a structure that's
> > already rather rickety.
> >
> > Has anyone got any thoughts about the best way to do this?
>
> How about doing both: sort the indexes by index scan cost; then pick the
> first index on the list and start adding indexes when they lower the
> cost.  When adding each index, consider it by itself against the
> already stacked indexes.  If the cost is lower, put this index at the
> top of the list, and restart the algorithm (after the sorting step, of
> course).
>
> I think the concern about condition redundancy should be attacked
> separately.  How about just comparing whether they have common prefixes
> of conditions?  I admit I don't understand what would happen with
> indexes defined like (lower(A), B, C) versus (A, B), for example.

Instead of sorting, I suggest the quickselect() algorithm, which runs in O(n) expected time. Probably, if the list is small, it won't matter much, but it might offer some tangible benefit. Here is an example of the algorithm:

#include <stdlib.h>

typedef double Etype;           /* Season to taste. */

extern Etype RandomSelect(Etype * A, size_t p, size_t r, size_t i);
extern size_t RandRange(size_t a, size_t b);
extern size_t RandomPartition(Etype * A, size_t p, size_t r);
extern size_t Partition(Etype * A, size_t p, size_t r);

/*
**
** In the following code, every reference to CLR means:
**
**      "Introduction to Algorithms"
**      By Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest
**      ISBN 0-07-013143-0
*/

/*
** CLR, page 187
*/
Etype
RandomSelect(Etype A[], size_t p, size_t r, size_t i)
{
    size_t      q, k;

    if (p == r)
        return A[p];
    q = RandomPartition(A, p, r);
    k = q - p + 1;
    if (i <= k)
        return RandomSelect(A, p, q, i);
    else
        return RandomSelect(A, q + 1, r, i - k);
}

size_t
RandRange(size_t a, size_t b)
{
    size_t      c = (size_t) ((double) rand() /
                              ((double) RAND_MAX + 1) * (b - a));

    return c + a;
}

/*
** CLR, page 162
*/
size_t
RandomPartition(Etype A[], size_t p, size_t r)
{
    size_t      i = RandRange(p, r);
    Etype       Temp;

    Temp = A[p];
    A[p] = A[i];
    A[i] = Temp;
    return Partition(A, p, r);
}

/*
** CLR, page 154
*/
size_t
Partition(Etype A[], size_t p, size_t r)
{
    Etype       x, temp;
    size_t      i, j;

    x = A[p];
    i = p - 1;
    j = r + 1;

    for (;;)
    {
        do
        {
            j--;
        } while (!(A[j] <= x));
        do
        {
            i++;
        } while (!(A[i] >= x));
        if (i < j)
        {
            temp = A[i];
            A[i] = A[j];
            A[j] = temp;
        }
        else
            return j;
    }
}
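[Editor's note: for reference, a hypothetical driver showing how the routine above might be invoked. The rank argument i is 1-based, and RandomSelect reorders the array in place.]

#include <stdio.h>
#include <stdlib.h>

typedef double Etype;

extern Etype RandomSelect(Etype * A, size_t p, size_t r, size_t i);

int
main(void)
{
    /* Find the median of five scan-cost estimates without fully
     * sorting them. */
    Etype       costs[] = {42.0, 7.5, 19.0, 3.2, 88.1};
    size_t      n = sizeof costs / sizeof costs[0];
    Etype       median = RandomSelect(costs, 0, n - 1, (n + 1) / 2);

    printf("median cost: %g\n", median);        /* prints 19 */
    return 0;
}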
Re: choose_bitmap_and again (was Re: [PERFORM] Strangely Variable Query Performance)
From:
Tom Lane
Date:
"Dann Corbit" <DCorbit@connx.com> writes: > Instead of sorting, I suggest the quickselect() algorithm, which is > O(n). What for? Common cases have less than half a dozen entries. That is not the place we need to be spending engineering effort --- what we need to worry about is what's the choice algorithm, not implementation details. regards, tom lane
Re: choose_bitmap_and again (was Re: [PERFORM] Strangely Variable Query Performance)
From:
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> I think the concern about condition redundancy should be attacked
> separately.  How about just comparing whether they have common prefixes
> of conditions?  I admit I don't understand what would happen with
> indexes defined like (lower(A), B, C) versus (A, B), for example.

I understand that issue a bit better than I did when I wrote the comment (so I suppose I'd better rewrite it). The $64 reason for rejecting AND-combinations of indexes that use some of the same WHERE-conditions is that if we don't, we effectively double-count the selectivity of those conditions, causing us to prefer useless AND-combinations. An example is "WHERE A > 42 AND B < 100" where we have an index on A and one on (A,B). The selectivity calculation will blindly assume that the selectivities of the two indexes are independent and thus prefer to BitmapAnd them, when obviously there is no point in using both. Ideally we should improve the selectivity calculation to not get fooled like this, but that seems hard and probably slow. So for the moment we have the heuristic that no WHERE-clause should be used twice in any AND-combination.

Given that we are using that heuristic, it becomes important that we visit the indexes in the "right order" --- as the code stands, in the (A) vs (A,B) case it will consider only the first index it arrives at in the selectivity sort order, because the second will be rejected on the basis of re-use of the WHERE A > 42 condition. Sorting by selectivity tends to ensure that we pick the index that can make use of as many WHERE-clauses as possible.

The idea of considering each index alone fixes the order dependency for cases where a single index is the best answer, but it doesn't help much for cases where you really do want a BitmapAnd, only not one using the index with the individually best selectivity.

We really need a heuristic here --- exhaustive search would be impractical in cases where there are many indexes, because you'd be looking at 2^N combinations. (In Steve's example there are actually 38 relevant indexes, which is bad database design if you ask me, but in any case we cannot afford to search through 2^38 possibilities.) But the one we're using now is too fragile. Maybe we should use a cutoff similar to the GEQO one: do exhaustive search if there are fewer than N relevant indexes, for some N. But that's not going to help Steve; we still need a smarter heuristic for what to look for above the cutoff.

regards, tom lane
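[Editor's note: to make the double-counting concrete, a tiny back-of-the-envelope calculation with made-up selectivities.]

#include <stdio.h>

int
main(void)
{
    /* WHERE A > 42 AND B < 100, with an index on (A) and one on (A,B).
     * The selectivities below are invented for illustration. */
    double      sel_a = 0.01;   /* estimated selectivity of A > 42  */
    double      sel_b = 0.95;   /* estimated selectivity of B < 100 */

    double      sel_idx_a = sel_a;          /* index on (A)           */
    double      sel_idx_ab = sel_a * sel_b; /* index on (A,B): 0.0095 */

    /* Multiplying the two index selectivities counts A > 42 twice,
     * producing an absurdly optimistic estimate for the BitmapAnd,
     * even though ANDing the (A) scan onto the (A,B) scan cannot
     * filter out a single additional row. */
    printf("naive AND estimate: %g\n", sel_idx_a * sel_idx_ab); /* 9.5e-05 */
    printf("true selectivity:   %g\n", sel_idx_ab);             /* 0.0095  */
    return 0;
}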
Re: choose_bitmap_and again (was Re: [PERFORM] Strangely Variable Query Performance)
From:
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> Has anyone got any thoughts about the best way to do this?

> How about doing both: sort the indexes by index scan cost; then pick the
> first index on the list and start adding indexes when they lower the
> cost.  When adding each index, consider it by itself against the
> already stacked indexes.  If the cost is lower, put this index at the
> top of the list, and restart the algorithm (after the sorting step, of
> course).

The "restart" part of that bothers me --- it's not entirely clear that you couldn't get into an infinite loop. (Imagine that A+B is better than A alone, so we adopt it, but worse than C alone, so we take C as the new leader and start over. Then perhaps C+B is better than C alone but worse than A alone, so we take A as the leader and start over. Maybe this is impossible, but I'm unsure.)

I looked back at the gdb results I'd gotten from Steve's example and noticed that for his 38 indexes there were only three distinct index selectivity values, and the sort step grouped indexes by cost within those groups. In hindsight of course this is obvious: the selectivity depends on the set of WHERE-clauses used, so with two WHERE-clauses there are three possible selectivities (to within roundoff error anyway) depending on whether the index uses one or both clauses. So the existing algorithm gets one thing right: for any two indexes that make use of just the same set of WHERE-clauses, it will always take the one with cheaper scan cost.

Thinking more about this leads me to the following proposal:

1. Explicitly group the indexes according to the subset of WHERE-conditions (and partial index conditions, if any) they use. Within each such group, discard all but the cheapest-scan-cost one.

2. Sort the remaining indexes according to scan cost.

3. For each index in order, consider it as a standalone scan, and also consider adding it on to the AND-group led by each preceding index, using the same logic as now: reject using any WHERE-condition twice in a group, and then add on only if the total cost of the AND-group scan is reduced.

This would be approximately O(N^2) in the number of relevant indexes, whereas the current scheme is roughly linear (handwaving a bit here because the number of WHERE-clauses is a factor too). But that seems unlikely to be a problem for sane numbers of indexes, unlike the O(2^N) behavior of an exhaustive search. It would get rid of (most of) the order-sensitivity problem, since we would definitely consider the AND-combination of every pair of combinable indexes. I can imagine cases where it wouldn't notice useful three-or-more-way combinations because the preceding two-way combination didn't win, but they seem pretty remote possibilities.

Comments?

regards, tom lane
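[Editor's note: a rough, self-contained sketch of this three-step proposal, again with stand-in types and a toy cost model; clause bitmasks stand in for the real RestrictInfo pointer comparisons.]

#include <stdio.h>
#include <stdlib.h>

#define HEAP_COST 10000.0       /* stand-in heap-visit cost factor */

typedef struct
{
    const char *name;
    double      scan_cost;
    double      selectivity;
    unsigned    clauses;        /* bitmask standing in for the set of
                                 * WHERE/predicate conditions used */
} Idx;

typedef struct
{
    double      scan_sum;       /* total index scan cost of the group  */
    double      sel_product;    /* combined (assumed independent) sel. */
    unsigned    used;           /* union of member clause masks        */
} Group;

static double
group_cost(const Group *g)
{
    return g->scan_sum + HEAP_COST * g->sel_product;
}

static int
by_scan_cost(const void *a, const void *b)
{
    double      ca = ((const Idx *) a)->scan_cost;
    double      cb = ((const Idx *) b)->scan_cost;

    return (ca > cb) - (ca < cb);
}

static double
choose_bitmap_and_sketch(Idx *paths, int n)
{
    Group      *groups = malloc(n * sizeof(Group));
    double      best = -1.0;
    int         i, j, m = 0;

    /* Step 1: within each clause subset, keep only the cheapest index. */
    for (i = 0; i < n; i++)
    {
        for (j = 0; j < m; j++)
            if (paths[j].clauses == paths[i].clauses)
                break;
        if (j == m)
            paths[m++] = paths[i];
        else if (paths[i].scan_cost < paths[j].scan_cost)
            paths[j] = paths[i];
    }

    /* Step 2: sort the survivors by scan cost. */
    qsort(paths, m, sizeof(Idx), by_scan_cost);

    /* Step 3: each index leads its own AND-group (covering the
     * standalone case), and every later index is offered to every
     * earlier group, skipping clause reuse and keeping the addition
     * only if it lowers that group's total cost.  O(N^2) overall. */
    for (i = 0; i < m; i++)
    {
        groups[i].scan_sum = paths[i].scan_cost;
        groups[i].sel_product = paths[i].selectivity;
        groups[i].used = paths[i].clauses;

        for (j = 0; j < i; j++)
        {
            Group       trial = groups[j];

            if (paths[i].clauses & trial.used)
                continue;       /* would count a condition twice */
            trial.scan_sum += paths[i].scan_cost;
            trial.sel_product *= paths[i].selectivity;
            trial.used |= paths[i].clauses;
            if (group_cost(&trial) < group_cost(&groups[j]))
                groups[j] = trial;
        }
    }
    for (i = 0; i < m; i++)
        if (best < 0 || group_cost(&groups[i]) < best)
            best = group_cost(&groups[i]);
    free(groups);
    return best;
}

int
main(void)
{
    Idx         paths[] = {
        {"receipt_encounter_idx", 5000.0, 0.00099, 0x3},
        {"another_composite_idx", 7000.0, 0.00099, 0x3},
        {"encounter_id_idx", 40.0, 0.00100, 0x2},
        {"receipt_idx", 90.0, 0.99000, 0x1},
    };

    printf("best cost: %g\n", choose_bitmap_and_sketch(paths, 4));
    return 0;
}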
Re: choose_bitmap_and again (was Re: [PERFORM] Strangely Variable Query Performance)
From:
"Simon Riggs"
Date:
On Fri, 2007-04-13 at 18:48 -0400, Tom Lane wrote:
> Has anyone got any thoughts about the best way to do this?

I don't think we know enough to pick one variant that works in all cases; that requires more detailed analysis of various cases. Let's put in a parameter to allow the options to be varied. The purpose would be to look for some more information that allows us to see what the pre-conditions would be for each heuristic. Initially, we'd say this is a beta-only feature that may be withdrawn in the production version.

"Why did it not pick my index?" is a tiring game, but one that must be played.

I hope that the ORDER LIMIT optimization will reduce the number of indexes chosen to maintain sort order in the output.

--
 Simon Riggs
 EnterpriseDB   http://www.enterprisedb.com