choose_bitmap_and again (was Re: Strangely Variable Query Performance) - Mailing list pgsql-performance
From | Tom Lane |
---|---|
Subject | choose_bitmap_and again (was Re: Strangely Variable Query Performance) |
Date | |
Msg-id | 26399.1176504501@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Strangely Variable Query Performance (Steve <cheetah@tanabi.org>) |
Responses |
Re: [HACKERS] choose_bitmap_and again (was Re: Strangely Variable Query Performance)
|
List | pgsql-performance |
Steve <cheetah@tanabi.org> writes: > [ strange planner misbehavior in 8.2.3 ] After some off-list investigation (thanks, Steve, for letting me poke at your machine), the short answer is that the heuristics used by choose_bitmap_and() suck. The problem query is like select ... from ds where ds.receipt >= '1998-12-30 0:0:0' and ds.encounter_id in ( ... 100 distinct values ... ); and the table has a truly remarkable variety of indexes on encounter_id, receipt, and combinations of them with other columns. The receipt condition is actually in effect a no-op, because all receipt dates are later than that, but because ineq_histogram_selectivity doesn't trust histogram data unreservedly we compute a selectivity of about 0.99997 for it. That means that the indexes that cover both receipt and encounter_id are given a selectivity score just fractionally better than those involving encounter_id alone, and therefore they sort first in choose_bitmap_and's sort step, and the way that that routine is coded, only combinations of the very first index with other ones will be considered for a bitmap heap scan. So the possibility of using just the index on encounter_id alone is never considered, even though that alternative is vastly cheaper than the alternatives that are considered. (It happens that encounter_id is a low-order column in all the indexes that include receipt, and so these scans end up covering the whole index ... multiple times even. The cost estimation is fine --- the thing knows these are expensive --- what's falling down is the heuristic for which combinations of indexes to consider using in a bitmap scan.) The original coding of choose_bitmap_and involved a "fuzzy" comparison of selectivities, which would have avoided this problem, but we got rid of that later because it had its own problems. In fact, choose_bitmap_and has caused us enough problems that I'm thinking we need a fundamental rethink of how it works, rather than just marginal tweaks. If you haven't looked at this code before, the comments explain the idea well enough: /* * choose_bitmap_and * Given a nonempty list of bitmap paths, AND them into one path. * * This is a nontrivial decision since we can legally use any subset of the * given path set. We want to choose a good tradeoff between selectivity * and cost of computing the bitmap. * * The result is either a single one of the inputs, or a BitmapAndPath * combining multiple inputs. */ ... /* * In theory we should consider every nonempty subset of the given paths. * In practice that seems like overkill, given the crude nature of the * estimates, not to mention the possible effects of higher-level AND and * OR clauses. As a compromise, we sort the paths by selectivity. We * always take the first, and sequentially add on paths that result in a * lower estimated cost. * * We also make some effort to detect directly redundant input paths, as * can happen if there are multiple possibly usable indexes. (Another way * it can happen is that best_inner_indexscan will find the same OR join * clauses that create_or_index_quals has pulled OR restriction clauses * out of, and then both versions show up as duplicate paths.) We * consider an index redundant if any of its index conditions were already * used by earlier indexes. (We could use predicate_implied_by to have a * more intelligent, but much more expensive, check --- but in most cases * simple pointer equality should suffice, since after all the index * conditions are all coming from the same RestrictInfo lists.) * * You might think the condition for redundancy should be "all index * conditions already used", not "any", but this turns out to be wrong. * For example, if we use an index on A, and then come to an index with * conditions on A and B, the only way that the second index can be later * in the selectivity-order sort is if the condition on B is completely * non-selective. In any case, we'd surely be drastically misestimating * the selectivity if we count the same condition twice. * * We include index predicate conditions in the redundancy test. Because * the test is just for pointer equality and not equal(), the effect is * that use of the same partial index in two different AND elements is * considered redundant. (XXX is this too strong?) * * Note: outputting the selected sub-paths in selectivity order is a good * thing even if we weren't using that as part of the selection method, * because it makes the short-circuit case in MultiExecBitmapAnd() more * likely to apply. */ One idea I thought about was to sort by index scan cost, using selectivity only as a tiebreaker for cost, rather than the other way around as is currently done. This seems fairly plausible because indexscans that are cheaper than other indexscans likely return fewer rows too, and so selectivity is already accounted for to some extent --- at least you can't have an enormously worse selectivity at lower cost, whereas Steve's example proves it doesn't work the other way. But I'm worried about breaking the reasoning about redundant indexes that's mentioned in the comments. Another alternative that would respond to the immediate problem is to maintain the current sort order, but as we come to each index, consider using that one alone, and throw away whatever AND we might have built up if that one alone beats the AND-so-far. This seems more conservative, as it's unlikely to break any cases that work well now, but on the other hand it feels like plastering another wart atop a structure that's already rather rickety. Has anyone got any thoughts about the best way to do this? regards, tom lane
pgsql-performance by date: