choose_bitmap_and again (was Re: Strangely Variable Query Performance)

From: Tom Lane
Subject: choose_bitmap_and again (was Re: Strangely Variable Query Performance)
Date: ,
Msg-id: 26399.1176504501@sss.pgh.pa.us
(view: Whole thread, Raw)
In response to: Re: Strangely Variable Query Performance  (Steve)
Responses: Re: [HACKERS] choose_bitmap_and again (was Re: Strangely Variable Query Performance)  (Alvaro Herrera)
List: pgsql-performance

Tree view

Slow Postgresql server  (Jason Lustig, )
 Re: Slow Postgresql server  (Dennis Bjorklund, )
 Re: Slow Postgresql server  (Jeff Frost, )
 Re: Slow Postgresql server  (Ron, )
  Re: Slow Postgresql server  (Guido Neitzer, )
   Re: Slow Postgresql server  (Ron, )
    Re: Slow Postgresql server  (Guido Neitzer, )
     Re: Slow Postgresql server  (Scott Marlowe, )
      Re: Slow Postgresql server  (Jeff Frost, )
       Re: Slow Postgresql server  (Carlos Moreno, )
       Strangely Variable Query Performance  (Steve, )
        Re: Strangely Variable Query Performance  (Tom Lane, )
         Re: Strangely Variable Query Performance  (Steve, )
          Re: Strangely Variable Query Performance  (Tom Lane, )
           Re: Strangely Variable Query Performance  (Steve, )
            Re: Strangely Variable Query Performance  (Tom Lane, )
             Re: Strangely Variable Query Performance  (Steve, )
        Re: Strangely Variable Query Performance  (Scott Marlowe, )
         Re: Strangely Variable Query Performance  (Steve, )
          Re: Strangely Variable Query Performance  (Scott Marlowe, )
           Re: Strangely Variable Query Performance  (Steve, )
           Re: Strangely Variable Query Performance  (Tom Lane, )
            Re: Strangely Variable Query Performance  (Steve, )
             Re: Strangely Variable Query Performance  (Tom Lane, )
              Re: Strangely Variable Query Performance  (Steve, )
             Re: Strangely Variable Query Performance  (Tom Lane, )
              Re: Strangely Variable Query Performance  (Steve, )
               Re: Strangely Variable Query Performance  (Tom Lane, )
                Re: Strangely Variable Query Performance  (Steve, )
                 Re: Strangely Variable Query Performance  (Tom Lane, )
                  Re: Strangely Variable Query Performance  (Steve, )
                   Re: Strangely Variable Query Performance  (Tom Lane, )
                   Re: Strangely Variable Query Performance  (Tom Lane, )
                    Re: Strangely Variable Query Performance  (Steve, )
                     Re: Strangely Variable Query Performance  (Tom Lane, )
                      Fwd: Strangely Variable Query Performance  ("Robins Tharakan", )
                     choose_bitmap_and again (was Re: Strangely Variable Query Performance)  (Tom Lane, )
                      Re: [HACKERS] choose_bitmap_and again (was Re: Strangely Variable Query Performance)  (Alvaro Herrera, )
                       Re: [HACKERS] choose_bitmap_and again (was Re: Strangely Variable Query Performance)  (Tom Lane, )
                       Re: [HACKERS] choose_bitmap_and again (was Re: Strangely Variable Query Performance)  (Tom Lane, )
              Re: Strangely Variable Query Performance  (Steve, )
             Re: [HACKERS] choose_bitmap_and again (was Re: Strangely Variable Query Performance)  (Tom Lane, )
 Re: Slow Postgresql server  (Jeff Frost, )
  Re: Slow Postgresql server  (Jason Lustig, )
   Re: Slow Postgresql server  (Guido Neitzer, )
 Fwd: Strangely Variable Query Performance  (Robins, )
  Re: Fwd: Strangely Variable Query Performance  (Tom Lane, )

Steve <> writes:
> [ strange planner misbehavior in 8.2.3 ]

After some off-list investigation (thanks, Steve, for letting me poke
at your machine), the short answer is that the heuristics used by
choose_bitmap_and() suck.  The problem query is like

select ... from ds where
ds.receipt >= '1998-12-30 0:0:0' and
ds.encounter_id in ( ... 100 distinct values ... );

and the table has a truly remarkable variety of indexes on encounter_id,
receipt, and combinations of them with other columns.  The receipt
condition is actually in effect a no-op, because all receipt dates are
later than that, but because ineq_histogram_selectivity doesn't trust
histogram data unreservedly we compute a selectivity of about 0.99997
for it.  That means that the indexes that cover both receipt and
encounter_id are given a selectivity score just fractionally better than
those involving encounter_id alone, and therefore they sort first in
choose_bitmap_and's sort step, and the way that that routine is coded,
only combinations of the very first index with other ones will be
considered for a bitmap heap scan.  So the possibility of using just the
index on encounter_id alone is never considered, even though that
alternative is vastly cheaper than the alternatives that are considered.
(It happens that encounter_id is a low-order column in all the indexes
that include receipt, and so these scans end up covering the whole index
... multiple times even.  The cost estimation is fine --- the thing
knows these are expensive --- what's falling down is the heuristic for
which combinations of indexes to consider using in a bitmap scan.)

The original coding of choose_bitmap_and involved a "fuzzy" comparison
of selectivities, which would have avoided this problem, but we got rid
of that later because it had its own problems.  In fact,
choose_bitmap_and has caused us enough problems that I'm thinking we
need a fundamental rethink of how it works, rather than just marginal
tweaks.  If you haven't looked at this code before, the comments explain
the idea well enough:

/*
 * choose_bitmap_and
 *        Given a nonempty list of bitmap paths, AND them into one path.
 *
 * This is a nontrivial decision since we can legally use any subset of the
 * given path set.  We want to choose a good tradeoff between selectivity
 * and cost of computing the bitmap.
 *
 * The result is either a single one of the inputs, or a BitmapAndPath
 * combining multiple inputs.
 */
...
    /*
     * In theory we should consider every nonempty subset of the given paths.
     * In practice that seems like overkill, given the crude nature of the
     * estimates, not to mention the possible effects of higher-level AND and
     * OR clauses.  As a compromise, we sort the paths by selectivity.  We
     * always take the first, and sequentially add on paths that result in a
     * lower estimated cost.
     *
     * We also make some effort to detect directly redundant input paths, as
     * can happen if there are multiple possibly usable indexes.  (Another way
     * it can happen is that best_inner_indexscan will find the same OR join
     * clauses that create_or_index_quals has pulled OR restriction clauses
     * out of, and then both versions show up as duplicate paths.)  We
     * consider an index redundant if any of its index conditions were already
     * used by earlier indexes.  (We could use predicate_implied_by to have a
     * more intelligent, but much more expensive, check --- but in most cases
     * simple pointer equality should suffice, since after all the index
     * conditions are all coming from the same RestrictInfo lists.)
     *
     * You might think the condition for redundancy should be "all index
     * conditions already used", not "any", but this turns out to be wrong.
     * For example, if we use an index on A, and then come to an index with
     * conditions on A and B, the only way that the second index can be later
     * in the selectivity-order sort is if the condition on B is completely
     * non-selective.  In any case, we'd surely be drastically misestimating
     * the selectivity if we count the same condition twice.
     *
     * We include index predicate conditions in the redundancy test.  Because
     * the test is just for pointer equality and not equal(), the effect is
     * that use of the same partial index in two different AND elements is
     * considered redundant.  (XXX is this too strong?)
     *
     * Note: outputting the selected sub-paths in selectivity order is a good
     * thing even if we weren't using that as part of the selection method,
     * because it makes the short-circuit case in MultiExecBitmapAnd() more
     * likely to apply.
     */


One idea I thought about was to sort by index scan cost, using
selectivity only as a tiebreaker for cost, rather than the other way
around as is currently done.  This seems fairly plausible because
indexscans that are cheaper than other indexscans likely return fewer
rows too, and so selectivity is already accounted for to some extent ---
at least you can't have an enormously worse selectivity at lower cost,
whereas Steve's example proves it doesn't work the other way.  But I'm
worried about breaking the reasoning about redundant indexes that's
mentioned in the comments.

Another alternative that would respond to the immediate problem is to
maintain the current sort order, but as we come to each index, consider
using that one alone, and throw away whatever AND we might have built up
if that one alone beats the AND-so-far.  This seems more conservative,
as it's unlikely to break any cases that work well now, but on the other
hand it feels like plastering another wart atop a structure that's
already rather rickety.

Has anyone got any thoughts about the best way to do this?

            regards, tom lane


pgsql-performance by date:

From: Alvaro Herrera
Date:
Subject: Re: [HACKERS] choose_bitmap_and again (was Re: Strangely Variable Query Performance)
From: "Simon Riggs"
Date:
Subject: Re: Finding bloated indexes?