Thread: choose_bitmap_and again (was Re: [PERFORM] Strangely Variable Query Performance)

Steve <cheetah@tanabi.org> writes:
> [ strange planner misbehavior in 8.2.3 ]

After some off-list investigation (thanks, Steve, for letting me poke
at your machine), the short answer is that the heuristics used by
choose_bitmap_and() suck.  The problem query is like

select ... from ds where
ds.receipt >= '1998-12-30 0:0:0' and
ds.encounter_id in ( ... 100 distinct values ... );

and the table has a truly remarkable variety of indexes on encounter_id,
receipt, and combinations of them with other columns.  The receipt
condition is actually in effect a no-op, because all receipt dates are
later than that, but because ineq_histogram_selectivity doesn't trust
histogram data unreservedly we compute a selectivity of about 0.99997
for it.  That means that the indexes that cover both receipt and
encounter_id are given a selectivity score just fractionally better than
those involving encounter_id alone, and therefore they sort first in
choose_bitmap_and's sort step, and the way that that routine is coded,
only combinations of the very first index with other ones will be
considered for a bitmap heap scan.  So the possibility of using just the
index on encounter_id alone is never considered, even though that
alternative is vastly cheaper than the alternatives that are considered.
(It happens that encounter_id is a low-order column in all the indexes
that include receipt, and so these scans end up covering the whole index
... multiple times even.  The cost estimation is fine --- the thing
knows these are expensive --- what's falling down is the heuristic for
which combinations of indexes to consider using in a bitmap scan.)

The original coding of choose_bitmap_and involved a "fuzzy" comparison
of selectivities, which would have avoided this problem, but we got rid
of that later because it had its own problems.  In fact,
choose_bitmap_and has caused us enough problems that I'm thinking we
need a fundamental rethink of how it works, rather than just marginal
tweaks.  If you haven't looked at this code before, the comments explain
the idea well enough:

/*
 * choose_bitmap_and
 *        Given a nonempty list of bitmap paths, AND them into one path.
 *
 * This is a nontrivial decision since we can legally use any subset of the
 * given path set.  We want to choose a good tradeoff between selectivity
 * and cost of computing the bitmap.
 *
 * The result is either a single one of the inputs, or a BitmapAndPath
 * combining multiple inputs.
 */
...
    /*
     * In theory we should consider every nonempty subset of the given paths.
     * In practice that seems like overkill, given the crude nature of the
     * estimates, not to mention the possible effects of higher-level AND and
     * OR clauses.  As a compromise, we sort the paths by selectivity.  We
     * always take the first, and sequentially add on paths that result in a
     * lower estimated cost.
     *
     * We also make some effort to detect directly redundant input paths, as
     * can happen if there are multiple possibly usable indexes.  (Another way
     * it can happen is that best_inner_indexscan will find the same OR join
     * clauses that create_or_index_quals has pulled OR restriction clauses
     * out of, and then both versions show up as duplicate paths.)  We
     * consider an index redundant if any of its index conditions were already
     * used by earlier indexes.  (We could use predicate_implied_by to have a
     * more intelligent, but much more expensive, check --- but in most cases
     * simple pointer equality should suffice, since after all the index
     * conditions are all coming from the same RestrictInfo lists.)
     *
     * You might think the condition for redundancy should be "all index
     * conditions already used", not "any", but this turns out to be wrong.
     * For example, if we use an index on A, and then come to an index with
     * conditions on A and B, the only way that the second index can be later
     * in the selectivity-order sort is if the condition on B is completely
     * non-selective.  In any case, we'd surely be drastically misestimating
     * the selectivity if we count the same condition twice.
     *
     * We include index predicate conditions in the redundancy test.  Because
     * the test is just for pointer equality and not equal(), the effect is
     * that use of the same partial index in two different AND elements is
     * considered redundant.  (XXX is this too strong?)
     *
     * Note: outputting the selected sub-paths in selectivity order is a good
     * thing even if we weren't using that as part of the selection method,
     * because it makes the short-circuit case in MultiExecBitmapAnd() more
     * likely to apply.
     */
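
For anyone who doesn't have the source at hand, here is a minimal standalone
sketch of that strategy.  It uses a made-up ExamplePath struct and a
deliberately crude cost model rather than the real planner structures, so
read it only as an illustration of the control flow the comments describe:

#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-ins for the planner structures; illustration only. */
typedef struct ExamplePath {
    double       selec;         /* estimated fraction of rows fetched */
    double       cost;          /* estimated cost of this index scan */
    const char **clauses;       /* WHERE-conditions used, compared by pointer */
    int          nclauses;
} ExamplePath;

#define TABLE_ROWS 1000000.0    /* crude stand-in for the heap-visit cost */

static double total_cost(double scan_cost, double selec) {
    return scan_cost + selec * TABLE_ROWS;
}

static int cmp_selectivity(const void *a, const void *b) {
    const ExamplePath *pa = *(ExamplePath *const *) a;
    const ExamplePath *pb = *(ExamplePath *const *) b;

    if (pa->selec != pb->selec)         /* selectivity is the primary sort key */
        return (pa->selec < pb->selec) ? -1 : 1;
    if (pa->cost != pb->cost)           /* cost only breaks ties */
        return (pa->cost < pb->cost) ? -1 : 1;
    return 0;
}

/* "redundant" = any of its conditions already used by an earlier member */
static bool uses_seen_clause(const ExamplePath *p, const char **seen, int nseen) {
    for (int i = 0; i < p->nclauses; i++)
        for (int j = 0; j < nseen; j++)
            if (p->clauses[i] == seen[j])
                return true;
    return false;
}

/* Sort by selectivity, always keep the first path, then greedily add paths
 * that lower the estimated total cost (assumes npaths >= 1).  Returns that
 * cost; the real code builds a BitmapAndPath from the paths it kept. */
static double current_choose_bitmap_and(ExamplePath **paths, int npaths) {
    int totalclauses = 0, nseen = 0;

    for (int i = 0; i < npaths; i++)
        totalclauses += paths[i]->nclauses;
    const char **seen = malloc(totalclauses * sizeof(const char *));

    qsort(paths, npaths, sizeof(ExamplePath *), cmp_selectivity);

    double selec = paths[0]->selec;
    double cost = paths[0]->cost;
    for (int j = 0; j < paths[0]->nclauses; j++)
        seen[nseen++] = paths[0]->clauses[j];

    for (int i = 1; i < npaths; i++) {
        if (uses_seen_clause(paths[i], seen, nseen))
            continue;                   /* reject re-use of a condition */

        double newselec = selec * paths[i]->selec;  /* assume independence */
        double newcost = cost + paths[i]->cost;
        if (total_cost(newcost, newselec) >= total_cost(cost, selec))
            continue;                   /* ANDing this one on doesn't pay */

        selec = newselec;
        cost = newcost;
        for (int j = 0; j < paths[i]->nclauses; j++)
            seen[nseen++] = paths[i]->clauses[j];
    }
    free(seen);
    return total_cost(cost, selec);
}

In this toy form it is easy to see how Steve's case goes wrong: whatever
index happens to sort first is wedded into the result, and a cheaper index
using a subset of its conditions can only be rejected as redundant, never
considered on its own.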


One idea I thought about was to sort by index scan cost, using
selectivity only as a tiebreaker for cost, rather than the other way
around as is currently done.  This seems fairly plausible because
indexscans that are cheaper than other indexscans likely return fewer
rows too, and so selectivity is already accounted for to some extent ---
at least you can't have an enormously worse selectivity at lower cost,
whereas Steve's example proves it doesn't work the other way.  But I'm
worried about breaking the reasoning about redundant indexes that's
mentioned in the comments.
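
In terms of the toy sketch above, the change amounts to swapping the
comparator, so cost becomes the primary sort key and selectivity only breaks
ties (ExamplePath is still my made-up struct, not a planner type):

static int cmp_cost_then_selectivity(const void *a, const void *b) {
    const ExamplePath *pa = *(ExamplePath *const *) a;   /* toy struct above */
    const ExamplePath *pb = *(ExamplePath *const *) b;

    if (pa->cost != pb->cost)           /* cost is now the primary key */
        return (pa->cost < pb->cost) ? -1 : 1;
    if (pa->selec != pb->selec)         /* selectivity only breaks ties */
        return (pa->selec < pb->selec) ? -1 : 1;
    return 0;
}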

Another alternative that would respond to the immediate problem is to
maintain the current sort order, but as we come to each index, consider
using that one alone, and throw away whatever AND we might have built up
if that one alone beats the AND-so-far.  This seems more conservative,
as it's unlikely to break any cases that work well now, but on the other
hand it feels like plastering another wart atop a structure that's
already rather rickety.
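
Against the same toy structures, this second alternative needs only one
extra test at the top of the greedy loop, along the lines of the helper
below, where and_cost/and_selec describe the AND built so far.  When it
returns true we would throw away that AND and restart the accumulation from
the current index alone (total_cost and ExamplePath are the made-up pieces
from the earlier sketch, not planner code):

static bool single_scan_beats_and(const ExamplePath *candidate,
                                  double and_cost, double and_selec) {
    /* is scanning this one index by itself estimated cheaper than the
     * AND-combination accumulated so far?  (toy cost model from above) */
    return total_cost(candidate->cost, candidate->selec) <
           total_cost(and_cost, and_selec);
}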

Has anyone got any thoughts about the best way to do this?

            regards, tom lane

Tom Lane wrote:

> One idea I thought about was to sort by index scan cost, using
> selectivity only as a tiebreaker for cost, rather than the other way
> around as is currently done.  This seems fairly plausible because
> indexscans that are cheaper than other indexscans likely return fewer
> rows too, and so selectivity is already accounted for to some extent ---
> at least you can't have an enormously worse selectivity at lower cost,
> whereas Steve's example proves it doesn't work the other way.  But I'm
> worried about breaking the reasoning about redundant indexes that's
> mentioned in the comments.
>
> Another alternative that would respond to the immediate problem is to
> maintain the current sort order, but as we come to each index, consider
> using that one alone, and throw away whatever AND we might have built up
> if that one alone beats the AND-so-far.  This seems more conservative,
> as it's unlikely to break any cases that work well now, but on the other
> hand it feels like plastering another wart atop a structure that's
> already rather rickety.
>
> Has anyone got any thoughts about the best way to do this?

How about doing both: sort the indexes by index scan cost; then pick the
first index on the list and start adding indexes when they lower the
cost.  When adding each index, consider it by itself against the
already stacked indexes.  If the cost is lower, put this index at the
top of the list, and restart the algorithm (after the sorting step of
course).

I think the concern about condition redundancy should be attacked
separately.  How about just comparing whether they have common prefixes
of conditions?  I admit I don't understand what would happen with
indexes defined like (lower(A), B, C) versus (A, B) for example.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
> owner@postgresql.org] On Behalf Of Alvaro Herrera
> Sent: Friday, April 13, 2007 4:24 PM
> To: Tom Lane
> Cc: pgsql-hackers@postgreSQL.org; PostgreSQL Performance; Steve
> Subject: Re: [HACKERS] choose_bitmap_and again (was Re: [PERFORM]
> Strangely Variable Query Performance)
>
> Tom Lane wrote:
>
> > One idea I thought about was to sort by index scan cost, using
> > selectivity only as a tiebreaker for cost, rather than the other way
> > around as is currently done.  This seems fairly plausible because
> > indexscans that are cheaper than other indexscans likely return fewer
> > rows too, and so selectivity is already accounted for to some extent ---
> > at least you can't have an enormously worse selectivity at lower cost,
> > whereas Steve's example proves it doesn't work the other way.  But I'm
> > worried about breaking the reasoning about redundant indexes that's
> > mentioned in the comments.
> >
> > Another alternative that would respond to the immediate problem is to
> > maintain the current sort order, but as we come to each index, consider
> > using that one alone, and throw away whatever AND we might have built up
> > if that one alone beats the AND-so-far.  This seems more conservative,
> > as it's unlikely to break any cases that work well now, but on the other
> > hand it feels like plastering another wart atop a structure that's
> > already rather rickety.
> >
> > Has anyone got any thoughts about the best way to do this?
>
> How about doing both: sort the indexes by index scan cost; then pick the
> first index on the list and start adding indexes when they lower the
> cost.  When adding each index, consider it by itself against the
> already stacked indexes.  If the cost is lower, put this index at the
> top of the list, and restart the algorithm (after the sorting step of
> course).
>
> I think the concern about condition redundancy should be attacked
> separately.  How about just comparing whether they have common prefixes
> of conditions?  I admit I don't understand what would happen with
> indexes defined like (lower(A), B, C) versus (A, B) for example.

Instead of sorting, I suggest the quickselect() algorithm, which is
O(n).
Probably, if the list is small, it won't matter much, but it might offer
some tangible benefit.

Here is an example of the algorithm:

#include <stdlib.h>
typedef double  Etype; /* Season to taste. */

extern Etype    RandomSelect(Etype * A, size_t p, size_t r, size_t i);
extern size_t   RandRange(size_t a, size_t b);
extern size_t   RandomPartition(Etype * A, size_t p, size_t r);
extern size_t   Partition(Etype * A, size_t p, size_t r);

/*
**
** In the following code, every reference to CLR means:
**
**    "Introduction to Algorithms"
**    By Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest
**    ISBN 0-07-013143-0
*/


/*
** CLR, page 187
*/
Etype RandomSelect(Etype A[], size_t p, size_t r, size_t i)
{
    size_t q, k;

    if (p == r)
        return A[p];
    q = RandomPartition(A, p, r);
    k = q - p + 1;
    if (i <= k)
        return RandomSelect(A, p, q, i);
    else
        return RandomSelect(A, q + 1, r, i - k);
}

size_t RandRange(size_t a, size_t b)
{
    size_t c = (size_t) ((double) rand() / ((double) RAND_MAX + 1) * (b - a));

    return c + a;
}

/*
** CLR, page 162
*/
size_t RandomPartition(Etype A[], size_t p, size_t r)
{
    size_t i = RandRange(p, r);
    Etype  Temp;

    Temp = A[p];
    A[p] = A[i];
    A[i] = Temp;
    return Partition(A, p, r);
}

/*
** CLR, page 154
*/
size_t Partition(Etype A[], size_t p, size_t r)
{
    Etype  x, temp;
    size_t i, j;

    x = A[p];
    i = p - 1;
    j = r + 1;
    for (;;) {
        do { j--; } while (!(A[j] <= x));
        do { i++; } while (!(A[i] >= x));
        if (i < j) {
            temp = A[i];
            A[i] = A[j];
            A[j] = temp;
        } else
            return j;
    }
}


"Dann Corbit" <DCorbit@connx.com> writes:
> Instead of sorting, I suggest the quickselect() algorithm, which is
> O(n).

What for?  Common cases have fewer than half a dozen entries.  That is
not the place we need to be spending engineering effort --- what we need
to worry about is the choice algorithm, not implementation details.

            regards, tom lane

Alvaro Herrera <alvherre@commandprompt.com> writes:
> I think the concern about condition redundancy should be attacked
> separately.  How about just comparing whether they have common prefixes
> of conditions?  I admit I don't understand what would happen with
> indexes defined like (lower(A), B, C) versus (A, B) for example.

I understand that issue a bit better than I did when I wrote the comment
(so I suppose I better rewrite it).  The $64 reason for rejecting
AND-combinations of indexes that are using some of the same
WHERE-conditions is that if we don't, we effectively double-count the
selectivity of those conditions, causing us to prefer useless
AND-combinations.  An example is "WHERE A > 42 AND B < 100" where we
have an index on A and one on (A,B).  The selectivity calculation
will blindly assume that the selectivities of the two indexes are
independent and thus prefer to BitmapAnd them, when obviously there
is no point in using both.  Ideally we should improve the selectivity
calculation to not get fooled like this, but that seems hard and
probably slow.  So for the moment we have the heuristic that no
WHERE-clause should be used twice in any AND-combination.
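
To put numbers on the double-counting (invented for illustration, not taken
from Steve's tables): suppose A > 42 matches 10% of the rows and B < 100
matches 50%.

#include <stdio.h>

int main(void) {
    double sel_a = 0.10;            /* invented selectivity of A > 42 */
    double sel_b = 0.50;            /* invented selectivity of B < 100 */
    double sel_ab = sel_a * sel_b;  /* index on (A,B) uses both clauses: 0.05 */

    /* Treating the (A) scan and the (A,B) scan as independent counts the
     * A > 42 clause twice, so the AND looks ten times more selective than
     * the (A,B) scan alone, yet it cannot return fewer rows than 0.05. */
    printf("naive AND-combination estimate: %.3f\n", sel_a * sel_ab);
    printf("best any plan can really do:    %.3f\n", sel_ab);
    return 0;
}

The AND-combination buys nothing over the (A,B) scan by itself; it just adds
the cost of scanning the second index.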

Given that we are using that heuristic, it becomes important that
we visit the indexes in the "right order" --- as the code stands,
in the (A) vs (A,B) case it will consider only the first index it
arrives at in the selectivity sort order, because the second will
be rejected on the basis of re-use of the WHERE A > 42 condition.
Sorting by selectivity tends to ensure that we pick the index that
can make use of as many WHERE-clauses as possible.

The idea of considering each index alone fixes the order dependency
for cases where a single index is the best answer, but it doesn't
help much for cases where you really do want a BitmapAnd, only not
one using the index with the individually best selectivity.

We really need a heuristic here --- exhaustive search will be
impractical in cases where there are many indexes, because you'd
be looking at 2^N combinations.  (In Steve's example there are
actually 38 relevant indexes, which is bad database design if
you ask me, but in any case we cannot afford to search through
2^38 possibilities.)  But the one we're using now is too fragile.

Maybe we should use a cutoff similar to the GEQO one: do exhaustive
search if there are fewer than N relevant indexes, for some N.
But that's not going to help Steve; we still need a smarter heuristic
for what to look for above the cutoff.

            regards, tom lane

Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> Has anyone got any thoughts about the best way to do this?

> How about doing both: sort the indexes by index scan cost; then pick the
> first index on the list and start adding indexes when they lower the
> cost.  When adding each index, consider it by itself against the
> already stacked indexes.  If the cost is lower, put this index at the
> top of the list, and restart the algorithm (after the sorting step of
> course).

The "restart" part of that bothers me --- it's not entirely clear that
you couldn't get into an infinite loop.  (Imagine that A+B is better
than A alone, so we adopt it, but worse than C alone, so we take C as
the new leader and start over.  Then perhaps C+B is better than C alone
but worse than A alone, so we take A as the leader and start over.
Maybe this is impossible but I'm unsure.)

I looked back at the gdb results I'd gotten from Steve's example and
noticed that for his 38 indexes there were only three distinct index
selectivity values, and within each of those selectivity groups the
sort step ordered the indexes by cost.  In hindsight of course this is
obvious: the selectivity
depends on the set of WHERE-clauses used, so with two WHERE-clauses
there are three possible selectivities (to within roundoff error anyway)
depending on whether the index uses one or both clauses.  So the
existing algorithm gets one thing right: for any two indexes that make
use of just the same set of WHERE-clauses, it will always take the one
with cheaper scan cost.

Thinking more about this leads me to the following proposal:

1. Explicitly group the indexes according to the subset of
WHERE-conditions (and partial index conditions, if any) they use.
Within each such group, discard all but the cheapest-scan-cost one.

2. Sort the remaining indexes according to scan cost.

3. For each index in order, consider it as a standalone scan, and also
consider adding it on to the AND-group led by each preceding index,
using the same logic as now: reject using any WHERE-condition twice
in a group, and then add on only if the total cost of the AND-group
scan is reduced.

This would be approximately O(N^2) in the number of relevant indexes,
whereas the current scheme is roughly linear (handwaving a bit here
because the number of WHERE-clauses is a factor too).  But that seems
unlikely to be a problem for sane numbers of indexes, unlike the O(2^N)
behavior of an exhaustive search.  It would get rid of (most of) the
order-sensitivity problem, since we would definitely consider the
AND-combination of every pair of combinable indexes.  I can imagine
cases where it wouldn't notice useful three-or-more-way combinations
because the preceding two-way combination didn't win, but they seem
pretty remote possibilities.
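
Just to make the shape of it concrete, here is a rough standalone sketch of
steps 1 through 3, once more against a toy ExamplePath struct and cost model.
Pointer identity stands in for the WHERE-clause comparisons, partial-index
predicate conditions are ignored, and it only computes the winning cost
rather than building a path, so it is a sketch of the idea, not the patch:

#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-ins for the planner structures; illustration only. */
typedef struct ExamplePath {
    double       selec;         /* estimated fraction of rows fetched */
    double       cost;          /* estimated cost of this index scan */
    const char **clauses;       /* conditions used, compared by pointer */
    int          nclauses;
} ExamplePath;

#define TABLE_ROWS 1000000.0    /* crude stand-in for the heap-visit cost */

static double total_cost(double scan_cost, double selec) {
    return scan_cost + selec * TABLE_ROWS;
}

static bool clause_in(const char *clause, const ExamplePath *p) {
    for (int j = 0; j < p->nclauses; j++)
        if (p->clauses[j] == clause)
            return true;
    return false;
}

static bool shares_any_clause(const ExamplePath *a, const ExamplePath *b) {
    for (int i = 0; i < a->nclauses; i++)
        if (clause_in(a->clauses[i], b))
            return true;
    return false;
}

static bool same_clause_set(const ExamplePath *a, const ExamplePath *b) {
    if (a->nclauses != b->nclauses)
        return false;
    for (int i = 0; i < a->nclauses; i++)
        if (!clause_in(a->clauses[i], b))
            return false;
    return true;
}

static int cmp_cost(const void *a, const void *b) {
    double ca = (*(ExamplePath *const *) a)->cost;
    double cb = (*(ExamplePath *const *) b)->cost;
    return (ca < cb) ? -1 : (ca > cb) ? 1 : 0;
}

/* Returns the best estimated total cost found (assumes npaths >= 1); a
 * real implementation would remember which AND-group won and build it. */
static double proposed_choose_bitmap_and(ExamplePath **paths, int npaths) {
    bool keep[npaths];
    int  n = 0;

    /* Step 1: among paths using the same condition set, keep the cheapest. */
    for (int i = 0; i < npaths; i++) {
        keep[i] = true;
        for (int j = 0; j < npaths; j++)
            if (i != j && same_clause_set(paths[i], paths[j]) &&
                (paths[j]->cost < paths[i]->cost ||
                 (paths[j]->cost == paths[i]->cost && j < i)))
                keep[i] = false;
    }
    for (int i = 0; i < npaths; i++)
        if (keep[i])
            paths[n++] = paths[i];

    /* Step 2: sort the survivors by scan cost. */
    qsort(paths, n, sizeof(ExamplePath *), cmp_cost);

    /* Step 3: every index leads its own AND-group; each later index joins
     * each earlier group only if it reuses no condition and lowers cost. */
    double gcost[n], gselec[n], best = total_cost(paths[0]->cost, paths[0]->selec);
    int    gmemb[n][n], gn[n];

    for (int i = 0; i < n; i++) {
        gcost[i] = paths[i]->cost;      /* each standalone scan is itself */
        gselec[i] = paths[i]->selec;    /* a candidate result */
        gmemb[i][0] = i;
        gn[i] = 1;
        if (total_cost(gcost[i], gselec[i]) < best)
            best = total_cost(gcost[i], gselec[i]);

        for (int lead = 0; lead < i; lead++) {
            bool redundant = false;
            for (int m = 0; m < gn[lead]; m++)
                if (shares_any_clause(paths[i], paths[gmemb[lead][m]]))
                    redundant = true;
            if (redundant)
                continue;               /* never count a condition twice */
            double c = gcost[lead] + paths[i]->cost;
            double s = gselec[lead] * paths[i]->selec;
            if (total_cost(c, s) >= total_cost(gcost[lead], gselec[lead]))
                continue;               /* adding this index doesn't pay */
            gcost[lead] = c;
            gselec[lead] = s;
            gmemb[lead][gn[lead]++] = i;
            if (total_cost(c, s) < best)
                best = total_cost(c, s);
        }
    }
    return best;
}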

Comments?

            regards, tom lane

On Fri, 2007-04-13 at 18:48 -0400, Tom Lane wrote:
> Has anyone got any thoughts about the best way to do this?

I don't think we know enough to pick one variant that works in all
cases. This requires more detailed analysis of various cases.

Let's put in a parameter to allow the options to be varied. The purpose
would be to gather more information that lets us see what the
preconditions for each heuristic would be.

Initially, we would say this is a beta-only feature that may be withdrawn
in the production version.

"Why did it not pick my index?" is a tiring game, but one that must be
played. I hope that the ORDER LIMIT optimization will reduce the
number of indexes chosen to maintain sort order in the output.

--
 Simon Riggs
 EnterpriseDB   http://www.enterprisedb.com