Thread: [HACKERS] partial indexes and bitmap scans

[HACKERS] partial indexes and bitmap scans

From
Stephen Frost
Date:
Greetings,

Consider this:

create table t10 (c1 int, c2 int);
create index on t10 (c1) where c2 > 5;

\d t10     Table "sfrost.t10"Column |  Type   | Modifiers
--------+---------+-----------c1     | integer | c2     | integer |
Indexes:   "t10_c1_idx" btree (c1) WHERE c2 > 5

insert into t10 select * from generate_series(1,10000) a, generate_series(1,10) b;
(repeat a bunch of times, if desired)

vacuum analyze t10;
set work_mem = '64kB';
set enable_indexscan = false;
set enable_seqscan = false;

=*> explain analyze select * from t10 where c2 > 6;                                                           QUERY
PLAN                                                            

----------------------------------------------------------------------------------------------------------------------------------Bitmap
HeapScan on t10  (cost=6496.49..15037.50 rows=318653 width=8) (actual time=34.682..116.236 rows=320000 loops=1)
RecheckCond: (c2 > 5)  Rows Removed by Index Recheck: 327502  Filter: (c2 > 6)  Rows Removed by Filter: 80000  Heap
Blocks:exact=642 lossy=2898  ->  Bitmap Index Scan on t10_c1_idx  (cost=0.00..6416.83 rows=400081 width=0) (actual
time=34.601..34.601rows=400000 loops=1)Planning time: 0.087 msExecution time: 124.229 ms 
(9 rows)

Perhaps I'm missing something obvious, but isn't it a bit redundant to
have both a Recheck condition (which is the predicate of the index) and
a Filter condition (which is the user's predicate) when we've already
decided that the user's predicate must result in a subset of the
index's, as, otherwise, we wouldn't be able to use the index in the
first place?

In other words, it seems like we shouldn't need a Filter in the above
Bitmap Heap Scan, instead we should just make the Recheck be (c2 > 6).

I've not looked into the code side of this at all and there may be
reasons why this is hard to do, but it seems like a worthwhile
improvement to consider doing, though perhaps I'm missing some reason
why we need both the Recheck and the Filter in such cases for
correctness.

Thanks!

Stephen

Re: [HACKERS] partial indexes and bitmap scans

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> Perhaps I'm missing something obvious, but isn't it a bit redundant to
> have both a Recheck condition (which is the predicate of the index) and
> a Filter condition (which is the user's predicate) when we've already
> decided that the user's predicate must result in a subset of the
> index's, as, otherwise, we wouldn't be able to use the index in the
> first place?

Yeah, I think this is just something that the planner doesn't see fit
to expend cycles on detecting.
        regards, tom lane



Re: [HACKERS] partial indexes and bitmap scans

From
Stephen Frost
Date:
Tom,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > Perhaps I'm missing something obvious, but isn't it a bit redundant to
> > have both a Recheck condition (which is the predicate of the index) and
> > a Filter condition (which is the user's predicate) when we've already
> > decided that the user's predicate must result in a subset of the
> > index's, as, otherwise, we wouldn't be able to use the index in the
> > first place?
>
> Yeah, I think this is just something that the planner doesn't see fit
> to expend cycles on detecting.

We have already figured out that the user's predicate results in a
subset of the index's or we wouldn't be able to use that index though,
right?  Do we really need to spend cycles re-discovering that?  Are
there cases where we actually need the index's predicate to ever be
included for correctness..?

This seems like we're going out of our way to add in an additional check
for something that we've already determined must always be true and that
strikes me as odd.

Thanks!

Stephen

Re: [HACKERS] partial indexes and bitmap scans

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> We have already figured out that the user's predicate results in a
> subset of the index's or we wouldn't be able to use that index though,
> right?  Do we really need to spend cycles re-discovering that?  Are
> there cases where we actually need the index's predicate to ever be
> included for correctness..?

In a bitmap indexscan, yes, because the entire point of the recheck
condition is that we're going to scan a whole page and possibly see
tuples that don't satisfy the index predicate at all.  Another point
that's mentioned in the comments in createplan.c is that if you're
considering the result of a BitmapOr or BitmapAnd, there's not necessarily
a single index involved so it's much harder to reason about which part
of the qual is an index predicate.

I do notice that createplan.c makes some effort to get rid of filter
conditions that are provably true when the index conditions are.
Doesn't look like it considers the reverse direction.  Not sure if
that's missing a bet.
        regards, tom lane



Re: [HACKERS] partial indexes and bitmap scans

From
Stephen Frost
Date:
Tom,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > We have already figured out that the user's predicate results in a
> > subset of the index's or we wouldn't be able to use that index though,
> > right?  Do we really need to spend cycles re-discovering that?  Are
> > there cases where we actually need the index's predicate to ever be
> > included for correctness..?
>
> In a bitmap indexscan, yes, because the entire point of the recheck
> condition is that we're going to scan a whole page and possibly see
> tuples that don't satisfy the index predicate at all.

I understand the point of the recheck condition being that we might see
tuples that don't satisfy either the index predicate or the user's
predicate, but that isn't the point- to even consider using that index
we must have already realized that the user's predicate will only be
satisfied in a subset of cases when the index predicate predicate will
be satisfied, making any check of the index predicate necessairly always
true.

> Another point
> that's mentioned in the comments in createplan.c is that if you're
> considering the result of a BitmapOr or BitmapAnd, there's not necessarily
> a single index involved so it's much harder to reason about which part
> of the qual is an index predicate.

We must be pulling the index's predicate to be able to put it into the
Recheck condition in the first place.  What I'm arguing is that once
we've decided that we're able to use an index X, because the values
which the user's predicate will satisfy is a subset of those which
the index's predicate will satisfy, then we can entirely forget about
the index's predicate as being redundant to the user's.

I don't see anything about a BitmapAnd or BitmapOr being relevant to
that.  We will always need to check the user's predicate against the
tuples being returned from the Bitmap Heap Scan, unless by chance it
matches the index's predicate exactly *and* we have an entirely exact
match bitmap without any lossy parts.

> I do notice that createplan.c makes some effort to get rid of filter
> conditions that are provably true when the index conditions are.
> Doesn't look like it considers the reverse direction.  Not sure if
> that's missing a bet.

That actually strikes me as a less likely condition to have in the
general case, isn't it?  Wouldn't that only happen if the index
predicate and the user predicate are identical, because otherwise we
either can't use the index or we must keep the user's predicate because
it will only be satisfied in a subset of cases?

Thanks!

Stephen

Re: [HACKERS] partial indexes and bitmap scans

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> I do notice that createplan.c makes some effort to get rid of filter
>> conditions that are provably true when the index conditions are.
>> Doesn't look like it considers the reverse direction.  Not sure if
>> that's missing a bet.

> That actually strikes me as a less likely condition to have in the
> general case, isn't it?  Wouldn't that only happen if the index
> predicate and the user predicate are identical, because otherwise we
> either can't use the index or we must keep the user's predicate because
> it will only be satisfied in a subset of cases?

Well, remember that the planner's idea of an ideal situation is to not
have any filter conditions, not to not have any index (a/k/a recheck)
conditions.  It's going to try to put as much load as it can on the index
condition side of things, and that gives rise to the need for rechecks.

It seems like there might be some mileage to be gained by reversing the
proof direction here, and having it get rid of recheck conditions that are
implied by filter conditions rather than vice versa.  I'm not quite
convinced though, and I'm also not sure how hard it would be to mechanize.
A lot of that code is shared between the bitmap and plain-indexscan cases,
which could make it tricky to not break the plain-indexscan case.
        regards, tom lane