Re: BitmapHeapScan streaming read user and prelim refactoring - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: BitmapHeapScan streaming read user and prelim refactoring
Date
Msg-id 6367e86a-7269-4eae-9a35-5a3864854911@enterprisedb.com
Whole thread Raw
In response to Re: BitmapHeapScan streaming read user and prelim refactoring  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers
On 4/6/24 02:51, Tomas Vondra wrote:
> 
> * The one question I'm somewhat unsure about is why Tom chose to use the
> "wrong" recheck flag in the 2017 commit, when the correct recheck flag
> is readily available. Surely that had a reason, right? But I can't think
> of one ...
> 

I've been wondering about this a bit more, so I decided to experiment
and try to construct a case for which the current code prefetches the
wrong blocks, and the patch fixes that. But I haven't been very
successful so far :-(

My understanding was that the current code should do the wrong thing if
I alternate all-visible and not-all-visible pages. This understanding is
not correct, as I learned, because the thing that needs to change is the
recheck flag, not visibility :-( I'm still posting what I tried, perhaps
you will have an idea how to alter it to demonstrate the incorrect
behavior with current master.

The test was very simple:

  create table t (a int, b int) with (fillfactor=10);
  insert into t select mod((i/22),2), (i/22)
    from generate_series(0,1000) s(i);
  create index on t (a);

which creates a table with 46 pages, 22 rows per page, column "a"
alternates between 0/1 on pages, column "b" increments on each page (so
"b" identifies page).

and then

  delete from t where mod(b,8) = 0;

which deletes tuples on pages 0, 8, 16, 24, 32, 40, so these pages will
need to be prefetched as not-all-visible by this query

  explain analyze select count(1) from t where a = 0

when forced to do bitmap heap scan. The other even-numbered pages remain
all-visible. I added a bit of logging into BitmapPrefetch(), but even
with master I get this:

    LOG:  prefetching block 8 0 current block 6 0
    LOG:  prefetching block 16 0 current block 14 0
    LOG:  prefetching block 24 0 current block 22 0
    LOG:  prefetching block 32 0 current block 30 0
    LOG:  prefetching block 40 0 current block 38 0

So it prefetches the correct pages (the other value is the recheck flag
for that block from the iterator result).

Turns out (and I realize the comment about the assumption actually
states that, I just failed to understand it) the thing that would have
to differ for the blocks is the recheck flag.

But that can't actually happen because that's set by the AM/opclass and
the built-in ones do essentially this:

.../hash.c:  scan->xs_recheck = true;
.../nbtree.c:  scan->xs_recheck = false;


gist opclasses (e.g. btree_gist):

    /* All cases served by this function are exact */
    *recheck = false;

spgist opclasses (e.g. geo_spgist):

    /* All tests are exact. */
    out->recheck = false;

If there's an opclass that alters the recheck flag, it's well hidden and
I missed it.

Anyway, after this exercise and learning more about the recheck flag, I
think I agree the assumption is unnecessary. It's pretty harmless
because none of the built-in opclasses alters the recheck flag, but the
correct recheck flag is readily available. I'm still a bit puzzled why
the 2017 commit even relied on this assumption, though.

regards
-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: jian he
Date:
Subject: Re: remaining sql/json patches
Next
From: Tomas Vondra
Date:
Subject: Re: BitmapHeapScan streaming read user and prelim refactoring