From: Andres Freund
Subject: Re: BitmapHeapScan streaming read user and prelim refactoring
Msg-id: 20240214194228.jf4o7luxtyra5ggv@awork3.anarazel.de
In response to: BitmapHeapScan streaming read user and prelim refactoring (Melanie Plageman <melanieplageman@gmail.com>)
Hi,

On 2024-02-13 18:11:25 -0500, Melanie Plageman wrote:
> Attached is a patch set which refactors BitmapHeapScan such that it
> can use the streaming read API [1]. It also resolves the long-standing
> FIXME in the BitmapHeapScan code suggesting that the skip fetch
> optimization should be pushed into the table AMs. Additionally, it
> moves table scan initialization to after the index scan and bitmap
> initialization.

Thanks for working on this! While I have some quibbles with details, I think
this is quite a bit of progress in the right direction.


> patches 0001-0002 are assorted cleanup needed later in the set.
> patches 0003 moves the table scan initialization to after bitmap creation
> patch 0004 is, I think, a bug fix. see [2].

I'd not quite call it a bugfix; it's not like it leads to wrong
behaviour. Seems more like an optimization. But whatever :)



> The caveat is that these patches introduce breaking changes to two
> table AM functions for bitmapheapscan: table_scan_bitmap_next_block()
> and table_scan_bitmap_next_tuple().

That's to be expected; I don't think it's worth worrying about. Right now a
bunch of TAMs can't implement bitmap scans, and this goes a fair bit towards
allowing that...





> From d6dd6eb21dcfbc41208f87d1d81ffe3960130889 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman@gmail.com>
> Date: Mon, 12 Feb 2024 18:50:29 -0500
> Subject: [PATCH v1 03/11] BitmapHeapScan begin scan after bitmap setup
>
> There is no reason for table_beginscan_bm() to begin the actual scan of
> the underlying table in ExecInitBitmapHeapScan(). We can begin the
> underlying table scan after the index scan has been completed and the
> bitmap built.
>
> The one use of the scan descriptor during initialization was
> ExecBitmapHeapInitializeWorker(), which set the scan descriptor snapshot
> with one from an array in the parallel state. This overwrote the
> snapshot set in table_beginscan_bm().
>
> By saving that worker snapshot as a member in the BitmapHeapScanState
> during initialization, it can be restored in table_beginscan_bm() after
> returning from the table AM specific begin scan function.

I don't understand what the point of passing two different snapshots to
table_beginscan_bm() is. What does that even mean? Why can't we just use the
correct snapshot initially?


> From a3f62e4299663d418531ae61bb16ea39f0836fac Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman@gmail.com>
> Date: Mon, 12 Feb 2024 19:03:24 -0500
> Subject: [PATCH v1 04/11] BitmapPrefetch use prefetch block recheck for skip
>  fetch
>
> Previously BitmapPrefetch() used the recheck flag for the current block
> to determine whether or not it could skip prefetching the proposed
> prefetch block. It makes more sense for it to use the recheck flag from
> the TBMIterateResult for the prefetch block instead.

I'd mention the commit that introduced the current logic and link to the
thread that you started about this.


> From d56be7741765d93002649ef912ef4b8256a5b9af Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman@gmail.com>
> Date: Mon, 12 Feb 2024 19:04:48 -0500
> Subject: [PATCH v1 05/11] Update BitmapAdjustPrefetchIterator parameter type
>  to BlockNumber
>
> BitmapAdjustPrefetchIterator() only used the blockno member of the
> passed in TBMIterateResult to ensure that the prefetch iterator and
> regular iterator stay in sync. Pass it the BlockNumber only. This will
> allow us to move away from using the TBMIterateResult outside of table
> AM specific code.

Hm - I'm not convinced this is a good direction - doesn't that arguably
*increase* TAM awareness? Perhaps it doesn't make much sense to use bitmap
heap scans in a TAM without blocks, but still.



> From 202b16d3a381210e8dbee69e68a8310be8ee11d2 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman@gmail.com>
> Date: Mon, 12 Feb 2024 20:15:05 -0500
> Subject: [PATCH v1 06/11] Push BitmapHeapScan skip fetch optimization into
>  table AM
>
> This resolves the long-standing FIXME in BitmapHeapNext() which said that
> the optimization to skip fetching blocks of the underlying table when
> none of the column data was needed should be pushed into the table AM
> specific code.

Long-standing? Sure, it's old enough to walk, but we have FIXMEs that are old
enough to drink, at least in some countries. :)


> The table AM agnostic functions for prefetching still need to know if
> skipping fetching is permitted for this scan. However, this dependency
> will be removed when that prefetching code is removed in favor of the
> upcoming streaming read API.

> ---
>  src/backend/access/heap/heapam.c          |  10 +++
>  src/backend/access/heap/heapam_handler.c  |  29 +++++++
>  src/backend/executor/nodeBitmapHeapscan.c | 100 ++++++----------------
>  src/include/access/heapam.h               |   2 +
>  src/include/access/tableam.h              |  17 ++--
>  src/include/nodes/execnodes.h             |   6 --
>  6 files changed, 74 insertions(+), 90 deletions(-)
>
> diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
> index 707460a5364..7aae1ecf0a9 100644
> --- a/src/backend/access/heap/heapam.c
> +++ b/src/backend/access/heap/heapam.c
> @@ -955,6 +955,8 @@ heap_beginscan(Relation relation, Snapshot snapshot,
>      scan->rs_base.rs_flags = flags;
>      scan->rs_base.rs_parallel = parallel_scan;
>      scan->rs_strategy = NULL;    /* set in initscan */
> +    scan->vmbuffer = InvalidBuffer;
> +    scan->empty_tuples = 0;

These don't follow the existing naming pattern for HeapScanDescData. While I
explicitly dislike the practice of adding prefixes to struct members, I don't
think mixing conventions within a single struct improves things.

I also think it'd be good to note in a comment that the vm buffer currently is
only used for bitmap heap scans; otherwise one might think it'd also be used
for normal scans, where we don't need it because of the page-level all-visible
flag.

Also, perhaps worth renaming "empty_tuples" to something indicating that it's
the number of empty tuples to be returned later? num_empty_tuples_pending or
such? Or the current "return_empty_tuples".
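
Putting those two suggestions together, perhaps something like this (just a
sketch, the names are mine):

    /*
     * Currently only used by bitmap heap scans; normal scans don't need the
     * VM buffer because of the page level all-visible flag.
     */
    Buffer      rs_vmbuffer;
    int         rs_empty_tuples_pending;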


> @@ -1043,6 +1045,10 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
>      if (BufferIsValid(scan->rs_cbuf))
>          ReleaseBuffer(scan->rs_cbuf);
>
> +    if (BufferIsValid(scan->vmbuffer))
> +        ReleaseBuffer(scan->vmbuffer);
> +    scan->vmbuffer = InvalidBuffer;

It does not matter one iota here, but personally I prefer moving the write
inside the if, as dirtying the cacheline after we just figured out that it
doesn't need to change seems unnecessary.


> diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
> index 9372b49bfaa..c0fb06c9688 100644
> --- a/src/backend/executor/nodeBitmapHeapscan.c
> +++ b/src/backend/executor/nodeBitmapHeapscan.c
> @@ -108,6 +108,7 @@ BitmapHeapNext(BitmapHeapScanState *node)
>       */
>      if (!node->initialized)
>      {
> +        bool can_skip_fetch;
>          /*
>           * We can potentially skip fetching heap pages if we do not need any
>           * columns of the table, either for checking non-indexable quals or

Pretty sure pgindent will move this around.

> +++ b/src/include/access/tableam.h
> @@ -62,6 +62,7 @@ typedef enum ScanOptions
>
>      /* unregister snapshot at scan end? */
>      SO_TEMP_SNAPSHOT = 1 << 9,
> +    SO_CAN_SKIP_FETCH = 1 << 10,
>  }            ScanOptions;

Would be nice to add a comment explaining what this flag means.
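
Perhaps something along these lines (the wording is just a sketch):

    /* skip fetching all-visible pages when no tuple data is needed? */
    SO_CAN_SKIP_FETCH = 1 << 10,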


> From 500c84019b982a1e6c8b8dd40240c8510d83c287 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman@gmail.com>
> Date: Tue, 13 Feb 2024 10:05:04 -0500
> Subject: [PATCH v1 07/11] BitmapHeapScan scan desc counts lossy and exact
>  pages
>
> Future commits will remove the TBMIterateResult from BitmapHeapNext(),
> pushing it into the table AM-specific code. So we will have to keep
> track of the number of lossy and exact pages in the scan descriptor.
> Doing this change to lossy/exact page counting in a separate commit just
> simplifies the diff.

> ---
>  src/backend/access/heap/heapam.c          |  2 ++
>  src/backend/access/heap/heapam_handler.c  |  9 +++++++++
>  src/backend/executor/nodeBitmapHeapscan.c | 18 +++++++++++++-----
>  src/include/access/relscan.h              |  4 ++++
>  4 files changed, 28 insertions(+), 5 deletions(-)
>
> diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
> index 7aae1ecf0a9..88b4aad5820 100644
> --- a/src/backend/access/heap/heapam.c
> +++ b/src/backend/access/heap/heapam.c
> @@ -957,6 +957,8 @@ heap_beginscan(Relation relation, Snapshot snapshot,
>      scan->rs_strategy = NULL;    /* set in initscan */
>      scan->vmbuffer = InvalidBuffer;
>      scan->empty_tuples = 0;
> +    scan->rs_base.lossy_pages = 0;
> +    scan->rs_base.exact_pages = 0;
>
>      /*
>       * Disable page-at-a-time mode if it's not a MVCC-safe snapshot.
> diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
> index baba09c87c0..6e85ef7a946 100644
> --- a/src/backend/access/heap/heapam_handler.c
> +++ b/src/backend/access/heap/heapam_handler.c
> @@ -2242,6 +2242,15 @@ heapam_scan_bitmap_next_block(TableScanDesc scan,
>      Assert(ntup <= MaxHeapTuplesPerPage);
>      hscan->rs_ntuples = ntup;
>
> +    /* Only count exact and lossy pages with visible tuples */
> +    if (ntup > 0)
> +    {
> +        if (tbmres->ntuples >= 0)
> +            scan->exact_pages++;
> +        else
> +            scan->lossy_pages++;
> +    }
> +
>      return ntup > 0;
>  }
>
> diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
> index c0fb06c9688..19d115de06f 100644
> --- a/src/backend/executor/nodeBitmapHeapscan.c
> +++ b/src/backend/executor/nodeBitmapHeapscan.c
> @@ -53,6 +53,8 @@
>  #include "utils/spccache.h"
>
>  static TupleTableSlot *BitmapHeapNext(BitmapHeapScanState *node);
> +static inline void BitmapAccumCounters(BitmapHeapScanState *node,
> +                                       TableScanDesc scan);
>  static inline void BitmapDoneInitializingSharedState(ParallelBitmapHeapState *pstate);
>  static inline void BitmapAdjustPrefetchIterator(BitmapHeapScanState *node,
>                                                  BlockNumber blockno);
> @@ -234,11 +236,6 @@ BitmapHeapNext(BitmapHeapScanState *node)
>                  continue;
>              }
>
> -            if (tbmres->ntuples >= 0)
> -                node->exact_pages++;
> -            else
> -                node->lossy_pages++;
> -
>              /* Adjust the prefetch target */
>              BitmapAdjustPrefetchTarget(node);
>          }
> @@ -315,9 +312,20 @@ BitmapHeapNext(BitmapHeapScanState *node)
>      /*
>       * if we get here it means we are at the end of the scan..
>       */
> +    BitmapAccumCounters(node, scan);
>      return ExecClearTuple(slot);
>  }
>
> +static inline void
> +BitmapAccumCounters(BitmapHeapScanState *node,
> +                    TableScanDesc scan)
> +{
> +    node->exact_pages += scan->exact_pages;
> +    scan->exact_pages = 0;
> +    node->lossy_pages += scan->lossy_pages;
> +    scan->lossy_pages = 0;
> +}
> +

I don't think this is quite right - you're calling BitmapAccumCounters() only
when the scan doesn't return anything anymore, but there's no guarantee
that'll ever be reached. E.g. a bitmap heap scan below a limit node.  I think
this needs to be in a) ExecEndBitmapHeapScan() and b) ExecReScanBitmapHeapScan().
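
Roughly this (untested sketch), and the same in ExecReScanBitmapHeapScan()
before the scan state is reset:

    /* e.g. at the top of ExecEndBitmapHeapScan(), before table_endscan() */
    TableScanDesc scanDesc = node->ss.ss_currentScanDesc;

    if (scanDesc)
        BitmapAccumCounters(node, scanDesc);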


>  /*
>   *    BitmapDoneInitializingSharedState - Shared state is initialized
>   *
> diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
> index 521043304ab..b74e08dd745 100644
> --- a/src/include/access/relscan.h
> +++ b/src/include/access/relscan.h
> @@ -40,6 +40,10 @@ typedef struct TableScanDescData
>      ItemPointerData rs_mintid;
>      ItemPointerData rs_maxtid;
>
> +    /* Only used for Bitmap table scans */
> +    long        exact_pages;
> +    long        lossy_pages;
> +
>      /*
>       * Information about type and behaviour of the scan, a bitmask of members
>       * of the ScanOptions enum (see tableam.h).

I wonder if this really is the best place for the data to be accumulated. This
requires the accounting to be implemented in each AM, which doesn't obviously
seem required.  Why can't the accounting continue to live in
nodeBitmapHeapscan.c, to be done after each table_scan_bitmap_next_block()
call?
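
At this point in the series the executor still sees the TBMIterateResult, so
something like the logic you're removing could just stay at the call site:

    if (table_scan_bitmap_next_block(scan, tbmres))
    {
        if (tbmres->ntuples >= 0)
            node->exact_pages++;
        else
            node->lossy_pages++;
    }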


> From 555743e4bc885609d20768f7f2990c6ba69b13a9 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman@gmail.com>
> Date: Tue, 13 Feb 2024 10:57:07 -0500
> Subject: [PATCH v1 09/11] Make table_scan_bitmap_next_block() async friendly
>
> table_scan_bitmap_next_block() previously returned false if we did not
> wish to call table_scan_bitmap_next_tuple() on the tuples on the page.
> This could happen when there were no visible tuples on the page or, due
> to concurrent activity on the table, the block returned by the iterator
> is past the known end of the table.

This sounds a bit like the block is actually past the end of the table,
but in reality this happens if the block is past the end of the table as it
was when the scan was started. Somehow that feels significant, but I don't
really know why I think that.



> diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
> index 88b4aad5820..d8569373987 100644
> --- a/src/backend/access/heap/heapam.c
> +++ b/src/backend/access/heap/heapam.c
> @@ -959,6 +959,8 @@ heap_beginscan(Relation relation, Snapshot snapshot,
>      scan->empty_tuples = 0;
>      scan->rs_base.lossy_pages = 0;
>      scan->rs_base.exact_pages = 0;
> +    scan->rs_base.shared_tbmiterator = NULL;
> +    scan->rs_base.tbmiterator = NULL;
>
>      /*
>       * Disable page-at-a-time mode if it's not a MVCC-safe snapshot.
> @@ -1051,6 +1053,18 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
>          ReleaseBuffer(scan->vmbuffer);
>      scan->vmbuffer = InvalidBuffer;
>
> +    if (scan->rs_base.rs_flags & SO_TYPE_BITMAPSCAN)
> +    {
> +        if (scan->rs_base.shared_tbmiterator)
> +            tbm_end_shared_iterate(scan->rs_base.shared_tbmiterator);
> +
> +        if (scan->rs_base.tbmiterator)
> +            tbm_end_iterate(scan->rs_base.tbmiterator);
> +    }
> +
> +    scan->rs_base.shared_tbmiterator = NULL;
> +    scan->rs_base.tbmiterator = NULL;
> +
>      /*
>       * reinitialize scan descriptor
>       */

If every AM would need to implement this, perhaps this shouldn't be done here,
but in generic code?
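
E.g. in the table_rescan() wrapper in tableam.h, something like this (sketch,
assuming the iterators stay in TableScanDescData):

    static inline void
    table_rescan(TableScanDesc scan, struct ScanKeyData *key)
    {
        if (scan->shared_tbmiterator)
            tbm_end_shared_iterate(scan->shared_tbmiterator);
        if (scan->tbmiterator)
            tbm_end_iterate(scan->tbmiterator);
        scan->shared_tbmiterator = NULL;
        scan->tbmiterator = NULL;

        scan->rs_rd->rd_tableam->scan_rescan(scan, key, false, false, false, false);
    }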


> --- a/src/backend/access/heap/heapam_handler.c
> +++ b/src/backend/access/heap/heapam_handler.c
> @@ -2114,17 +2114,49 @@ heapam_estimate_rel_size(Relation rel, int32 *attr_widths,
>  
>  static bool
>  heapam_scan_bitmap_next_block(TableScanDesc scan,
> -                              TBMIterateResult *tbmres)
> +                              bool *recheck, BlockNumber *blockno)
>  {
>      HeapScanDesc hscan = (HeapScanDesc) scan;
> -    BlockNumber block = tbmres->blockno;
> +    BlockNumber block;
>      Buffer        buffer;
>      Snapshot    snapshot;
>      int            ntup;
> +    TBMIterateResult *tbmres;
>  
>      hscan->rs_cindex = 0;
>      hscan->rs_ntuples = 0;
>  
> +    *blockno = InvalidBlockNumber;
> +    *recheck = true;
> +
> +    do
> +    {
> +        if (scan->shared_tbmiterator)
> +            tbmres = tbm_shared_iterate(scan->shared_tbmiterator);
> +        else
> +            tbmres = tbm_iterate(scan->tbmiterator);
> +
> +        if (tbmres == NULL)
> +        {
> +            /* no more entries in the bitmap */
> +            Assert(hscan->empty_tuples == 0);
> +            return false;
> +        }
> +
> +        /*
> +         * Ignore any claimed entries past what we think is the end of the
> +         * relation. It may have been extended after the start of our scan (we
> +         * only hold an AccessShareLock, and it could be inserts from this
> +         * backend).  We don't take this optimization in SERIALIZABLE
> +         * isolation though, as we need to examine all invisible tuples
> +         * reachable by the index.
> +         */
> +    } while (!IsolationIsSerializable() && tbmres->blockno >= hscan->rs_nblocks);

Hm. Isn't it a problem that we have no CHECK_FOR_INTERRUPTS() in this loop?
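
E.g. at the top of the loop body:

    do
    {
        CHECK_FOR_INTERRUPTS();

        if (scan->shared_tbmiterator)
            tbmres = tbm_shared_iterate(scan->shared_tbmiterator);
        ...
    } while (!IsolationIsSerializable() && tbmres->blockno >= hscan->rs_nblocks);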


> @@ -2251,7 +2274,14 @@ heapam_scan_bitmap_next_block(TableScanDesc scan,
>              scan->lossy_pages++;
>      }
>
> -    return ntup > 0;
> +    /*
> +     * Return true to indicate that a valid block was found and the bitmap is
> +     * not exhausted. If there are no visible tuples on this page,
> +     * hscan->rs_ntuples will be 0 and heapam_scan_bitmap_next_tuple() will
> +     * return false returning control to this function to advance to the next
> +     * block in the bitmap.
> +     */
> +    return true;
>  }

Why can't we fetch the next block immediately?



> @@ -201,46 +197,23 @@ BitmapHeapNext(BitmapHeapScanState *node)
>                                                                      can_skip_fetch);
>          }
>
> -        node->tbmiterator = tbmiterator;
> -        node->shared_tbmiterator = shared_tbmiterator;
> +        scan->tbmiterator = tbmiterator;
> +        scan->shared_tbmiterator = shared_tbmiterator;

It seems a bit odd that this code modifies the scan descriptor, instead of
passing the iterator, or perhaps better the bitmap itself, to
table_beginscan_bm()?
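
Something like this, as a purely hypothetical signature:

    TableScanDesc table_beginscan_bm(Relation rel, Snapshot snapshot,
                                     TIDBitmap *bitmap, uint32 flags);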



> diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
> index b74e08dd745..bf7ee044268 100644
> --- a/src/include/access/relscan.h
> +++ b/src/include/access/relscan.h
> @@ -16,6 +16,7 @@
>
>  #include "access/htup_details.h"
>  #include "access/itup.h"
> +#include "nodes/tidbitmap.h"

I'd like to avoid exposing this to everything including relscan.h. I think we
could just forward declare the structs and use them here to avoid that?
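
I.e. in relscan.h, instead of the #include (sketch):

    struct TBMIterator;
    struct TBMSharedIterator;

    typedef struct TableScanDescData
    {
        ...
        struct TBMIterator *tbmiterator;
        struct TBMSharedIterator *shared_tbmiterator;
        ...
    } TableScanDescData;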





> From aac60985d6bc70bfedf77a77ee3c512da87bfcb1 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman@gmail.com>
> Date: Tue, 13 Feb 2024 14:27:57 -0500
> Subject: [PATCH v1 11/11] BitmapHeapScan uses streaming read API
>
> Remove all of the code to do prefetching from BitmapHeapScan code and
> rely on the streaming read API prefetching. Heap table AM implements a
> streaming read callback which uses the iterator to get the next valid
> block that needs to be fetched for the streaming read API.
> ---
>  src/backend/access/gin/ginget.c           |  15 +-
>  src/backend/access/gin/ginscan.c          |   7 +
>  src/backend/access/heap/heapam.c          |  71 +++++
>  src/backend/access/heap/heapam_handler.c  |  78 +++--
>  src/backend/executor/nodeBitmapHeapscan.c | 328 +---------------------
>  src/backend/nodes/tidbitmap.c             |  80 +++---
>  src/include/access/heapam.h               |   2 +
>  src/include/access/tableam.h              |  14 +-
>  src/include/nodes/execnodes.h             |  19 --
>  src/include/nodes/tidbitmap.h             |   8 +-
>  10 files changed, 178 insertions(+), 444 deletions(-)
>
> diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
> index 0b4f2ebadb6..3ce28078a6f 100644
> --- a/src/backend/access/gin/ginget.c
> +++ b/src/backend/access/gin/ginget.c
> @@ -373,7 +373,10 @@ restartScanEntry:
>              if (entry->matchBitmap)
>              {
>                  if (entry->matchIterator)
> +                {
>                      tbm_end_iterate(entry->matchIterator);
> +                    pfree(entry->matchResult);
> +                }
>                  entry->matchIterator = NULL;
>                  tbm_free(entry->matchBitmap);
>                  entry->matchBitmap = NULL;
> @@ -386,6 +389,7 @@ restartScanEntry:
>          if (entry->matchBitmap && !tbm_is_empty(entry->matchBitmap))
>          {
>              entry->matchIterator = tbm_begin_iterate(entry->matchBitmap);
> +            entry->matchResult = palloc0(TBM_ITERATE_RESULT_SIZE);

Do we actually have to use palloc0? TBM_ITERATE_RESULT_SIZE ain't small, so
zeroing all of it isn't free.
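
Assuming nothing looks at the result before the first tbm_iterate() call fills
it in, plain palloc() should do:

    entry->matchResult = palloc(TBM_ITERATE_RESULT_SIZE);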


> +static BlockNumber bitmapheap_pgsr_next_single(PgStreamingRead *pgsr, void *pgsr_private,
> +                            void *per_buffer_data);

Is it correct to have _single in the name here? Aren't we also using it for
parallel scans?


> +static BlockNumber
> +bitmapheap_pgsr_next_single(PgStreamingRead *pgsr, void *pgsr_private,
> +                            void *per_buffer_data)
> +{
> +    TBMIterateResult *tbmres = per_buffer_data;
> +    HeapScanDesc hdesc = (HeapScanDesc) pgsr_private;
> +
> +    for (;;)
> +    {
> +        if (hdesc->rs_base.shared_tbmiterator)
> +            tbm_shared_iterate(hdesc->rs_base.shared_tbmiterator, tbmres);
> +        else
> +            tbm_iterate(hdesc->rs_base.tbmiterator, tbmres);
> +
> +        /* no more entries in the bitmap */
> +        if (!BlockNumberIsValid(tbmres->blockno))
> +            return InvalidBlockNumber;
> +
> +        /*
> +         * Ignore any claimed entries past what we think is the end of the
> +         * relation. It may have been extended after the start of our scan (we
> +         * only hold an AccessShareLock, and it could be inserts from this
> +         * backend).  We don't take this optimization in SERIALIZABLE
> +         * isolation though, as we need to examine all invisible tuples
> +         * reachable by the index.
> +         */
> +        if (!IsolationIsSerializable() && tbmres->blockno >= hdesc->rs_nblocks)
> +            continue;
> +
> +
> +        if (hdesc->rs_base.rs_flags & SO_CAN_SKIP_FETCH &&
> +            !tbmres->recheck &&
> +            VM_ALL_VISIBLE(hdesc->rs_base.rs_rd, tbmres->blockno, &hdesc->vmbuffer))
> +        {
> +            hdesc->empty_tuples += tbmres->ntuples;
> +            continue;
> +        }
> +
> +        return tbmres->blockno;
> +    }
> +
> +    /* not reachable */
> +    Assert(false);
> +}

Need to check for interrupts somewhere here.
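
Same pattern as in heapam_scan_bitmap_next_block(), e.g.:

    for (;;)
    {
        CHECK_FOR_INTERRUPTS();
        ...
    }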


> @@ -124,15 +119,6 @@ BitmapHeapNext(BitmapHeapScanState *node)

There's still a comment in BitmapHeapNext talking about prefetching with two
iterators etc. That seems outdated now.


>  /*
>   * tbm_iterate - scan through next page of a TIDBitmap
>   *
> - * Returns a TBMIterateResult representing one page, or NULL if there are
> - * no more pages to scan.  Pages are guaranteed to be delivered in numerical
> - * order.  If result->ntuples < 0, then the bitmap is "lossy" and failed to
> - * remember the exact tuples to look at on this page --- the caller must
> - * examine all tuples on the page and check if they meet the intended
> - * condition.  If result->recheck is true, only the indicated tuples need
> - * be examined, but the condition must be rechecked anyway.  (For ease of
> - * testing, recheck is always set true when ntuples < 0.)
> + * Caller must pass in a TBMIterateResult to be filled.
> + *
> + * Pages are guaranteed to be delivered in numerical order.  tbmres->blockno is
> + * set to InvalidBlockNumber when there are no more pages to scan. If
> + * tbmres->ntuples < 0, then the bitmap is "lossy" and failed to remember the
> + * exact tuples to look at on this page --- the caller must examine all tuples
> + * on the page and check if they meet the intended condition.  If
> + * tbmres->recheck is true, only the indicated tuples need be examined, but the
> + * condition must be rechecked anyway.  (For ease of testing, recheck is always
> + * set true when ntuples < 0.)
>   */
> -TBMIterateResult *
> -tbm_iterate(TBMIterator *iterator)
> +void
> +tbm_iterate(TBMIterator *iterator, TBMIterateResult *tbmres)

Hm - it seems a tad odd that we later have to find out if the scan is done
iterating by checking if blockno is valid, when tbm_iterate already knew. But
I guess the code would be a bit uglier if we needed the result of
tbm_[shared_]iterate(), due to the two functions.


Right now ExecEndBitmapHeapScan() frees the tbm before it does table_endscan()
- which seems problematic, as heap_endscan() will do stuff like
tbm_end_iterate(), which imo shouldn't be called after the tbm has been freed,
even if that works today.
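
I.e. roughly reorder to (sketch):

    /* end the scan first, so tbm_end_iterate() runs against a live bitmap */
    if (scanDesc)
        table_endscan(scanDesc);

    /* only then free the bitmap itself */
    if (node->tbm)
        tbm_free(node->tbm);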


It seems a bit confusing that your changes seem to treat
BitmapHeapScanState->initialized as separate from ->scan, even though afaict
scan should be NULL iff initialized is false and vice versa.


Independent of your patches, but brr, it's ugly that
BitmapShouldInitializeSharedState() blocks.

Greetings,

Andres Freund


