Re: BitmapHeapScan streaming read user and prelim refactoring - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: BitmapHeapScan streaming read user and prelim refactoring |
Date | |
Msg-id | 063e4eb4-32d9-439e-a0b1-75565a9835a8@iki.fi Whole thread Raw |
In response to | Re: BitmapHeapScan streaming read user and prelim refactoring (Melanie Plageman <melanieplageman@gmail.com>) |
List | pgsql-hackers |
On 19/06/2024 18:55, Melanie Plageman wrote: > On Tue, Jun 18, 2024 at 6:02 PM Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: >> >> I went through v22 to remind myself of what the patches do and do some >> basic review - I have some simple questions / comments for now, nothing >> major. I've kept the comments in separate 'review' patches, it does not >> seem worth copying here. > > Thanks so much for the review! > > I've implemented your feedback in attached v23 except for what I > mention below. I have not gone through each patch in the new set very > carefully after making the changes because I think we should resolve > the question of adding the new table scan descriptor before I do that. > A change there will require a big rebase. Then I can go through each > patch very carefully. Had a quick look at this after a long pause. I only looked at the first few, hoping that reviewing them would allow you to commit at least some of them, making the rest easier. v23-0001-table_scan_bitmap_next_block-counts-lossy-and-ex.patch Looks good to me. (I'm not sure if this would be a net positive change on its own, but it's needed by the later patch so OK) v23-0002-Remove-table_scan_bitmap_next_tuple-parameter-tb.patch LGTM v23-0003-Make-table_scan_bitmap_next_block-async-friendly.patch > @@ -1955,19 +1954,26 @@ table_relation_estimate_size(Relation rel, int32 *attr_widths, > */ > > /* > - * Prepare to fetch / check / return tuples from `tbmres->blockno` as part of a > - * bitmap table scan. `scan` needs to have been started via > - * table_beginscan_bm(). Returns false if there are no tuples to be found on > - * the page, true otherwise. `lossy_pages` is incremented if the block's > - * representation in the bitmap is lossy; otherwise, `exact_pages` is > - * incremented. > + * Prepare to fetch / check / return tuples as part of a bitmap table scan. > + * `scan` needs to have been started via table_beginscan_bm(). Returns false if > + * there are no more blocks in the bitmap, true otherwise. `lossy_pages` is > + * incremented if the block's representation in the bitmap is lossy; otherwise, > + * `exact_pages` is incremented. > + * > + * `recheck` is set by the table AM to indicate whether or not the tuples > + * from this block should be rechecked. Tuples from lossy pages will always > + * need to be rechecked, but some non-lossy pages' tuples may also require > + * recheck. > + * > + * `blockno` is only used in bitmap table scan code to validate that the > + * prefetch block is staying ahead of the current block. > * > * Note, this is an optionally implemented function, therefore should only be > * used after verifying the presence (at plan time or such). > */ > static inline bool > table_scan_bitmap_next_block(TableScanDesc scan, > - struct TBMIterateResult *tbmres, > + BlockNumber *blockno, bool *recheck, > long *lossy_pages, > long *exact_pages) > { The new comment doesn't really explain what *blockno means. Is it an input or output parameter? How is it used with the prefetching? v23-0004-Add-common-interface-for-TBMIterators.patch > +/* > + * Start iteration on a shared or private bitmap iterator. Note that tbm will > + * only be provided by private BitmapHeapScan callers. dsa and dsp will only be > + * provided by parallel BitmapHeapScan callers. For shared callers, one process > + * must already have called tbm_prepare_shared_iterate() to create and set up > + * the TBMSharedIteratorState. The TBMIterator is passed by reference to > + * accommodate callers who would like to allocate it inside an existing struct. > + */ > +void > +tbm_begin_iterate(TBMIterator *iterator, TIDBitmap *tbm, > + dsa_area *dsa, dsa_pointer dsp) > +{ > + Assert(iterator); > + > + iterator->private_iterator = NULL; > + iterator->shared_iterator = NULL; > + iterator->exhausted = false; > + > + /* Allocate a private iterator and attach the shared state to it */ > + if (DsaPointerIsValid(dsp)) > + iterator->shared_iterator = tbm_attach_shared_iterate(dsa, dsp); > + else > + iterator->private_iterator = tbm_begin_private_iterate(tbm); > +} Hmm, I haven't looked at how this is used the later patches, but a function signature where some parameters are used or not depending on the situation seems a bit awkward. Perhaps it would be better to let the caller call tbm_attach_shared_iterate(dsa, dsp) or tbm_begin_private_iterate(tbm), and provide a function to turn that into a TBMIterator? Something like: TBMIterator *tbm_iterator_from_shared_iterator(TBMSharedIterator *); TBMIterator *tbm_iterator_from_private_iterator(TBMPrivateIterator *); Does tbm_iterator() really need the 'exhausted' flag? The private and shared variants work without one. +1 on this patch in general, and I have no objections to its current form either, the above are just suggestions to consider. v23-0006-BitmapHeapScan-uses-unified-iterator.patch Makes sense. (Might be better to squash this with the previous patch) v23-0007-BitmapHeapScan-Make-prefetch-sync-error-more-det.patch LGTM v23-0008-Push-current-scan-descriptor-into-specialized-sc.patch v23-0009-Remove-ss_current-prefix-from-ss_currentScanDesc.patch LGTM. I would squash these together. v23-0010-Add-scan_in_progress-to-BitmapHeapScanState.patch > --- a/src/include/nodes/execnodes.h > +++ b/src/include/nodes/execnodes.h > @@ -1804,6 +1804,7 @@ typedef struct ParallelBitmapHeapState > * prefetch_target current target prefetch distance > * prefetch_maximum maximum value for prefetch_target > * initialized is node is ready to iterate > + * scan_in_progress is this a rescan > * pstate shared state for parallel bitmap scan > * recheck do current page's tuples need recheck > * blockno used to validate pf and current block in sync > @@ -1824,6 +1825,7 @@ typedef struct BitmapHeapScanState > int prefetch_target; > int prefetch_maximum; > bool initialized; > + bool scan_in_progress; > ParallelBitmapHeapState *pstate; > bool recheck; > BlockNumber blockno; Hmm, the "is this a rescan" comment sounds inaccurate, because it is set as soon as the scan is started, not only when rescanning. Other than that, LGTM. -- Heikki Linnakangas Neon (https://neon.tech)
pgsql-hackers by date: