Home > mailing lists

Re: BitmapHeapScan streaming read user and prelim refactoring - Mailing list pgsql-hackers

From	Dilip Kumar
Subject	Re: BitmapHeapScan streaming read user and prelim refactoring
Date	December 9, 2024 09:22:30
Msg-id	CAFiTN-skv5m0PnLU84m07OSoEma0YbDz=GUNzzVDJqnoE_Nhvw@mail.gmail.com Whole thread Raw
In response to	Re: BitmapHeapScan streaming read user and prelim refactoring (Melanie Plageman <melanieplageman@gmail.com>)
List	pgsql-hackers

Tree view

On Sat, Oct 19, 2024 at 2:18 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:
>
> On Wed, Oct 16, 2024 at 11:09 AM Tomas Vondra <tomas@vondra.me> wrote:
> >
>
> Thanks for the review!
>
> >
> > 1) 0001
> >
> > I find it a bit weird that we use ntuples to determine if it's exact or
> > lossy. Not an issue caused by this patch, of course, but maybe we could
> > improve that somehow? I mean this:
> >
> >     if (tbmres->ntuples >= 0)
> >         (*exact_pages)++;
> >     else
> >         (*lossy_pages)++;
>
> I agree. The most straightforward solution would be to add a boolean
> `lossy` to TBMIterateResult.
>
> In some ways, I think it would make it a bit more clear because the
> situation with TBMIterateResult->recheck is already a bit confusing.
> It recheck is true if either 1: the Bitmap is lossy and we cannot use
> TBMIterateResult->offsets for this page or 2: the original query has a
> qual requiring recheck
>
> In the body of BitmapHeapNext() it says "if we are using lossy info,
> we have to recheck the qual conditions at every tuple". And in
> heapam_bitmap_next_block(), when the bitmap is lossy it says "bitmap
> is lossy, so we must examine each line pointer on the page. But we can
> ignore HOT chains, since we'll check each tuple anyway"
>
> I must admit I don't quite understand the connection between a lossy
> bitmap and needing to recheck the qual condition. Once we've gone to
> the underlying data, why do we need to recheck each tuple unless the
> qual requires it? It seems like PageTableEntry->recheck is usually
> passed as true for hash indexes and false for btree indexes. So maybe
> it has something to do with the index type?
>
> Anyway, adding a `lossy` member can be another patch. I just wanted to
> check if that was the solution you were thinking of.
>
> On a somewhat related note, I included a patch to alter the test at
> the top of heapam_bitmap_next_tuple()
>
> if (hscan->rs_cindex < 0 || hscan->rs_cindex >= hscan->rs_ntuples)
>
> I don't really see how rs_cindex could ever be < 0 for bitmap heap
> scans. But, I'm not confident enough about this to commit the patch
> really (obviously tests pass with the Assertion I added -- but still).
>
> > Also, maybe consider manually wrapping long lines - I haven't tried, but
> > I guess pgindent did not wrap this. I mean this line, for example:
> >
> > ... whitespace ... &node->stats.lossy_pages, &node->stats.exact_pages))
>
> Ah, yes. pgindent indeed does not help with this. I have now gone
> through every commit and used
>
> git diff origin/master | grep -E '^(\+|diff)' | sed 's/^+//' | expand
> -t4 | awk "length > 78 || /^diff/"
>
> from the committing wiki :)
>
> > 2) 0003
> >
> > The "tidr" name is not particularly clear, IMO. Maybe just "range" would
> > be better?
>
> I've updated this.
>
> > +  struct
> > +  {
> > +      ItemPointerData rs_mintid;
> > +      ItemPointerData rs_maxtid;
> > +  }   tidr;
> >
> > Isn't it a bit strange that this patch remove tmbres from
> > heapam_scan_bitmap_next_block, when it was already removed from
> > table_scan_bitmap_next_tuple in the previous commit? Does it make sense
> > to separate it like this? (Maybe it does, not sure.)
>
> I've combined them.
>
> > I find this not very clear:
> >
> > + *    recheck     do current page's tuples need recheck
> > + *    blockno     used to validate pf and current block in sync
> > + *    pfblockno   used to validate pf stays ahead of current block
> >
> > The "blockno" sounds weird - shouldn't it say "are in sync"? Also, not
> > clear what "pf" stands for without context (sure, it's "prefetch"). But
> > Why not to not write "prefetch_blockno" or something like that?
>
> Fixed.
>
> > 3) 0006
> >
> > Seems unnecessary to keep this as separate commit, it's  just log
> > message improvement. I'd merge it to the earlier patch.
>
> Done.
>
> > 4) 0007
> >
> > Seems fine to me in principle. This adds "subclass" similar to what we
> > do for plan nodes, except that states are not derived from Node. But
> > it's the same approach.
> >
> > Why not to do
> >
> >     scan = (HeapScanDesc *) bscan;
> >
> > instead of
> >
> >     scan = &bscan->rs_heap_base;
> >
> > I think the simple cast is what we do for the plan nodes, and we even do
> > it this way in the opposite direction in a couple places:
> >
> >     BitmapHeapScanDesc *bscan = (BitmapHeapScanDesc *) scan;
>
> Yea, in the reverse case, there is no way to refer to a specific
> member (because we are casting something of the parent type to the
> child type). But when we are casting from the child type to the parent
> type and can actually refer to a member, it seems preferable to
> explicitly refer to the member in case anyone ever inserts a member
> above rs_heap_base.  However, for consistency with other places in the
> code, I've changed it as you suggested.
>
> Additionally, I've gone through an made a number of stylistic changes
> to conform more closely to existing styles (adding back more rs_
> prefixes and typedef'ing BitmapHeapScanDescData * as
> BitmapHeapScanDesc [despite how sad it makes me]).
>
> > BTW would it be a good idea for heapam_scan_bitmap_next_block to check
> > the passed "scan" has SO_TYPE_BITMAPSCAN? We haven't been checking that
> > so far I think, but we only had a single struct ...
>
> Good point. I've done this.
>
> I plan to commit 0002 and 0003 next week. I'm interested if you think
> 0001 is correct.
> I may also commit 0004-0006 as I feel they are ready too.

Are we planning to commit this refactoring? I think this refactoring
makes the overall code of BitmapHeapNext() quite clean and readable.
I haven't read all patches but 0001-0006 including 0009 makes this
code quite clean and readable. I like the refactoring of merging of
the shared iterator and the private iterator also.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

pgsql-hackers by date:

From: jian he
Date: 09 December 2024, 09:16:04
Subject: Re: proposal: schema variables

From: Darek Ślusarczyk
Date: 09 December 2024, 09:25:51
Subject: Re: add support for the old naming libs convention on windows (ssleay32.lib and libeay32.lib)

Re: BitmapHeapScan streaming read user and prelim refactoring - Mailing list pgsql-hackers

Previous

Next