Re: BitmapHeapScan streaming read user and prelim refactoring - Mailing list pgsql-hackers
From | Dilip Kumar |
---|---|
Subject | Re: BitmapHeapScan streaming read user and prelim refactoring |
Date | |
Msg-id | CAFiTN-skv5m0PnLU84m07OSoEma0YbDz=GUNzzVDJqnoE_Nhvw@mail.gmail.com Whole thread Raw |
In response to | Re: BitmapHeapScan streaming read user and prelim refactoring (Melanie Plageman <melanieplageman@gmail.com>) |
List | pgsql-hackers |
On Sat, Oct 19, 2024 at 2:18 AM Melanie Plageman <melanieplageman@gmail.com> wrote: > > On Wed, Oct 16, 2024 at 11:09 AM Tomas Vondra <tomas@vondra.me> wrote: > > > > Thanks for the review! > > > > > 1) 0001 > > > > I find it a bit weird that we use ntuples to determine if it's exact or > > lossy. Not an issue caused by this patch, of course, but maybe we could > > improve that somehow? I mean this: > > > > if (tbmres->ntuples >= 0) > > (*exact_pages)++; > > else > > (*lossy_pages)++; > > I agree. The most straightforward solution would be to add a boolean > `lossy` to TBMIterateResult. > > In some ways, I think it would make it a bit more clear because the > situation with TBMIterateResult->recheck is already a bit confusing. > It recheck is true if either 1: the Bitmap is lossy and we cannot use > TBMIterateResult->offsets for this page or 2: the original query has a > qual requiring recheck > > In the body of BitmapHeapNext() it says "if we are using lossy info, > we have to recheck the qual conditions at every tuple". And in > heapam_bitmap_next_block(), when the bitmap is lossy it says "bitmap > is lossy, so we must examine each line pointer on the page. But we can > ignore HOT chains, since we'll check each tuple anyway" > > I must admit I don't quite understand the connection between a lossy > bitmap and needing to recheck the qual condition. Once we've gone to > the underlying data, why do we need to recheck each tuple unless the > qual requires it? It seems like PageTableEntry->recheck is usually > passed as true for hash indexes and false for btree indexes. So maybe > it has something to do with the index type? > > Anyway, adding a `lossy` member can be another patch. I just wanted to > check if that was the solution you were thinking of. > > On a somewhat related note, I included a patch to alter the test at > the top of heapam_bitmap_next_tuple() > > if (hscan->rs_cindex < 0 || hscan->rs_cindex >= hscan->rs_ntuples) > > I don't really see how rs_cindex could ever be < 0 for bitmap heap > scans. But, I'm not confident enough about this to commit the patch > really (obviously tests pass with the Assertion I added -- but still). > > > Also, maybe consider manually wrapping long lines - I haven't tried, but > > I guess pgindent did not wrap this. I mean this line, for example: > > > > ... whitespace ... &node->stats.lossy_pages, &node->stats.exact_pages)) > > Ah, yes. pgindent indeed does not help with this. I have now gone > through every commit and used > > git diff origin/master | grep -E '^(\+|diff)' | sed 's/^+//' | expand > -t4 | awk "length > 78 || /^diff/" > > from the committing wiki :) > > > 2) 0003 > > > > The "tidr" name is not particularly clear, IMO. Maybe just "range" would > > be better? > > I've updated this. > > > + struct > > + { > > + ItemPointerData rs_mintid; > > + ItemPointerData rs_maxtid; > > + } tidr; > > > > Isn't it a bit strange that this patch remove tmbres from > > heapam_scan_bitmap_next_block, when it was already removed from > > table_scan_bitmap_next_tuple in the previous commit? Does it make sense > > to separate it like this? (Maybe it does, not sure.) > > I've combined them. > > > I find this not very clear: > > > > + * recheck do current page's tuples need recheck > > + * blockno used to validate pf and current block in sync > > + * pfblockno used to validate pf stays ahead of current block > > > > The "blockno" sounds weird - shouldn't it say "are in sync"? Also, not > > clear what "pf" stands for without context (sure, it's "prefetch"). But > > Why not to not write "prefetch_blockno" or something like that? > > Fixed. > > > 3) 0006 > > > > Seems unnecessary to keep this as separate commit, it's just log > > message improvement. I'd merge it to the earlier patch. > > Done. > > > 4) 0007 > > > > Seems fine to me in principle. This adds "subclass" similar to what we > > do for plan nodes, except that states are not derived from Node. But > > it's the same approach. > > > > Why not to do > > > > scan = (HeapScanDesc *) bscan; > > > > instead of > > > > scan = &bscan->rs_heap_base; > > > > I think the simple cast is what we do for the plan nodes, and we even do > > it this way in the opposite direction in a couple places: > > > > BitmapHeapScanDesc *bscan = (BitmapHeapScanDesc *) scan; > > Yea, in the reverse case, there is no way to refer to a specific > member (because we are casting something of the parent type to the > child type). But when we are casting from the child type to the parent > type and can actually refer to a member, it seems preferable to > explicitly refer to the member in case anyone ever inserts a member > above rs_heap_base. However, for consistency with other places in the > code, I've changed it as you suggested. > > Additionally, I've gone through an made a number of stylistic changes > to conform more closely to existing styles (adding back more rs_ > prefixes and typedef'ing BitmapHeapScanDescData * as > BitmapHeapScanDesc [despite how sad it makes me]). > > > BTW would it be a good idea for heapam_scan_bitmap_next_block to check > > the passed "scan" has SO_TYPE_BITMAPSCAN? We haven't been checking that > > so far I think, but we only had a single struct ... > > Good point. I've done this. > > I plan to commit 0002 and 0003 next week. I'm interested if you think > 0001 is correct. > I may also commit 0004-0006 as I feel they are ready too. Are we planning to commit this refactoring? I think this refactoring makes the overall code of BitmapHeapNext() quite clean and readable. I haven't read all patches but 0001-0006 including 0009 makes this code quite clean and readable. I like the refactoring of merging of the shared iterator and the private iterator also. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
pgsql-hackers by date: