Re: BitmapHeapScan streaming read user and prelim refactoring - Mailing list pgsql-hackers
From | Melanie Plageman |
---|---|
Subject | Re: BitmapHeapScan streaming read user and prelim refactoring |
Date | |
Msg-id | 20240323002211.on5vb5ulk6lsdb2u@liskov Whole thread Raw |
In response to | Re: BitmapHeapScan streaming read user and prelim refactoring (Heikki Linnakangas <hlinnaka@iki.fi>) |
Responses |
Re: BitmapHeapScan streaming read user and prelim refactoring
|
List | pgsql-hackers |
On Tue, Mar 19, 2024 at 02:33:35PM +0200, Heikki Linnakangas wrote: > On 18/03/2024 17:19, Melanie Plageman wrote: > > I've attached v7 rebased over this commit. > > If we delayed table_beginscan_bm() call further, after starting the TBM > iterator, we could skip it altogether when the iterator is empty. > > That's a further improvement, doesn't need to be part of this patch set. > Just caught my eye while reading this. Hmm. You mean like until after the first call to tbm_[shared]_iterate()? AFAICT, tbm_begin_iterate() doesn't tell us anything about whether or not the iterator is "empty". Do you mean cases when the bitmap has no blocks in it? It seems like we should be able to tell that from the TIDBitmap. > > > v7-0003-Push-BitmapHeapScan-skip-fetch-optimization-into-.patch > > I suggest to avoid the double negative with SO_CAN_SKIP_FETCH, and call the > flag e.g. SO_NEED_TUPLE. Agreed. Done in attached v8. Though I wondered if it was a bit weird that the flag is set in the common case and not set in the uncommon case... > As yet another preliminary patch before the streaming read API, it would be > nice to move the prefetching code to heapam.c too. I've done this, but I can say it is not very pretty. see 0013. I had to add a bunch of stuff to TableScanDescData and HeapScanDescData which are only used for bitmapheapscans. I don't know if it makes the BHS streaming read user patch easier to review, but I don't think what I have in 0013 is committable to Postgres. Maybe there was another way I could have approached it. Let me know what you think. In addition to bloating the table descriptors, note that it was difficult to avoid one semantic change -- with 0013, we no longer prefetch or adjust prefetch target when emitting each empty tuple -- though I think this is could actually be desirable. > What's the point of having separate table_scan_bitmap_next_block() and > table_scan_bitmap_next_tuple() functions anymore? The AM owns the TBM > iterator now. The executor node updates the lossy/exact page counts, but > that's the only per-page thing it does now. Oh, interesting. Good point. I've done this in 0015. If you like the way it turned out, I can probably rebase this back into an earlier point in the set and end up dropping some of the other incremental changes (e.g. 0008). > > /* > > * If this is the first scan of the underlying table, create the table > > * scan descriptor and begin the scan. > > */ > > if (!scan) > > { > > uint32 extra_flags = 0; > > > > /* > > * We can potentially skip fetching heap pages if we do not need > > * any columns of the table, either for checking non-indexable > > * quals or for returning data. This test is a bit simplistic, as > > * it checks the stronger condition that there's no qual or return > > * tlist at all. But in most cases it's probably not worth working > > * harder than that. > > */ > > if (node->ss.ps.plan->qual == NIL && node->ss.ps.plan->targetlist == NIL) > > extra_flags |= SO_CAN_SKIP_FETCH; > > > > scan = node->ss.ss_currentScanDesc = table_beginscan_bm( > > node->ss.ss_currentRelation, > > node->ss.ps.state->es_snapshot, > > 0, > > NULL, > > extra_flags); > > } > > > > scan->tbmiterator = tbmiterator; > > scan->shared_tbmiterator = shared_tbmiterator; > > How about passing the iterator as an argument to table_beginscan_bm()? You'd > then need some other function to change the iterator on rescan, though. Not > sure what exactly to do here, but feels that this part of the API is not > fully thought-out. Needs comments at least, to explain who sets tbmiterator > / shared_tbmiterator and when. For comparison, for a TID scan there's a > separate scan_set_tidrange() table AM function. Maybe follow that example > and introduce scan_set_tbm_iterator(). I've spent quite a bit of time playing around with the code trying to make it less terrible than what I had before. On rescan, we have to actually make the whole bitmap and iterator. And, we don't have what we need to do that in table_rescan()/heap_rescan(). From what I can tell, scan_set_tidrange() is useful because it can be called from both the beginscan and rescan functions without invoking it directly from TidNext(). In our case, any wrapper function we wrote would basically just assign the iterator to the scan in BitmapHeapNext(). I've reorganized this code structure a bit, so see if you like it more now. I rebased the changes into some of the other patches, so you'll just have to look at the result and see what you think. > It's bit awkward to have separate tbmiterator and shared_tbmiterator fields. > Could regular and shared iterators be merged, or wrapped under a common > interface? This is a good idea. I've done that in 0014. It made the code nicer, but I just wonder if it will add too much overhead. - Melanie
pgsql-hackers by date: