Re: BitmapHeapScan streaming read user and prelim refactoring - Mailing list pgsql-hackers
From | Melanie Plageman |
---|---|
Subject | Re: BitmapHeapScan streaming read user and prelim refactoring |
Date | |
Msg-id | 20240407041734.74jwo5wjmxs6p5du@liskov Whole thread Raw |
In response to | Re: BitmapHeapScan streaming read user and prelim refactoring (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Responses |
Re: BitmapHeapScan streaming read user and prelim refactoring
Re: BitmapHeapScan streaming read user and prelim refactoring |
List | pgsql-hackers |
On Sun, Apr 07, 2024 at 02:27:43AM +0200, Tomas Vondra wrote: > On 4/6/24 23:34, Melanie Plageman wrote: > > ... > >> > >> I realized it makes more sense to add a FIXME (I used XXX. I'm not when > >> to use what) with a link to the message where Andres describes why he > >> thinks it is a bug. If we plan on fixing it, it is good to have a record > >> of that. And it makes it easier to put a clear and accurate comment. > >> Done in 0009. > >> > >>> OK, thanks. If think 0001-0008 are ready to go, with some minor tweaks > >>> per above (tuple vs. tuples etc.), and the question about the recheck > >>> flag. If you can do these tweaks, I'll get that committed today and we > >>> can try to get a couple more patches in tomorrow. > > > > Attached v19 rebases the rest of the commits from v17 over the first > > nine patches from v18. All patches 0001-0009 are unchanged from v18. I > > have made updates and done cleanup on 0010-0021. > > > > I've pushed 0001-0005, I'll get back to this tomorrow and see how much > more we can get in for v17. Thanks! I thought about it a bit more, and I got worried about the Assert(scan->rs_empty_tuples_pending == 0); in heap_rescan() and heap_endscan(). I was worried if we don't complete the scan it could end up tripping incorrectly. I tried to come up with a query which didn't end up emitting all of the tuples on the page (using a LIMIT clause), but I struggled to come up with an example that qualified for the skip fetch optimization and also returned before completing the scan. I could work a bit harder tomorrow to try and come up with something. However, I think it might be safer to just change these to: scan->rs_empty_tuples_pending = 0 > What bothers me on 0006-0008 is that the justification in the commit > messages is "future commit will do something". I think it's fine to have > a separate "prepareatory" patches (I really like how you structured the > patches this way), but it'd be good to have them right before that > "future" commit - I'd like not to have one in v17 and then the "future > commit" in v18, because that's annoying complication for backpatching, > (and probably also when implementing the AM?) etc. Yes, I was thinking about this also. > AFAICS for v19, the "future commit" for all three patches (0006-0008) is > 0012, which introduces the unified iterator. Is that correct? Actually, those patches (v19 0006-0008) were required for v19 0009, which is why I put them directly before it. 0009 eliminates use of the TBMIterateResult for control flow in BitmapHeapNext(). I've rephrased the commit messages to not mention future commits and instead focus on what the changes in the commit are enabling. v19-0006 actually squashed very easily with v19-0009 and is actually probably better that way. It is still easy to understand IMO. In v20, I've attached just the functionality from v19 0006-0009 but in three patches instead of four. > Also, for 0008 I'm not sure we could even split it between v17 and v18, > because even if heapam did not use the iterator, what if some other AM > uses it? Without 0012 it'd be a problem for the AM, no? The iterators in the TableScanDescData were introduced in v19-0009. It is okay for other AMs to use it. In fact, they will want to use it. It is still initialized and set up in BitmapHeapNext(). They would just need to call tbm_iterate()/tbm_shared_iterate() on it. As for how table AMs will cope without the TBMIterateResult passed to table_scan_bitmap_next_tuple() (which is what v19 0008 did): they can save the location of the tuples to be scanned somewhere in their scan descriptor. Heap AM already did this and actually didn't use the TBMIterateResult at all. > Would it make sense to move 0009 before these three patches? That seems > like a meaningful change on it's own, right? Since v19 0009 requires these patches, I don't think we could do that. I think up to and including 0009 would be an improvement in clarity and function. As for all the patches after 0009, I've dropped them from this version. We are out of time, and they need more thought. After we decided not to pursue streaming bitmapheapscan for 17, I wanted to make sure we removed the prefetch code table AM violation -- since we weren't deleting that code. So what started out as me looking for a way to clean up one commit ended up becoming a much larger project. Sorry about that last minute code explosion! I do think there is a way to do it right and make it nice. Also that violation would be gone if we figure out how to get streaming bitmapheapscan behaving correctly. So, there's just more motivation to make streaming bitmapheapscan awesome for 18! Given all that, I've only included the three patches I think we are considering (former v19 0006-0008). They are largely the same as you saw them last except for squashing the two commits I mentioned above and updating all of the commit messages. > FWIW I don't think it's very likely I'll commit the UnifiedTBMIterator > stuff. I do agree with the idea in general, but I think I'd need more > time to think about the details. Sorry about that ... Yes, that makes total sense. I 100% agree. I do think the UnifiedTBMIterator (maybe the name is not good, though) is a good way to simplify the BitmapHeapScan code and is applicable to any future TIDBitmap user with both a parallel and serial implementation. So, there's a nice, small patch I can register for July. Thanks again for taking time to work on this! - Melanie
Attachment
pgsql-hackers by date: