Re: index prefetching - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: index prefetching |
Date | |
Msg-id | 20240215002851.tltqfipjwt5st53y@awork3.anarazel.de Whole thread Raw |
In response to | Re: index prefetching (Peter Geoghegan <pg@bowt.ie>) |
Responses |
Re: index prefetching
|
List | pgsql-hackers |
Hi, On 2024-02-13 14:54:14 -0500, Peter Geoghegan wrote: > This property of index scans is fundamental to how index scans work. > Pinning an index page as an interlock against concurrently TID > recycling by VACUUM is directly described by the index API docs [1], > even (the docs actually use terms like "buffer pin" rather than > something more abstract sounding). I don't think that anything > affecting that behavior should be considered an implementation detail > of the nbtree index AM as such (nor any particular index AM). Given that the interlock is only needed for non-mvcc scans, that non-mvcc scans are rare due to catalog accesses using snapshots these days and that most non-mvcc scans do single-tuple lookups, it might be viable to be more restrictive about prefetching iff non-mvcc snapshots are in use and to use method of cleanup that allows multiple pages to be cleaned up otherwise. However, I don't think we would necessarily have to relax the IAM pinning rules, just to be able to do prefetching of more than one index leaf page. Restricting prefetching to entries within a single leaf page obviously has the disadvantage of not being able to benefit from concurrent IO whenever crossing a leaf page boundary, but at the same time processing entries from just two leaf pages would often allow for a sufficiently aggressive prefetching. Pinning a small number of leaf pages instead of a single leaf page shouldn't be a problem. One argument for loosening the tight coupling between kill_prior_tuples and index scan progress is that the lack of kill_prior_tuples for bitmap scans is quite problematic. I've seen numerous production issues with bitmap scans caused by subsequent scans processing a growing set of dead tuples, where plain index scans were substantially slower initially but didn't get much slower over time. We might be able to design a system where the bitmap contains a certain number of back-references to the index, allowing later cleanup if there weren't any page splits or such. > I think that it makes sense to put the index AM in control here -- > that almost follows from what I said about the index AM API. The index > AM already needs to be in control, in about the same way, to deal with > kill_prior_tuple (plus it helps with the LIMIT issue I described). Depending on what "control" means I'm doubtful: Imo there are decisions influencing prefetching that an index AM shouldn't need to know about directly, e.g. how the plan shape influences how many tuples are actually going to be consumed. Of course that determination could be made in planner/executor and handed to IAMs, for the IAM to then "control" the prefetching. Another aspect is that *long* term I think we want to be able to execute different parts of the plan tree when one part is blocked for IO. Of course that's not always possible. But particularly with partitioned queries it often is. Depending on the form of "control" that's harder if IAMs are in control, because control flow needs to return to the executor to be able to switch to a different node, so we can't wait for IO inside the AM. There probably are ways IAMs could be in "control" that would be compatible with such constraints however. Greetings, Andres Freund
pgsql-hackers by date: