Re: index prefetching - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: index prefetching
Date:
Msg-id: ec4d8c7b-bbc7-4805-919f-8fdeb11b3d34@vondra.me
In response to: Re: index prefetching (Peter Geoghegan <pg@bowt.ie>)
Responses: Re: index prefetching
List: pgsql-hackers
On 7/13/25 01:50, Peter Geoghegan wrote:
> On Thu, May 1, 2025 at 7:02 PM Tomas Vondra <tomas@vondra.me> wrote:
>> There's two "fix" patches trying to make this work - it does not
>> crash, and almost all the "incorrect" query results are actually
>> stats about buffer hits etc. And that is expected to change with
>> prefetching, not a bug. But then there are a bunch of explains where
>> the number of index scans changed, e.g. like
>>
>> - Index Searches: 5
>> + Index Searches: 4
>>
>> And that is almost certainly a bug.
>>
>> I haven't figured this out yet, and I feel a bit lost again :-(
>
> For the benefit of other people reading this thread: I sent Tomas a
> revised version of this "complex" patch this week, fixing all these
> bugs. It only took me a few hours, and I regret not doing that work
> sooner.
>
> I also cleaned up nbtree aspects of the "complex" patch considerably.
> The nbtree footprint was massively reduced:
>
>  17 files changed, 422 insertions(+), 685 deletions(-)
>
> So there's a net negative nbtree code footprint. We're effectively
> just moving things out of nbtree that are already completely
> index-AM-generic. I think that the amount of code that can be removed
> from nbtree (and other AMs that currently use amgettuple) will be even
> higher if we go this way.
>

Thank you! I'll take a look next week, but these numbers suggest you
simplified it a lot.

>> The one real limitation of the simpler approach is that prefetching is
>> limited to a single leaf page - we can't prefetch from the next one,
>> until the scan advances to it. But based on experiments comparing this
>> simpler and the "complex" approach, I don't think that really matters
>> that much. I haven't seen any difference for regular queries.
>
> Did you model/benchmark it?
>

Yes. I did benchmark the simple and complex versions I had at the time.
But you know how it is with benchmarking - I'm sure it's possible to
pick queries where it'd make a (significant) difference.
For example, if you make the index tuples "fat", that would make the
prefetching less efficient. Another thing is hardware. I've been testing
on local NVMe drives, and those don't seem to need very long queues
(it's diminishing returns). Maybe the results would be different on
systems with more I/O latency (e.g. because the storage is not local).

>> The one case where I think it might matter is queries with array keys,
>> where each array key matches a single tuple on a different leaf page.
>> The complex patch might prefetch tuples for later array values, while
>> the simpler patch won't be able to do that. If an array key matches
>> multiple tuples, the simple patch can prefetch those just fine, of
>> course. I don't know which case is more likely.
>
> We discussed this in Montreal, but I'd like to respond to this point
> again on list:
>
> I don't think that array keys are in any way relevant to the design of
> this patch. Nothing I've said about this project has anything to do
> with array keys, except when I was concerned about specific bugs in
> the patch. (Bugs that I've now fixed in a way that is wholly confined
> to nbtree.)
>
> The overarching goal of my work on nbtree array scans was to make them
> work just like other scans to the maximum extent possible. Array scans
> "where each array key matches a single tuple on a different leaf page"
> are virtually identical to any other scan that'll return only one or
> two tuples from each neighboring page. You could see a similar pattern
> with literally any kind of key.
>
> Again, what I'm concerned about is coming up with a design that gives
> scans maximum freedom to reorder work (not necessarily in the first
> committed version), so that we can keep the read stream busy by giving
> it sufficiently many heap pages to read: a truly adaptive design, that
> weighs all relevant costs. Sometimes that'll necessitate eagerly
> reading leaf pages. There is nothing fundamentally complicated about
> that idea.
> Nothing in index AMs cares about how or when heap accesses take place.
>
> Again, it just *makes sense* to centralize the code that controls the
> progress of ordered/amgettuple scans. Every affected index AM is
> already doing virtually the same thing as each other. They're all
> following the rules around index locking/pinning for amgettuple [1].
> Individual index AMs are *already* required to read leaf pages a
> certain way, in a certain order *relative to the heap accesses*. All
> for the benefit of scan correctness (to avoid breaking things in a way
> that relates to heapam implementation details).
>
> Why wouldn't we want to relieve all AMs of that responsibility?
> Leaving it up to index AMs has resulted in subtle bugs [2][3], and
> AFAICT has no redeeming quality. If affected index AMs were *forced*
> to do *exactly* the same thing as each other (not just *obliged* to
> do *almost* the same thing), it would make life easier for everybody.
>
> [1] https://www.postgresql.org/docs/current/index-locking.html
> [2] https://commitfest.postgresql.org/patch/5721/
> [3] https://commitfest.postgresql.org/patch/5542/

Thanks. I don't remember the array key details, I'll need to swap the
context back in. But I think the thing I've been concerned about the
most is the coordination of advancing to the next leaf page vs. the
next array key (and then perhaps having to go back when the scan
direction changes).

regards

--
Tomas Vondra