Re: index prefetching - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: index prefetching
Date:
Msg-id: 1a4536f9-e4cb-419d-84ab-7ed6cfc62fa5@vondra.me
In response to: Re: index prefetching (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers
On 7/23/25 03:31, Peter Geoghegan wrote:
> On Tue, Jul 22, 2025 at 8:37 PM Tomas Vondra <tomas@vondra.me> wrote:
>>> I happen to think that that's a very unrealistic assumption. Most
>>> standard benchmarks have indexes that almost all look fairly similar
>>> to pgbench_accounts_pkey, from the point of view of "heap page blocks
>>> per leaf page". There are exceptions, of course (e.g., the TPC-C order
>>> table's primary key suffers from fragmentation).
>>>
>>
>> I agree with all of this.
>
> Cool.
>
>> I assume you mean results for the "linear" data set, because for every
>> other data set the patches perform almost exactly the same (when
>> restoring the distance after stream reset):
>>
>> https://github.com/tvondra/indexscan-prefetch-tests/blob/with-distance-restore-after-reset/d16-rows-cold-32GB-16-unscaled.pdf
>
> Right.
>
>> And it's a very good point. I was puzzled by this too for a while, and
>> it took me a while to understand how/why this happens. It pretty much
>> boils down to the "duplicate block" detection and how it interacts with
>> the stream resets (again!).
>
> I think that you slightly misunderstand where I'm coming from here: it
> *doesn't* puzzle me. What puzzled me was that it puzzled you.
>
> Andres' test query is very simple, and not entirely sympathetic
> towards the complex patch (by design). And yet it *also* gets quite a
> decent improvement from the complex patch. It doesn't speed things up
> by another order of magnitude or anything, but it's a very decent
> improvement -- one well worth having.
>
> I'm also unsurprised at the fact that all the other tests that you ran
> were more or less a draw between simple and complex. At least not now
> that I've drilled down and understood what the indexes from those
> other test cases actually look like, in practice.
>
>> So you're right the complex patch prefetches far ahead. I thought the
>> distance would quickly decrease because of the duplicate blocks, but I
>> missed the fact that the read stream will not see them at all.
>
> FWIW I wasn't thinking about it at anything like that level of
> sophistication. Everything I've said about it was based on intuitions
> about how the prefetching was bound to work, for each different kind
> of index. I just looked at individual leaf pages (or small groups of
> them) from each index/test, and considered their TIDs, and imagined
> how that was likely to affect the scan.
>
> It just seems obvious to me that all the tests (except for "linear")
> couldn't possibly be helped by eagerly reading multiple leaf pages. It
> seemed equally obvious that it's quite possible to come up with a
> suite of tests that have several tests that could benefit in the same
> way (not just 1). Although your "linear_1"/"linear_N" tests aren't
> actually like that, many cases will be -- and not just those that are
> perfectly correlated ala pgbench.
>
>> I'm not sure it's desirable to "hide" blocks from the read stream like
>> this - it'll never see the misses. How could it make good decisions,
>> when we skew the data used by the heuristics like this?
>
> I don't think that I fully understand what's desirable here myself.
>
>>> Doing *no* prefetching will usually be the right thing to do. Does
>>> that make index prefetching pointless in general?
>>>
>>
>> I don't think so. Why would it? There's plenty of queries that can
>> benefit from it a lot, and as long as it doesn't cause harm to other
>> queries it's a win.
>
> I was being sarcastic. That wasn't a useful thing for me to do. Apologies.
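[The "duplicate block" behaviour discussed above can be quantified outside either patch: the index entries whose heap block repeats the previous one are roughly the data the read stream never gets to see. The following is a rough sketch only, not code from the patches; table "t" and indexed column "a" are placeholders, ORDER BY a merely approximates btree order, and only consecutive duplicates are counted.]

-- How many heap blocks an index scan in "a" order would hand to the read
-- stream vs. how many consecutive duplicates would be skipped before the
-- stream ever sees them.
WITH scan_order AS (
    SELECT a,
           (ctid::text::point)[0]::bigint AS heap_block   -- block part of the heap TID
    FROM t
)
SELECT count(*) FILTER (WHERE heap_block IS DISTINCT FROM prev_block) AS blocks_passed_to_stream,
       count(*) FILTER (WHERE heap_block = prev_block)                AS duplicates_hidden
FROM (SELECT heap_block,
             lag(heap_block) OVER (ORDER BY a) AS prev_block
      FROM scan_order) s;

[A high duplicates_hidden share means the distance heuristic is working from a heavily filtered view of the scan.]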
>
>> This is not resetting the stream, though. This is resetting the position
>> tracking how far the stream got.
>
> My main point is that there's stuff going on here that nobody quite
> understands just yet. And so it probably makes sense to defensively
> assume that the prefetch distance resetting stuff might matter with
> either the complex or simple patch.
>
>> Sorry, got distracted and forgot to complete the sentence. I think I
>> wanted to write "mostly from not resetting the distance to 1". Which is
>> true, but the earlier "linear" example also shows there are cases where
>> the page boundaries are significant.
>
> Of course that's true. But that was just a temporary defect of the
> "simple" patch (and perhaps even for the "complex" patch, albeit to a
> much lesser degree). It isn't really relevant to the important
> question of whether the simple or complex design should be pursued --
> we know that now.
>
> As I said, I don't think that the test suite is particularly well
> suited to evaluating simple vs complex. Because there's only one test
> ("linear") that has any hope of being better with the complex patch.
> And because having only 1 such test isn't representative.
>
>> That's actually intentional. I wanted to model tables with wider tuples,
>> without having to generate all the data etc. Maybe 25% is too much, and
>> real tables have more than 20 tuples. It's true 400B is fairly large.
>
> My point about fill factor isn't particularly important.
>

Yeah, the randomness of the TIDs matters too much.

>> I'm not against testing with other parameters, of course. The test was
>> not originally written for comparing different prefetching patches, so
>> it may not be quite fair (and I'm not sure how to define "fair").
>
> I'd like to see more than 1 test where eagerly reading leaf pages has
> any hope of helping. That's my only important concern.
>

Agreed.

>> It's not uniformly random; I wrote that it uses a normal distribution.
>> The query in the SQL script does this:
>>
>> select x + random_normal(0, 1000) from ...
>>
>> It is a synthetic test data set, of course. It's meant to be simple to
>> generate, reason about, and somewhere in between the "linear" and
>> "uniform" data sets.
>
> I always start by looking at the index leaf pages, and imagining how
> an index scan can/will deal with that.
>
> Just because it's not truly uniformly random doesn't mean that that's
> apparent when you just look at one leaf page -- heap blocks might very
> well *appear* to be uniformly random (or close to it) when you drill
> down like that. Or even when you look at (say) 50 neighboring leaf
> pages.
>

Yeah, the number of heap blocks per leaf page is a useful measure. I should
have thought about that. The other thing worth tracking is probably how the
number of heap blocks increases with multiple leaf pages, to measure the
"hit ratio". I should have thought about this more when creating the data
sets ...

>> But it also has realistic motivation - real tables are usually not as
>> clean as "linear", nor as random as the "uniform" data sets (not for all
>> columns, at least). If you're looking at data sets like "orders" or
>> whatever, there's usually a bit of noise even for columns like "date"
>> etc. People modify the orders, or fill in data from a couple days ago,
>> etc. Perfect correlation for one column implies slightly worse
>> correlation for another column (order date vs. delivery date).
>
> I agree.
>
>> Right. I don't see a problem with this. I'm not saying parameters for
>> this particular data set are "perfect", but the intent is to have a
>> range of data sets from "perfectly clean" to "random" and see how the
>> patch(es) behave on all of them.
>
> Obviously none of your test cases are invalid -- they're all basically
> reasonable, when considered in isolation. But the "linear_1" test is
> *far* closer to the "uniform" test than it is to the "linear" test. At
> least as far as the simple vs complex question is concerned.
>

Perhaps not invalid, but it also does not cover the space of possible data
sets the way I intended. It seems all the data sets are much more random
than I expected.

>> If you have a suggestion for different data sets, or how to tweak the
>> parameters to make it more realistic, I'm happy to try those.
>
> I'll get back to you on this soon. There are plenty of indexes that
> are not perfectly correlated (like pgbench_accounts_pkey is) that'll
> nevertheless benefit significantly from the approach taken by the
> complex patch. I'm sure of this because I've been using the query I
> posted earlier for many years now -- I've thought about and directly
> instrumented the "nhtids:nhblks" of an index of interest many times in
> the past.
>

Thanks!

regards

--
Tomas Vondra
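[Peter's actual instrumentation query is not quoted in this message. As a rough sketch of the "nhtids:nhblks" idea using pageinspect: the index name "some_idx" is a placeholder, high keys are ignored, and posting-list tuples produced by deduplication are counted once, so the numbers are approximate.]

-- Approximate nhtids : nhblks per leaf page of a btree index, i.e. how many
-- distinct heap blocks each leaf page's TIDs point at.
CREATE EXTENSION IF NOT EXISTS pageinspect;

WITH leaf AS (
    SELECT blkno
    FROM generate_series(1, pg_relation_size('some_idx')
                            / current_setting('block_size')::bigint - 1) AS blkno
    WHERE (bt_page_stats('some_idx', blkno::int)).type = 'l'   -- leaf pages only
)
SELECT l.blkno,
       count(i.htid)                                  AS nhtids,
       count(DISTINCT (i.htid::text::point)[0]::int)  AS nhblks
FROM leaf AS l,
     LATERAL bt_page_items('some_idx', l.blkno::int) AS i
GROUP BY l.blkno
ORDER BY l.blkno;

[A leaf page where nhblks is close to nhtids behaves like the "uniform" data set; one where nhblks is much smaller behaves like "linear". Summing distinct heap blocks across runs of neighboring leaf pages would give the "hit ratio" mentioned above.]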