Re: index prefetching - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: index prefetching |
Date | |
Msg-id | CAH2-WznH5SQNvtTonhQexdZ-0x_7-m8uAsrbUkxpucg5ErYNzA@mail.gmail.com Whole thread Raw |
In response to | Re: index prefetching (Tomas Vondra <tomas@vondra.me>) |
Responses |
Re: index prefetching
|
List | pgsql-hackers |
On Mon, Aug 25, 2025 at 10:18 AM Tomas Vondra <tomas@vondra.me> wrote: > Almost all regressions (at least the top ones) now look like this, i.e. > distance collapses to ~2.0, which essentially disables prefetching. Good to know. > But I no longer think it's caused by the "priorbatch" optimization, > which delays read stream creation until after the first batch. I still > think we may need to rethink that (e.g. if the first batch is huge), but > he distance can "collapse" even without it. The optimization just makes > it easier to happen. That shouldn't count against the "priorbatch" optimization. I still think that this issue should be treated as 100% unrelated to the "priorbatch" optimization. You might very well be right that the "priorbatch" optimization is too naive about index scans whose first/possibly only leaf page has TIDs that point to many distinct heap blocks (hundreds, say). But there's no reason to think that that's truly relevant to the problem at hand. If there was such a problem, then it wouldn't look like a regression against enable_indexscan_prefetch = off/master. We'd likely require a targeted approach to even notice such a problem; so far, most/all of our index scan test cases have read hundreds/thousands of index pages -- so any problem that's limited to the first leaf page read is likely to go unnoticed. I think that the "priorbatch" optimization at least takes *approximately* the right approach, which is good enough for now. It at least shouldn't ever do completely the wrong thing. It even seems possible that sufficiently testing will actually show that its naive approach to be the best one, on balance, once the cost of adding mitigations (costs for all queries, not just ones like the one you looked at recently) is taken into account. I suggest that we not even think about "priorbatch" until the problem on the read stream side is fixed. IMV we should at least have a prototype patch for the read stream that we're reasonably happy with before looking at "priorbatch" in further detail. I don't think we have that right now. > AFAICS the distance collapse is "inherent" to how the distance gets > increased/decreased after hits/misses. Right. (I think that you'll probably agree with me about addressing this problem before even thinking about limitations in the "priorbatch" optimization, but I thought it best to be clear about that.) > I find this distance heuristics a bit strange, for a couple reasons: > > * It doesn't seem right to get stuck at distance=2 with 50% misses. > Surely that would benefit from prefetching a bit more? Maybe, but at what cost? It doesn't necessarily make sense to continue to read additional leaf pages, regardless of the number of heap buffer hits in the recent past. At some point it likely makes more sense to just give up and do actual query processing/return rows to the scan. Even without a LIMIT. I have low confidence here, though. > * It mostly ignores effective_io_concurrency, which I think about as > "Keep this number of I/Os in the queue." But we don't try doing that. As I said, I might just be wrong about "just giving up at some point" making sense. I just don't necessarily think it makes sense to go from ignoring effective_io_concurrency to *only* caring about effective_io_concurrency. It's likely true that keeping effective_io_concurrency-many I/Os in flight is the single most important thing -- but I doubt it's the only thing that ever matters (again, even assuming that there's no LIMIT involved). > Attached is an example table/query, found by my script. Without the > read_stream patch (i.e. just with the current index prefetching), it > looks like this: > So it's more a case of "mitigating a regression" (finding regressions > like this is the purpose of my script). Still, I believe the questions > about the distance heuristics are valid. > > (Another interesting detail is that the regression happens only with > io_method=worker, not with io_uring. I'm not sure why.) I find that the regression happens with io_uring. I also find that your patch doesn't fix it. I have no idea why. -- Peter Geoghegan
pgsql-hackers by date: