Re: index prefetching - Mailing list pgsql-hackers
| From | Tomas Vondra |
|---|---|
| Subject | Re: index prefetching |
| Date | |
| Msg-id | 720560ca-97b9-40a1-ad40-9f9b8a6648e9@vondra.me |
| In response to | Re: index prefetching (Peter Geoghegan <pg@bowt.ie>) |
| Responses | Re: index prefetching |
| List | pgsql-hackers |
On 2/17/26 21:16, Peter Geoghegan wrote:
> On Tue, Feb 17, 2026 at 2:27 PM Andres Freund <andres@anarazel.de> wrote:
>> On 2026-02-17 12:16:23 -0500, Peter Geoghegan wrote:
>>> On Mon, Feb 16, 2026 at 11:48 AM Andres Freund <andres@anarazel.de> wrote:
>>> I agree that the current heuristics (which were invented recently) are
>>> too conservative. I overfit the heuristics to my current set of
>>> adversarial queries, as a stopgap measure.
>>
>> Are you doing any testing on higher latency storage? I found it to be quite
>> valuable to use dm_delay to have a disk with reproducible (i.e. not cloud)
>> higher latency (i.e. not just a local SSD).
>
> I sometimes use dm_delay (with the minimum 1ms delay) when testing,
> but don't do so regularly. Just because it's inconvenient to do so
> (perhaps not a great reason).
>
>> Low latency NVMe can reduce the
>> penalty of not enough readahead so much that it's hard to spot problems...
>
> I'll keep that in mind.
>

So, what counts as "higher latency" in this context? What delays should we
consider practical/relevant for testing?

>>> ISTM that we need the yields to better cooperate with whatever's
>>> happening on the read stream side.
>>
>> Plausible. It could be that we could get away with controlling the rampup to
>> be slower in potentially problematic cases, without needing the yielding, but
>> not sure.
>>
>> If that doesn't work, it might just be sufficient to increase the number of
>> batches that trigger yields as the scan goes on (perhaps by taking the number
>> of already "consumed" batches into account).
>
> It could make sense to take the number of consumed batches into
> account. In general, I think the best approach will be one that
> combines multiple complementary strategies.
>

Yes, this is roughly what I meant by "ramp up". Start by limiting the batch
distance to 2, then gradually increase that during the scan.
> Passing down a LIMIT N hint has proven to be a good idea -- and it
> doesn't really require applying any information related to the read
> stream. That's enough to prevent problems in the really extreme cases
> (e.g., nested loop antijoins with a LIMIT 1 on the inner side). The
> problematic merge join I showed you is a not-so-extreme case, which
> makes it trickier. ISTM that taking into consideration the number of
> "consumed" batches will not help that particular merge join query,
> precisely because it's not-so-extreme: the inner index scan consumes
> plenty of batches, but is nevertheless significantly regressed (at
> least when we don't yield at all).
>
>> To evaluate the amount of wasted work, it could be useful to make the read
>> stream stats page spit out the amount of "unconsumed" IOs at the end of the
>> scan.
>
> That would make sense. You can already tell when that's happened by
> comparing the details shown by EXPLAIN ANALYZE against the same query
> execution on master, but that approach is inconvenient. Automating my
> microbenchmarks has proven to be important with this project. There's
> quite a few competing considerations, and it's too easy to improve one
> query at the cost of regressing another.
>

What counts as "unconsumed IO"? The IOs the stream already started, but then
did not consume? That shouldn't be hard, I think.

regards

--
Tomas Vondra
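For reference, the dm_delay setup Andres mentions can be reproduced with the
device-mapper "delay" target. A sketch, assuming root access, the dm-delay
kernel module, and a scratch device (the device name, size, and mount point
below are placeholders to adjust for your system):

```shell
# Size of the backing device in 512-byte sectors (placeholder device).
SIZE=$(blockdev --getsz /dev/nvme1n1)

# Map the whole device with a 1 ms delay on every I/O.
# Table format: <start> <length> delay <device> <offset> <delay_ms>
echo "0 $SIZE delay /dev/nvme1n1 0 1" | dmsetup create delayed

# The delayed device appears as /dev/mapper/delayed; put a test
# tablespace (or a whole cluster's data directory) on it.
mkfs.ext4 /dev/mapper/delayed
mount /dev/mapper/delayed /mnt/slowdisk

# Tear down when done:
#   umount /mnt/slowdisk && dmsetup remove delayed
```

Unlike cloud storage, the added latency here is fixed and reproducible, which
is what makes it useful for spotting readahead problems that low-latency
local NVMe hides.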