Re: index prefetching - Mailing list pgsql-hackers

From: Peter Geoghegan
Subject: Re: index prefetching
Msg-id: CAH2-Wzk+ba5oHf1ghC34Z8eoCvum5yHGMxHeFCbC2ibcQ+J7uw@mail.gmail.com
In response to: Re: index prefetching (Tomas Vondra <tomas@vondra.me>)
Responses:
  Re: index prefetching
  Re: index prefetching
  Re: index prefetching
List: pgsql-hackers
On Tue, Jul 22, 2025 at 9:06 AM Tomas Vondra <tomas@vondra.me> wrote:
> Real workloads are likely to have multiple misses in a row, which indeed
> ramps up the distance quickly. So maybe it's not that bad. Could we
> track a longer history of hits/misses, and consider that when adjusting
> the distance? Not just the most recent hit/miss?

+1

> FWIW I re-ran the index-prefetch-test benchmarks with restoring the
> distance for the "simple" patch. The results are in the same github
> repository, in a separate branch:
>
> https://github.com/tvondra/indexscan-prefetch-tests/tree/with-distance-restore-after-reset

These results make way more sense. There was absolutely no reason why the
"simple" patch should have done so much worse than the "complex" one for
most of the tests you've been running.

Obviously, whatever advantage the "complex" patch has is bound to be
limited to cases where index characteristics are naturally the limiting
factor. For example, with the pgbench_accounts_pkey table there are only
ever 6 distinct heap blocks on each leaf page. I bet that your "linear"
test more or less looks like that, too.

I attach pgbench_accounts_pkey_nhblks.txt, which shows a query that (among
other things) outputs "nhblks" for each leaf page from a given index
(while showing the details of each leaf page in index key space order). It
also shows results for pgbench_accounts_pkey with pgbench scale 1. This is
how I determined that every pgbench_accounts_pkey leaf page points to
exactly 6 distinct heap blocks -- "nhblks" is always 6. Note that this is
what I see regardless of the pgbench scale, indicating that things always
perfectly line up (even more than I would expect for very synthetic data
such as this). [A rough sketch of such a query appears after this message.]

This query is unwieldy when run against larger indexes, but that shouldn't
be necessary. As with pgbench_accounts_pkey, it's typical for
synthetically generated data to have a very consistent "nhblks",
regardless of the total amount of data.

With your "uniform" test cases, I'd expect this query to show "nhtids ==
nhblks" (or very close to it), which of course makes our ability to
eagerly read further leaf pages almost irrelevant. If there are hundreds
of distinct heap blocks on each leaf page, but effective_io_concurrency is
16 (or even 64), there's little we can do about it.

> I'm attaching two PDFs with results for eic=16 (linear and log-scale, to
> compare timings for quick queries). This shows that with restoring
> distance after reset, the simple patch is pretty much the same as the
> complex patch.
>
> The only data set where that's not the case is the "linear" data set,
> when everything is perfectly sequential. In this case the simple patch
> performs like "master" (i.e. no prefetching). I'm not sure why is that.

Did you restore the distance for the "complex" patch, too? I think that it
might well matter there too.

Isn't the obvious explanation that the complex patch benefits from being
able to prefetch without being limited by index characteristics/leaf page
boundaries, while the simple patch doesn't?

> Anyway, it seems to confirm most of the differences between the two
> patches is due to the "distance collapse". The impact of the resets in
> the first benchmarks surprised me quite a bit, but if we don't ramp up
> the distance that makes perfect sense.
>
> The issue probably affects other queries that do a lot of resets. Index
> scan prefetching just makes it very obvious.

What is the difference between cases like "linear / eic=16 / sync" and
"linear_1 / eic=16 / sync"?
One would imagine that these tests are very similar, given their very
similar names. But we see very different results for each: with the former
("linear") test results, the "complex" patch is 2x-4x faster than the
"simple" patch, while with the latter test results ("linear_1", and other
similar pairs of "linear_N" tests) the advantage for the "complex" patch
*completely* evaporates. I find that very suspicious, and wonder if it
might be due to a bug/inefficiency that could easily be fixed (possibly an
issue on the read stream side, like the one you mentioned to Nazir just
now).

--
Peter Geoghegan
Attachment: pgbench_accounts_pkey_nhblks.txt
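The attached query itself is not reproduced in the archive. As a rough,
hedged sketch of the same idea (an assumption about the attachment's
approach, not the actual query), one can count index tuples ("nhtids") and
distinct heap blocks ("nhblks") per leaf page with the pageinspect
extension; the index name pgbench_accounts_pkey matches the discussion
above, and the sketch glosses over details such as high keys and
posting-list tuples, so the counts are approximate.

```sql
-- Rough sketch (NOT the attached query): per-leaf-page "nhtids" and
-- "nhblks" for an index, via pageinspect.  Assumes a small index such as
-- pgbench_accounts_pkey at pgbench scale 1; high-key and posting-list
-- items are not filtered out, so treat the numbers as approximate.
CREATE EXTENSION IF NOT EXISTS pageinspect;

WITH leaf AS (
    -- Block 0 is the metapage; scan all remaining blocks and keep leaf pages.
    SELECT blkno
    FROM generate_series(
             1,
             (pg_relation_size('pgbench_accounts_pkey')
              / current_setting('block_size')::bigint - 1)::int
         ) AS blkno
    WHERE (bt_page_stats('pgbench_accounts_pkey', blkno)).type = 'l'
)
SELECT leaf.blkno,
       count(items.ctid)                                    AS nhtids,
       -- ctid text looks like '(block,offset)'; cast via point to get the block
       count(DISTINCT (items.ctid::text::point)[0]::bigint) AS nhblks
FROM leaf,
     LATERAL bt_page_items('pgbench_accounts_pkey', leaf.blkno) AS items
GROUP BY leaf.blkno
ORDER BY leaf.blkno;
```

If the claims above hold, this kind of query would report nhblks = 6 for
every leaf page of pgbench_accounts_pkey, whereas for a "uniform" data set
nhblks should come out close to nhtids.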