Re: index prefetching - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: index prefetching |
Date | |
Msg-id | afd17f3c-d496-5583-e93a-5a162d37235f@enterprisedb.com Whole thread Raw |
In response to | Re: index prefetching (Andres Freund <andres@anarazel.de>) |
Responses |
Re: index prefetching
|
List | pgsql-hackers |
On 12/21/23 14:43, Andres Freund wrote: > Hi, > > On 2023-12-21 13:30:42 +0100, Tomas Vondra wrote: >> You're right a lot of this is a guesswork. I don't think we can do much >> better, because it depends on stuff that's out of our control - each OS >> may do things differently, or perhaps it's just configured differently. >> >> But I don't think this is really a serious issue - all the read-ahead >> implementations need to work about the same, because they are meant to >> work in a transparent way. >> >> So it's about deciding at which point we think this is a sequential >> pattern. Yes, the OS may use a slightly different threshold, but the >> exact value does not really matter - in the worst case we prefetch a >> couple more/fewer blocks. >> >> The OS read-ahead can't really prefetch anything except sequential >> cases, so the whole question is "When does the access pattern get >> sequential enough?". I don't think there's a perfect answer, and I don't >> think we need a perfect one - we just need to be reasonably close. > > For the streaming read interface (initially backed by fadvise, to then be > replaced by AIO) we found that it's clearly necessary to avoid fadvises in > cases of actual sequential IO - the overhead otherwise leads to easily > reproducible regressions. So I don't think we have much choice. > Yeah, the regression are pretty easy to demonstrate. In fact, I didn't have such detection in the first patch, but after the first round of benchmarks it became obvious it's needed. > >> Also, while I don't want to lazily dismiss valid cases that might be >> affected by this, I think that sequential access for index paths is not >> that common (with the exception of clustered indexes). > > I think sequential access is common in other cases as well. There's lots of > indexes where heap tids are almost perfectly correlated with index entries, > consider insert only insert-only tables and serial PKs or inserted_at > timestamp columns. Even leaving those aside, for indexes with many entries > for the same key, we sort by tid these days, which will also result in > "runs" of sequential access. > True. I should have thought about those cases. > >> Obviously, the latter case has much more severe impact, but it depends >> on the exact workload / access pattern etc. The only "perfect" solution >> would be to actually check the page cache, but well - that seems to be >> fairly expensive. > >> What I was envisioning was something self-tuning, based on the I/O we >> may do later. If the prefetcher decides to prefetch something, but finds >> it's already in cache, we'd increase the distance, to remember more >> blocks. Likewise, if a block is not prefetched but then requires I/O >> later, decrease the distance. That'd make it adaptive, but I don't think >> we actually have the info about I/O. > > How would the prefetcher know that hte data wasn't in cache? > I don't think there's a good way to do that, unfortunately, or at least I'm not aware of it. That's what I meant by "we don't have the info" at the end. Which is why I haven't tried implementing it. The only "solution" I could come up with was some sort of "timing" for the I/O requests and deducing what was cached. Not great, of course. > >> Alternatively, I was thinking about moving the prefetches into a >> separate worker process (or multiple workers), so we'd just queue the >> request and all the overhead would be done by the worker. The main >> problem is the overhead of calling posix_fadvise() for blocks that are >> already in memory, and this would just move it to a separate backend. I >> wonder if that might even make the custom cache unnecessary / optional. > > The AIO patchset provides this. > OK, I guess it's time for me to take a look at the patch again. > >> AFAICS this seems similar to some of the AIO patch, I wonder what that >> plans to do. I need to check. > > Yes, most of this exists there. The difference that with the AIO you don't > need to prefetch, as you can just initiate the IO for real, and wait for it to > complete. > Right, although the line where things stop being "prefetch" and becomes "async" seems a bit unclear to me / perhaps more a point of view. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: