Re: index prefetching - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: index prefetching |
Date | |
Msg-id | CAH2-WzkWNtCRTcUajGYrCkp9-+btteAthg21BzxbKV09AJuSrA@mail.gmail.com |
In response to | Re: index prefetching (Andres Freund <andres@anarazel.de>) |
Responses | Re: index prefetching |
List | pgsql-hackers |
On Thu, Aug 14, 2025 at 4:44 PM Andres Freund <andres@anarazel.de> wrote:
> Interesting. In the sequential case I see some waits that are not attributed
> in explain, due to the waits happening within WaitIO(), not WaitReadBuffers().
> Which indicates that the read stream is trying to re-read a buffer that
> previously started being read.

I *knew* that something had to be up here. Thanks for your help with
debugging!

> read_stream_start_pending_read()
> -> StartReadBuffers()
> -> AsyncReadBuffers()
> -> ReadBuffersCanStartIO()
> -> StartBufferIO()
> -> WaitIO()
>
> There are far fewer cases of this in the random case.

Index tuples with TIDs that are slightly out of order are very normal.
Even for *perfectly* sequential inserts, the FSM tends to use the last
piece of free space on a heap page some time after the heap page
initially becomes "almost full". I recently described this to Tomas on
this thread [1].

> From what I can tell the sequential case so often will re-read a buffer that
> it is already in the process of reading - and thus wait for that IO before
> continuing - that we don't actually keep enough IO in flight.

Oops.

There is an existing stop-gap mechanism in the patch that is supposed
to deal with this problem. index_scan_stream_read_next, which is the
read stream callback, has logic that is supposed to suppress duplicate
block requests. But that's obviously not totally effective, since it
only remembers the very last heap block request.

If this same mechanism remembered (say) the last 2 heap blocks it
requested, that might be enough to totally fix this particular
problem. This isn't a serious proposal, but it'll be simple enough to
implement. Hopefully when I do that (which I plan to soon) it'll fully
validate your theory.

> We can optimize that by deferring the StartBufferIO() if we're encountering a
> buffer that is undergoing IO, at the cost of some complexity. I'm not sure
> real-world queries will often encounter the pattern of the same block being
> read in by a read stream multiple times in close proximity sufficiently often
> to make that worth it.

We definitely need to be prepared for duplicate prefetch requests in
the context of index scans. I'm far from sure how sophisticated that
actually needs to be. Obviously the design choices in this area are
far from settled right now.

[1] DC1G2PKUO9CI.3MK1L3YBZ2V3T@bowt.ie
--
Peter Geoghegan