Re: index prefetching - Mailing list pgsql-hackers

From	Peter Geoghegan
Subject	Re: index prefetching
Date	August 15 00:06:07
Msg-id	CAH2-WzkWNtCRTcUajGYrCkp9-+btteAthg21BzxbKV09AJuSrA@mail.gmail.com
In response to	Re: index prefetching (Andres Freund <andres@anarazel.de>)
Responses	Re: index prefetching
List	pgsql-hackers

On Thu, Aug 14, 2025 at 4:44 PM Andres Freund <andres@anarazel.de> wrote:
> Interesting. In the sequential case I see some waits that are not attributed
> in explain, due to the waits happening within WaitIO(), not WaitReadBuffers().
> Which indicates that the read stream is trying to re-read a buffer that
> previously started being read.

I *knew* that something had to be up here. Thanks for your help with debugging!

>    read_stream_start_pending_read()
> -> StartReadBuffers()
> -> AsyncReadBuffers()
> -> ReadBuffersCanStartIO()
> -> StartBufferIO()
> -> WaitIO()
>
> There are far fewer cases of this in the random case.

Index tuples with TIDs that are slightly out of order are very normal.
Even for *perfectly* sequential inserts, the FSM tends to use the last
piece of free space on a heap page some time after the heap page
initially becomes "almost full". I recently described this to Tomas on
this thread [1].

> From what I can tell the sequential case so often will re-read a buffer that
> it is already in the process of reading - and thus wait for that IO before
> continuing - that we don't actually keep enough IO in flight.

Oops.

There is an existing stop-gap mechanism in the patch that is supposed
to deal with this problem. index_scan_stream_read_next, which is the
read stream callback, has logic that is supposed to suppress duplicate
block requests. But that's obviously not totally effective, since it
only remembers the very last heap block request.

If this same mechanism remembered (say) the last 2 heap blocks it
requested, that might be enough to totally fix this particular
problem. This isn't a serious proposal, but it'll be simple enough to
implement. Hopefully when I do that (which I plan to do soon) it'll fully
validate your theory.
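
To make the idea concrete, here is a minimal standalone sketch of a
duplicate filter that remembers the last 2 requested heap blocks. It is
not code from the patch: RecentBlocks, RECENT_BLOCKS, and both function
names are hypothetical, and BlockNumber/InvalidBlockNumber are redefined
locally as stand-ins so that the example compiles on its own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the PostgreSQL definitions, so the sketch compiles alone. */
typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

/* Hypothetical: how many recent heap block requests to remember. */
#define RECENT_BLOCKS 2

typedef struct RecentBlocks
{
	BlockNumber blocks[RECENT_BLOCKS];
	int			next;			/* slot to overwrite next */
} RecentBlocks;

static void
recent_blocks_init(RecentBlocks *rb)
{
	for (int i = 0; i < RECENT_BLOCKS; i++)
		rb->blocks[i] = InvalidBlockNumber;
	rb->next = 0;
}

/*
 * Returns true if blkno is among the remembered requests, in which case a
 * caller would skip it.  Otherwise remembers blkno, evicting the oldest
 * entry, and returns false.  Assumes blkno is a valid block number.
 */
static bool
recent_blocks_is_duplicate(RecentBlocks *rb, BlockNumber blkno)
{
	for (int i = 0; i < RECENT_BLOCKS; i++)
	{
		if (rb->blocks[i] == blkno)
			return true;
	}

	rb->blocks[rb->next] = blkno;
	rb->next = (rb->next + 1) % RECENT_BLOCKS;
	return false;
}
```

With a request sequence of blocks 10, 11, 10, a filter that remembers
only the very last block passes block 10 through twice, while this
sketch suppresses the second request. Whether a window of 2 is enough
in practice is exactly what still needs to be tested.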

> We can optimize that by deferring the StartBufferIO() if we're encountering a
> buffer that is undergoing IO, at the cost of some complexity.  I'm not sure
> real-world queries will often encounter the pattern of the same block being
> read in by a read stream multiple times in close proximity sufficiently often
> to make that worth it.

We definitely need to be prepared for duplicate prefetch requests in
the context of index scans. I'm far from sure how sophisticated that
actually needs to be. Obviously the design choices in this area are
far from settled right now.

[1] DC1G2PKUO9CI.3MK1L3YBZ2V3T@bowt.ie
--
Peter Geoghegan
