Home > mailing lists

Re: index prefetching - Mailing list pgsql-hackers

From	Andres Freund
Subject	Re: index prefetching
Date	August 14, 2025 00:37:32
Msg-id	adl4xbynhs72cuiqbebqkgmtyb6s3zdqhoazwjo63pjrdjupwy@mxcgidbduthw Whole thread
In response to	Re: index prefetching (Tomas Vondra <tomas@vondra.me>)
Responses	Re: index prefetching
List	pgsql-hackers

Tree view

Hi,

On 2025-08-13 23:07:07 +0200, Tomas Vondra wrote:
> On 8/13/25 16:44, Andres Freund wrote:
> > On 2025-08-13 14:15:37 +0200, Tomas Vondra wrote:
> >> In fact, I believe this is about io_method. I initially didn't see the
> >> difference you described, and then I realized I set io_method=sync to
> >> make it easier to track the block access. And if I change io_method to
> >> worker, I get different stats, that also change between runs.
> >>
> >> With "sync" I always get this (after a restart):
> >>
> >>    Buffers: shared hit=7435 read=52801
> >>
> >> while with "worker" I get this:
> >>
> >>    Buffers: shared hit=4879 read=52801
> >>    Buffers: shared hit=5151 read=52801
> >>    Buffers: shared hit=4978 read=52801
> >>
> >> So not only it changes run to tun, it also does not add up to 60236.
> > 
> > This is reproducible on master? If so, how?
> > 
> > 
> >> I vaguely recall I ran into this some time ago during AIO benchmarking,
> >> and IIRC it's due to how StartReadBuffersImpl() may behave differently
> >> depending on I/O started earlier. It only calls PinBufferForBlock() in
> >> some cases, and PinBufferForBlock() is what updates the hits.
> > 
> > Hm, I don't immediately see an issue there. The only case we don't call
> > PinBufferForBlock() is if we already have pinned the relevant buffer in a
> > prior call to StartReadBuffersImpl().
> > 
> > 
> > If this happens only with the prefetching patch applied, is is possible that
> > what happens here is that we occasionally re-request buffers that already in
> > the process of being read in? That would only happen with a read stream and
> > io_method != sync (since with sync we won't read ahead). If we have to start
> > reading in a buffer that's already undergoing IO we wait for the IO to
> > complete and count that access as a hit:
> > 
> >     /*
> >      * Check if we can start IO on the first to-be-read buffer.
> >      *
> >      * If an I/O is already in progress in another backend, we want to wait
> >      * for the outcome: either done, or something went wrong and we will
> >      * retry.
> >      */
> >     if (!ReadBuffersCanStartIO(buffers[nblocks_done], false))
> >     {
> > ...
> >         /*
> >          * Report and track this as a 'hit' for this backend, even though it
> >          * must have started out as a miss in PinBufferForBlock(). The other
> >          * backend will track this as a 'read'.
> >          */
> > ...
> >         if (persistence == RELPERSISTENCE_TEMP)
> >             pgBufferUsage.local_blks_hit += 1;
> >         else
> >             pgBufferUsage.shared_blks_hit += 1;
> > ...
> > 
> > 
> 
> I think it has to be this. It only happens with io_method != sync, and
> only with effective_io_concurrency > 1. At first I was wondering why I
> can't reproduce this for seqscan/bitmapscan, but then I realized those
> plans never visit the same block repeatedly - indexscans do that. It's
> also not surprising it's timing-sensitive, as it likely depends on how
> fast the worker happens to start/complete requests.
> 
> What would be a good way to "prove" it really is this?

I'd just comment out those stats increments and then check if the stats are
stable afterwards.

Greetings,

Andres Freund

pgsql-hackers by date:

From: Peter Geoghegan
Date: 14 August 2025, 00:36:24
Subject: Re: index prefetching

From: Peter Geoghegan
Date: 14 August 2025, 00:57:00
Subject: Re: index prefetching

Re: index prefetching - Mailing list pgsql-hackers

Previous

Next