Re: index prefetching - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: index prefetching |
Date | |
Msg-id | 80de9927-539a-448f-a299-013edaede283@vondra.me Whole thread Raw |
In response to | Re: index prefetching (Andres Freund <andres@anarazel.de>) |
Responses |
Re: index prefetching
|
List | pgsql-hackers |
On 8/13/25 23:37, Andres Freund wrote: > Hi, > > On 2025-08-13 23:07:07 +0200, Tomas Vondra wrote: >> On 8/13/25 16:44, Andres Freund wrote: >>> On 2025-08-13 14:15:37 +0200, Tomas Vondra wrote: >>>> In fact, I believe this is about io_method. I initially didn't see the >>>> difference you described, and then I realized I set io_method=sync to >>>> make it easier to track the block access. And if I change io_method to >>>> worker, I get different stats, that also change between runs. >>>> >>>> With "sync" I always get this (after a restart): >>>> >>>> Buffers: shared hit=7435 read=52801 >>>> >>>> while with "worker" I get this: >>>> >>>> Buffers: shared hit=4879 read=52801 >>>> Buffers: shared hit=5151 read=52801 >>>> Buffers: shared hit=4978 read=52801 >>>> >>>> So not only it changes run to tun, it also does not add up to 60236. >>> >>> This is reproducible on master? If so, how? >>> >>> >>>> I vaguely recall I ran into this some time ago during AIO benchmarking, >>>> and IIRC it's due to how StartReadBuffersImpl() may behave differently >>>> depending on I/O started earlier. It only calls PinBufferForBlock() in >>>> some cases, and PinBufferForBlock() is what updates the hits. >>> >>> Hm, I don't immediately see an issue there. The only case we don't call >>> PinBufferForBlock() is if we already have pinned the relevant buffer in a >>> prior call to StartReadBuffersImpl(). >>> >>> >>> If this happens only with the prefetching patch applied, is is possible that >>> what happens here is that we occasionally re-request buffers that already in >>> the process of being read in? That would only happen with a read stream and >>> io_method != sync (since with sync we won't read ahead). If we have to start >>> reading in a buffer that's already undergoing IO we wait for the IO to >>> complete and count that access as a hit: >>> >>> /* >>> * Check if we can start IO on the first to-be-read buffer. >>> * >>> * If an I/O is already in progress in another backend, we want to wait >>> * for the outcome: either done, or something went wrong and we will >>> * retry. >>> */ >>> if (!ReadBuffersCanStartIO(buffers[nblocks_done], false)) >>> { >>> ... >>> /* >>> * Report and track this as a 'hit' for this backend, even though it >>> * must have started out as a miss in PinBufferForBlock(). The other >>> * backend will track this as a 'read'. >>> */ >>> ... >>> if (persistence == RELPERSISTENCE_TEMP) >>> pgBufferUsage.local_blks_hit += 1; >>> else >>> pgBufferUsage.shared_blks_hit += 1; >>> ... >>> >>> >> >> I think it has to be this. It only happens with io_method != sync, and >> only with effective_io_concurrency > 1. At first I was wondering why I >> can't reproduce this for seqscan/bitmapscan, but then I realized those >> plans never visit the same block repeatedly - indexscans do that. It's >> also not surprising it's timing-sensitive, as it likely depends on how >> fast the worker happens to start/complete requests. >> >> What would be a good way to "prove" it really is this? > > I'd just comment out those stats increments and then check if the stats are > stable afterwards. > I tried that, but it's not enough - the buffer hits gets lower, but remains variable. It stabilizes only if I comment out the increment in PinBufferForBlock() too. At which point it gets to 0, of course ... regards -- Tomas Vondra
pgsql-hackers by date: