Home > mailing lists

Re: index prefetching - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: index prefetching
Date	December 21, 2023 18:20:45
Msg-id	afd17f3c-d496-5583-e93a-5a162d37235f@enterprisedb.com Whole thread Raw
In response to	Re: index prefetching (Andres Freund <andres@anarazel.de>)
Responses	Re: index prefetching
List	pgsql-hackers

Tree view


On 12/21/23 14:43, Andres Freund wrote:
> Hi,
> 
> On 2023-12-21 13:30:42 +0100, Tomas Vondra wrote:
>> You're right a lot of this is a guesswork. I don't think we can do much
>> better, because it depends on stuff that's out of our control - each OS
>> may do things differently, or perhaps it's just configured differently.
>>
>> But I don't think this is really a serious issue - all the read-ahead
>> implementations need to work about the same, because they are meant to
>> work in a transparent way.
>>
>> So it's about deciding at which point we think this is a sequential
>> pattern. Yes, the OS may use a slightly different threshold, but the
>> exact value does not really matter - in the worst case we prefetch a
>> couple more/fewer blocks.
>>
>> The OS read-ahead can't really prefetch anything except sequential
>> cases, so the whole question is "When does the access pattern get
>> sequential enough?". I don't think there's a perfect answer, and I don't
>> think we need a perfect one - we just need to be reasonably close.
> 
> For the streaming read interface (initially backed by fadvise, to then be
> replaced by AIO) we found that it's clearly necessary to avoid fadvises in
> cases of actual sequential IO - the overhead otherwise leads to easily
> reproducible regressions.  So I don't think we have much choice.
> 

Yeah, the regression are pretty easy to demonstrate. In fact, I didn't
have such detection in the first patch, but after the first round of
benchmarks it became obvious it's needed.

> 
>> Also, while I don't want to lazily dismiss valid cases that might be
>> affected by this, I think that sequential access for index paths is not
>> that common (with the exception of clustered indexes).
> 
> I think sequential access is common in other cases as well. There's lots of
> indexes where heap tids are almost perfectly correlated with index entries,
> consider insert only insert-only tables and serial PKs or inserted_at
> timestamp columns.  Even leaving those aside, for indexes with many entries
> for the same key, we sort by tid these days, which will also result in
> "runs" of sequential access.
> 

True. I should have thought about those cases.

> 
>> Obviously, the latter case has much more severe impact, but it depends
>> on the exact workload / access pattern etc. The only "perfect" solution
>> would be to actually check the page cache, but well - that seems to be
>> fairly expensive.
> 
>> What I was envisioning was something self-tuning, based on the I/O we
>> may do later. If the prefetcher decides to prefetch something, but finds
>> it's already in cache, we'd increase the distance, to remember more
>> blocks. Likewise, if a block is not prefetched but then requires I/O
>> later, decrease the distance. That'd make it adaptive, but I don't think
>> we actually have the info about I/O.
> 
> How would the prefetcher know that hte data wasn't in cache?
> 

I don't think there's a good way to do that, unfortunately, or at least
I'm not aware of it. That's what I meant by "we don't have the info" at
the end. Which is why I haven't tried implementing it.

The only "solution" I could come up with was some sort of "timing" for
the I/O requests and deducing what was cached. Not great, of course.

> 
>> Alternatively, I was thinking about moving the prefetches into a
>> separate worker process (or multiple workers), so we'd just queue the
>> request and all the overhead would be done by the worker. The main
>> problem is the overhead of calling posix_fadvise() for blocks that are
>> already in memory, and this would just move it to a separate backend. I
>> wonder if that might even make the custom cache unnecessary / optional.
> 
> The AIO patchset provides this.
> 

OK, I guess it's time for me to take a look at the patch again.

> 
>> AFAICS this seems similar to some of the AIO patch, I wonder what that
>> plans to do. I need to check.
> 
> Yes, most of this exists there.  The difference that with the AIO you don't
> need to prefetch, as you can just initiate the IO for real, and wait for it to
> complete.
> 

Right, although the line where things stop being "prefetch" and becomes
"async" seems a bit unclear to me / perhaps more a point of view.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

pgsql-hackers by date:

From: Alexander Lakhin
Date: 21 December 2023, 18:00:00
Subject: Re: trying again to get incremental backup

From: Frédéric Yhuel
Date: 21 December 2023, 18:21:26
Subject: Re: Set log_lock_waits=on by default

Re: index prefetching - Mailing list pgsql-hackers

Previous

Next