Re: index prefetching - Mailing list pgsql-hackers
From | Konstantin Knizhnik |
---|---|
Subject | Re: index prefetching |
Date | |
Msg-id | 2a4024ec-d12c-449c-97cd-a64e77dc7067@garret.ru Whole thread Raw |
In response to | Re: index prefetching (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Responses |
Re: index prefetching
Re: index prefetching |
List | pgsql-hackers |
On 16/01/2024 6:25 pm, Tomas Vondra wrote:
On 1/16/24 09:13, Konstantin Knizhnik wrote:Hi, On 12/01/2024 6:42 pm, Tomas Vondra wrote:Hi, Here's an improved version of this patch, finishing a lot of the stuff that I alluded to earlier - moving the code from indexam.c, renaming a bunch of stuff, etc. I've also squashed it into a single patch, to make it easier to review.I am thinking about testing you patch with Neon (cloud Postgres). As far as Neon seaprates compute and storage, prefetch is much more critical for Neon architecture than for vanilla Postgres. I have few complaints: 1. It disables prefetch for sequential access pattern (i.e. INDEX MERGE), motivating it that in this case OS read-ahead will be more efficient than prefetch. It may be true for normal storage devices, bit not for Neon storage and may be also for Postgres on top of DFS (i.e. Amazon RDS). I wonder if we can delegate decision whether to perform prefetch in this case or not to some other level. I do not know precisely where is should be handled. The best candidate IMHO is storager manager. But it most likely requires extension of SMGR API. Not sure if you want to do it... Straightforward solution is to move this logic to some callback, which can be overwritten by user.Interesting point. You're right these decisions (whether to prefetch particular patterns) are closely tied to the capabilities of the storage system. So it might make sense to maybe define it at that level. Not sure what exactly RDS does with the storage - my understanding is that it's mostly regular Postgres code, but managed by Amazon. So how would that modify the prefetching logic?
Amazon RDS is just vanilla Postgres with file system mounted on EBS (Amazon distributed file system).
EBS provides good throughput but larger latencies comparing with local SSDs.
I am not sure if read-ahead works for EBS.
4. I think that performing prefetch at executor level is really greatidea and so prefetch can be used by all indexes, including custom indexes. But prefetch will be efficient only if index can provide fast access to next TID (located at the same page). I am not sure that it is true for all builtin indexes (GIN, GIST, BRIN,...) and especially for custom AM. I wonder if we should extend AM API to make index make a decision weather to perform prefetch of TIDs or not.I'm not against having a flag to enable/disable prefetching, but the question is whether doing prefetching for such indexes can be harmful. I'm not sure about that.
I tend to agree with you - it is hard to imagine index implementation which doesn't win from prefetching heap pages.
May be only the filtering case you have mentioned. But it seems to me that current B-Tree index scan (not IOS) implementation in Postgres
doesn't try to use index tuple to check extra condition - it will fetch heap tuple in any case.
5. Minor notice: there are few places where index_getnext_slot is called with last NULL parameter (disabled prefetch) with the following comment "XXX Would be nice to also benefit from prefetching here." But all this places corresponds to "point loopkup", i.e. unique constraint check, find replication tuple by index... Prefetch seems to be unlikely useful here, unlkess there is index bloating and and we have to skip a lot of tuples before locating right one. But should we try to optimize case of bloated indexes?Are you sure you're looking at the last patch version? Because the current patch does not have any new parameters in index_getnext_* and the comments were removed too (I suppose you're talking about execIndexing, execReplication and those places).
Sorry, I looked at v20240103-0001-prefetch-2023-12-09.patch , I didn't noticed v20240112-0001-Prefetch-heap-pages-during-index-scans.patch
regards
pgsql-hackers by date: