Re: PoC: prefetching index leaf pages (for inserts) - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: PoC: prefetching index leaf pages (for inserts)
Msg-id: 1ee3d0e5-da72-3eb0-a024-c3e862625b5e@enterprisedb.com
In response to: Re: PoC: prefetching index leaf pages (for inserts) (Heikki Linnakangas <hlinnaka@iki.fi>)
List: pgsql-hackers
On 11/23/23 14:26, Heikki Linnakangas wrote:
> On 06/11/2023 19:05, Tomas Vondra wrote:
>> As for the path forward, I think the prefetching is demonstrably
>> beneficial. There are cases where it can't help or even harms
>> performance. I think the success depends on three areas:
>>
>> (a) reducing the costs of the prefetching - For example right now we
>> build the index tuples twice (once for prefetch, once for the insert),
>> but maybe there's a way to do that only once? There are also predicate
>> indexes, and so on.
>>
>> (b) being smarter about when to prefetch - For example if we only have
>> one "prefetchable" index, it's somewhat pointless to prefetch (for
>> single-row cases). And so on.
>>
>> (c) not prefetching when already cached - This is somewhat related to
>> the previous case, but perhaps it'd be cheaper to first check if the
>> data is already cached. For shared buffers it should not be difficult,
>> for page cache we could use preadv2 with RWF_NOWAIT flag. The question
>> is if this is cheap enough to be cheaper than just doing posix_fadvise
>> (which however only deals with shared buffers).
>
> I don't like this approach. It duplicates the tree-descend code, and it
> also duplicates the work of descending the tree at runtime. And it only
> addresses index insertion; there are a lot of places that could benefit
> from prefetching or async execution like this.

Yeah, I think that's a fair assessment, although I think the amount of
duplicate code is pretty small (and perhaps it could be refactored to a
common function, which I chose not to do in the PoC patch).

> I think we should think of this as async execution rather than
> prefetching. We don't have the general infrastructure for writing async
> code, but if we did, this would be much simpler. In an async programming
> model, like you have in many other languages like Rust, python or
> javascript, there would be no separate prefetching function.
> Instead, aminsert() would return a future that can pause execution if
> it needs to do I/O. Something like this:
>
> aminsert_futures = NIL;
> /* create a future for each index insert */
> for (<all indexes>)
> {
>     aminsert_futures = lappend(aminsert_futures, aminsert(...));
> }
> /* wait for all the futures to finish */
> await aminsert_futures;
>
> The async-aware aminsert function would run to completion quickly if
> all the pages are already in cache. If you get a cache miss, it would
> start an async I/O read for the page, and yield to the other insertions
> until the I/O completes.
>
> We already support async execution of FDWs now, with the
> ForeignAsyncRequest() and ForeignAsyncConfigureWait() callbacks. Can we
> generalize that?

Interesting idea. I kinda like it in principle, however I'm not very
familiar with how our async execution works (and perhaps even with async
implementations in general), so I can't quite say how difficult it would
be to do something like that in an AM (instead of an executor).

Where exactly would the boundary be between who "creates" and who
"executes" the requests, and what would be the flow of execution? For
the FDW it seems fairly straightforward, because the boundary is
local/remote, and the async request is executed in a separate process.
But here everything would happen locally, so how would that work?

Imagine we're inserting a tuple into two indexes. There's a bunch of
index pages we may need to read for each index - first some internal
pages, then some leafs. Presumably we'd want to make each page read
asynchronous, and allow transfer of control to the other index.

IIUC the async-aware aminsert() would execute an async request for the
first page it needs to read, with a callback that essentially does the
next step of the index descent - reads the page, determines the next
page to read, and then does another async request. Then it'd sleep,
which would allow transfer of control to the aminsert() on the other
index.
And then we'd do a similar thing for the leaf pages. Or do I imagine
things wrong?

The thing I like about this async approach is that it would allow
prefetching all index pages, while my PoC patch simply assumes all
internal pages are in cache and prefetches only the first leaf page.
That's much simpler in terms of control flow, but has clear limits.

I however wonder if there are concurrency issues. Imagine there's a
COPY, i.e. we're inserting a batch of tuples. Can you run the aminsert()
for all the tuples concurrently? Won't that have issues with the
different "async threads" modifying the index under the other threads?

If those concurrent "insert threads" were an issue, maybe we could make
amprefetch() async-aware ...

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company