Re: WIP: WAL prefetch (another approach) - Mailing list pgsql-hackers
From: Dmitry Dolgov
Subject: Re: WIP: WAL prefetch (another approach)
Date:
Msg-id: 20200419114820.h62ryte7uwu46e2x@localhost
In response to: Re: WIP: WAL prefetch (another approach) (Thomas Munro <thomas.munro@gmail.com>)
Responses: Re: WIP: WAL prefetch (another approach)
List: pgsql-hackers

> On Thu, Apr 09, 2020 at 09:55:25AM +1200, Thomas Munro wrote:
> Thanks.  Here's a rebase.

Thanks for working on this patch, it seems like a great feature. I'm
probably a bit late to the party, but I still want to make a couple of
comments. The patch indeed looks good, I couldn't find any significant
issues so far, and almost all the questions I had while reading it were
actually answered in this thread. I'm still busy with benchmarking,
mostly to see how prefetching would work with different workload
distributions and how much the kernel will actually prefetch. In the
meantime I have a few questions:

> On Wed, Feb 12, 2020 at 07:52:42PM +1300, Thomas Munro wrote:
> > On Fri, Jan 3, 2020 at 7:10 AM Tomas Vondra
> > <tomas.vondra@2ndquadrant.com> wrote:
> > > Could we instead specify the number of blocks to prefetch? We'd probably
> > > need to track additional details needed to determine number of blocks to
> > > prefetch (essentially LSN for all prefetch requests).
>
> Here is a new WIP version of the patch set that does that.  Changes:
>
> 1.  It now uses effective_io_concurrency to control how many
> concurrent prefetches to allow.  It's possible that we should have a
> different GUC to control "maintenance" users of concurrency I/O as
> discussed elsewhere[1], but I'm staying out of that for now; if we
> agree to do that for VACUUM etc, we can change it easily here.  Note
> that the value is percolated through the ComputeIoConcurrency()
> function which I think we should discuss, but again that's off topic,
> I just want to use the standard infrastructure here.

This totally makes sense. I believe the answer to the question "how much
to prefetch" depends equally on the type of workload (which correlates
with how far ahead in the WAL to read) and on how many resources are
available for prefetching (which correlates with queue depth). But in
the documentation maintenance_io_concurrency looks like just an
"unimportant" option, and I'm almost sure it will be overlooked by many
readers:

    The maximum distance to look ahead in the WAL during recovery, to
    find blocks to prefetch.  Prefetching blocks that will soon be needed
    can reduce I/O wait times.  The number of concurrent prefetches is
    limited by this setting as well as
    <xref linkend="guc-maintenance-io-concurrency"/>.  Setting it too
    high might be counterproductive, if it means that data falls out of
    the kernel cache before it is needed.  If this value is specified
    without units, it is taken as bytes.  A setting of -1 disables
    prefetching during recovery.

Maybe it also makes sense to emphasize that maintenance_io_concurrency
directly affects resource consumption and is the "primary control"?

> On Wed, Mar 18, 2020 at 06:18:44PM +1300, Thomas Munro wrote:
>
> Here's a new version that changes that part just a bit more, after a
> brief chat with Andres about his async I/O plans.  It seems clear that
> returning an enum isn't very extensible, so I decided to try making
> PrefetchBufferResult a struct whose contents can be extended in the
> future.  In this patch set it's still just used to distinguish 3 cases
> (hit, miss, no file), but it's now expressed as a buffer and a flag to
> indicate whether I/O was initiated.  You could imagine that the second
> thing might be replaced by a pointer to an async I/O handle you can
> wait on or some other magical thing from the future.

I like the idea of an extensible PrefetchBufferResult.
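
To make sure I'm reading it right, this is roughly the shape I have in
mind, as a sketch only (the field names here are my guesses from the
description above, not necessarily what the patch uses):

    #include "postgres.h"
    #include "storage/buf.h"    /* for the Buffer type */

    /*
     * Sketch only, not taken from the patch: unlike an enum return
     * value, a struct result can grow new members later, e.g. a handle
     * for a real asynchronous I/O that the caller could wait on.
     */
    typedef struct PrefetchBufferResult
    {
        Buffer      recent_buffer;  /* valid if the block was already cached (hit) */
        bool        initiated_io;   /* true if a prefetch was started (miss) */
    } PrefetchBufferResult;

Compared to the earlier enum, callers that only care about "was I/O
started" can keep checking the flag, while future async I/O work can add
fields without breaking them.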

Just one comment: if I understand correctly, the way it is being used
together with prefetch_queue assumes one IO operation at a time. This
limits potential extension of the underlying code: e.g. if one were to
implement some sort of buffering of requests and submit an iovec to a
syscall, then prefetch_queue would no longer correctly represent the
in-flight IO. Also, taking into account that "we don't have any
awareness of when I/O really completes", maybe in the future it makes
sense to reconsider having the queue in the prefetcher itself, and
instead ask the underlying code for this information?

> On Wed, Apr 08, 2020 at 04:24:21AM +1200, Thomas Munro wrote:
> > Is there a way we could have a "historical" version of at least some of
> > these?  An average queue depth, or such?
>
> Ok, I added simple online averages for distance and queue depth that
> take a sample every time recovery advances by 256kB.

Maybe this was already discussed in other threads, but if I understand
correctly, this implementation gives all samples equal weight. Since at
the moment it depends directly on replay speed (so a lot of IO is
involved), couldn't a single outlier at the beginning skew this value
and make it less useful? Does it make sense to decay old values?
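
Just to illustrate the difference I mean with a toy example (this is not
the patch's code, only the contrast between weighting every sample
equally and decaying old ones):

    #include <stdio.h>

    /*
     * One large outlier at the start (queue depth 100), followed by a
     * steady depth of 2.
     */
    static double
    plain_average(double avg, double sample, unsigned long *n)
    {
        /* incremental arithmetic mean: every sample keeps equal weight */
        *n += 1;
        return avg + (sample - avg) / (double) *n;
    }

    static double
    decaying_average(double avg, double sample)
    {
        /* exponential moving average: old samples fade out over time */
        const double alpha = 0.1;
        return avg + alpha * (sample - avg);
    }

    int
    main(void)
    {
        unsigned long n = 1;
        double      plain = 100.0;      /* both seeded with the outlier sample */
        double      decayed = 100.0;

        for (int i = 0; i < 99; i++)
        {
            plain = plain_average(plain, 2.0, &n);
            decayed = decaying_average(decayed, 2.0);
        }

        /* plain stays noticeably above 2, decayed converges back to ~2 */
        printf("plain = %.2f, decayed = %.2f\n", plain, decayed);
        return 0;
    }

With something like the decaying version, an IO burst at the very
beginning of recovery would stop dominating the reported average after a
while.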