
From Dmitry Dolgov
Subject Re: WIP: WAL prefetch (another approach)
Date
Msg-id 20200419114820.h62ryte7uwu46e2x@localhost
In response to Re: WIP: WAL prefetch (another approach)  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: WIP: WAL prefetch (another approach)
List pgsql-hackers
> On Thu, Apr 09, 2020 at 09:55:25AM +1200, Thomas Munro wrote:
> Thanks.  Here's a rebase.

Thanks for working on this patch, it seems like a great feature. I'm
probably a bit late to the party, but I still want to make a couple of
comments.

The patch indeed looks good; I couldn't find any significant issues so
far, and almost all the questions I had while reading it were already
answered in this thread. I'm still busy with benchmarking, mostly to see
how prefetching behaves with different workload distributions and how
much the kernel will actually prefetch.

In the meantime I have a few questions:

> On Wed, Feb 12, 2020 at 07:52:42PM +1300, Thomas Munro wrote:
> > On Fri, Jan 3, 2020 at 7:10 AM Tomas Vondra
> > <tomas.vondra@2ndquadrant.com> wrote:
> > > Could we instead specify the number of blocks to prefetch? We'd probably
> > > need to track additional details needed to determine number of blocks to
> > > prefetch (essentially LSN for all prefetch requests).
>
> Here is a new WIP version of the patch set that does that.  Changes:
>
> 1.  It now uses effective_io_concurrency to control how many
> concurrent prefetches to allow.  It's possible that we should have a
> different GUC to control "maintenance" users of concurrency I/O as
> discussed elsewhere[1], but I'm staying out of that for now; if we
> agree to do that for VACUUM etc, we can change it easily here.  Note
> that the value is percolated through the ComputeIoConcurrency()
> function which I think we should discuss, but again that's off topic,
> I just want to use the standard infrastructure here.

This totally makes sense. I believe the question of "how much to prefetch"
eventually depends equally on the type of workload (which correlates with
how far ahead in the WAL to read) and on how many resources are available
for prefetching (which correlates with queue depth). But in the
documentation maintenance-io-concurrency looks like just an "unimportant"
option, and I'm almost sure it will be overlooked by many readers:

    The maximum distance to look ahead in the WAL during recovery, to find
    blocks to prefetch.  Prefetching blocks that will soon be needed can
    reduce I/O wait times.  The number of concurrent prefetches is limited
    by this setting as well as
    <xref linkend="guc-maintenance-io-concurrency"/>.  Setting it too high
    might be counterproductive, if it means that data falls out of the
    kernel cache before it is needed.  If this value is specified without
    units, it is taken as bytes.  A setting of -1 disables prefetching
    during recovery.

Maybe it also makes sense to emphasize that maintenance-io-concurrency
directly affects resource consumption and is the "primary control"?
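
To make sure I read the intended interaction between the two settings
correctly, here is a simplified standalone sketch of how I picture it;
the variable names (max_prefetch_distance in particular) are illustrative
stand-ins, not the patch's actual symbols:

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint64_t XLogRecPtr;    /* simplified stand-in for the real type */

    /* Illustrative stand-ins for the GUCs, not the patch's variables. */
    static int      maintenance_io_concurrency = 10;
    static uint64_t max_prefetch_distance = 256 * 1024;

    /*
     * The distance setting bounds how far ahead of the replay position we
     * decode WAL looking for blocks, while maintenance_io_concurrency bounds
     * how many prefetches may be in flight at the same time.
     */
    static bool
    can_issue_prefetch(XLogRecPtr replay_lsn, XLogRecPtr prefetch_lsn,
                       int inflight_ios)
    {
        if (prefetch_lsn - replay_lsn > max_prefetch_distance)
            return false;       /* too far ahead of replay */
        if (inflight_ios >= maintenance_io_concurrency)
            return false;       /* prefetch "queue" is full */
        return true;
    }

In that picture the distance setting is really about how much WAL we are
willing to decode ahead of time, while maintenance-io-concurrency is the
one that caps actual I/O pressure, which is why I'd call it the primary
control.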

> On Wed, Mar 18, 2020 at 06:18:44PM +1300, Thomas Munro wrote:
>
> Here's a new version that changes that part just a bit more, after a
> brief chat with Andres about his async I/O plans.  It seems clear that
> returning an enum isn't very extensible, so I decided to try making
> PrefetchBufferResult a struct whose contents can be extended in the
> future.  In this patch set it's still just used to distinguish 3 cases
> (hit, miss, no file), but it's now expressed as a buffer and a flag to
> indicate whether I/O was initiated.  You could imagine that the second
> thing might be replaced by a pointer to an async I/O handle you can
> wait on or some other magical thing from the future.

I like the idea of an extensible PrefetchBufferResult. Just one comment:
if I understand correctly, the way it is used together with
prefetch_queue assumes one IO operation at a time. This limits potential
extension of the underlying code, e.g. one can't implement some sort of
buffering of requests and submit an iovec to a syscall, because then
prefetch_queue would no longer correctly represent in-flight IO. Also,
taking into account that "we don't have any awareness of when I/O really
completes", maybe in the future it makes sense to reconsider having the
queue in the prefetcher itself and instead ask the underlying code for
this information?
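
To make the batching concern more concrete, here is a purely
hypothetical sketch of what an interface submitting several requests at
once might look like; none of these names come from the patch:

    #include <stdint.h>

    /* Simplified stand-ins for the real relation/block identifiers. */
    typedef uint32_t BlockNumber;
    typedef struct { uint32_t spcNode, dbNode, relNode; } RelFileNode;

    /* Hypothetical batched request, e.g. to be turned into an iovec. */
    typedef struct PrefetchRequest
    {
        RelFileNode rnode;
        BlockNumber blocknum;
    } PrefetchRequest;

    /*
     * A batched result has to report progress for the whole group, so the
     * prefetcher's queue no longer maps one entry to one syscall, which is
     * exactly the bookkeeping problem mentioned above.
     */
    typedef struct PrefetchBatchResult
    {
        int initiated_ios;  /* requests that actually started I/O */
        int cached_hits;    /* requests already satisfied from cache */
    } PrefetchBatchResult;

With something like this, the number of in-flight I/Os is only known to
the layer that did the submission, hence the suggestion to ask the
underlying code rather than counting in the prefetcher.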

> On Wed, Apr 08, 2020 at 04:24:21AM +1200, Thomas Munro wrote:
> > Is there a way we could have a "historical" version of at least some of
> > these? An average queue depth, or such?
>
> Ok, I added simple online averages for distance and queue depth that
> take a sample every time recovery advances by 256kB.

Maybe it was discussed in the past in other threads, but if I understand
correctly this implementation weights all samples equally. Since at the
moment it depends directly on replay speed (so a lot of IO is involved),
couldn't a single outlier at the beginning skew this value and make it
less useful? Does it make sense to decay old values?
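
Just to illustrate what I mean by decaying, here is a minimal standalone
example of an exponential moving average; the alpha value is arbitrary
and only meant to show that an early outlier stops dominating after
enough samples:

    #include <stdio.h>

    /* Recent samples get weight alpha, the old average keeps the rest. */
    static double
    ewma_update(double current, double sample, double alpha)
    {
        return alpha * sample + (1.0 - alpha) * current;
    }

    int
    main(void)
    {
        double avg = 1000.0;    /* pretend the first sample was an outlier */
        int    i;

        for (i = 0; i < 50; i++)
            avg = ewma_update(avg, 10.0, 0.1);  /* steady-state samples */

        /* By now the initial outlier has mostly decayed away. */
        printf("after 50 samples: %.1f\n", avg);
        return 0;
    }

With an all-samples average the same outlier would still be visible in
the reported value long after recovery has settled down.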


