Re: AIO / read stream heuristics adjustments for index prefetching - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: AIO / read stream heuristics adjustments for index prefetching
Date
Msg-id CAAKRu_YvSjYvKtsTsRqm0=Jy0QhpUTsccRd66BsSKrdN-7ZA1g@mail.gmail.com
Whole thread
In response to AIO / read stream heuristics adjustments for index prefetching  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Tue, Mar 31, 2026 at 12:02 PM Andres Freund <andres@anarazel.de> wrote:
>
> 0001+0002:  Return whether WaitReadBuffers() needed to wait
>
>     The first patch allows pgaio_wref_check_done() to work more reliably with
>     io_uring. Until now it only was able to return true if userspace already
>     had consumed the kernel's completion event, but returned false otherwise.
>     That's not really incorrect, just suboptimal.
>
>     The second patch returns whether WaitReadBuffers() needed to wait for
>     IO. This is useful for a) instrumentation like in [2] and b) to provide
>     information to the read_stream heuristics to control how aggressive to
>     perform read ahead.

These both look good to me except in 0001 you left an XXX in
pgaio_wref_check_done() that I think is the very thing that commit
does.

> 0003:  read_stream: Issue IO synchronously while in fast path

LGTM

> 0004:  read_stream: Prevent distance from decaying too quickly
>
>     There are two minor questions here:
>     - Should read_stream_pause()/read_stream_resume() restore the "holdoff"
>       counter?  I doubt it matters for the prospective user, since it will
>       only be used when the lookahead distance is very large.

I don't really understand this. We have to do this with distance
because we set it to 0 and use distance == 0 to indicate stream end.
read_stream_pause() doesn't set the distance_decay_holoff to 0. If you
mean, should we reset holdoff to its initial value, then I don't think
so. I imagine that users doing a lot of pause and resume may not have
high distance.

>     - For how long to hold off distance reductions?  Initially I was torn
>       between using "max_pinned_buffers" (Min(max_ios * io_combine_limit,
>       cap)) and "max_ios" ([maintenance_]effective_io_concurrency). But I
>       think the former makes more sense, as we otherwise won't allow for far
>       enough readahead when doing IO combining, and it does seem to make sense
>       to hold off decay for long enough that the maximum lookahead could not
>       theoretically allow us to start an IO.

I agree. 0004 LGTM otherwise.

- Melanie



pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: Initial COPY of Logical Replication is too slow
Next
From: Masahiko Sawada
Date:
Subject: Re: Initial COPY of Logical Replication is too slow