Re: AIO / read stream heuristics adjustments for index prefetching - Mailing list pgsql-hackers
| From | Nazir Bilal Yavuz |
|---|---|
| Subject | Re: AIO / read stream heuristics adjustments for index prefetching |
| Date | |
| Msg-id | CAN55FZ3GT+o545U9JLR0AC5JtBznqJP4Mf9Mi4k3axqqAXTxPg@mail.gmail.com |
| In response to | Re: AIO / read stream heuristics adjustments for index prefetching (Andres Freund <andres@anarazel.de>) |
| Responses | Re: AIO / read stream heuristics adjustments for index prefetching |
| List | pgsql-hackers |
Hi,
On Thu, 2 Apr 2026 at 02:20, Andres Freund <andres@anarazel.de> wrote:
>
> I've pushed what was 0001 and 0002. Will push the former 0003 shortly.
0001 LGTM.
0002: read_stream: Prevent distance from decaying too quickly
+ /*
+ * As we needed IO, prevent distance from being reduced within our
+ * maximum look-ahead window. This avoids having distance collapse too
+ * quickly in workloads where most of the required blocks are cached,
+ * but where the remaining IOs are a sufficient enough factor to cause
+ * a substantial slowdown if executed synchronously.
+ *
+ * There are valid arguments for preventing decay for max_ios or for
+ * max_pinned_buffers. But the argument for max_pinned_buffers seems
+ * clearer - if we can't see any misses within the maximum look-ahead
+ * distance, we can't do any useful read-ahead.
+ */
+ stream->distance_decay_holdoff = stream->max_pinned_buffers;
That is already committed, but I have a question: did you think about
setting stream->distance_decay_holdoff to the current stream->distance
instead? That would also fix the 'miss followed by a hit' case (it
won't fix a double miss followed by a hit, though).
0003: aio: io_uring: Trigger async processing for large IOs
There is a small optimization opportunity: we don't need to calculate
io_size for DIO, since pgaio_uring_should_use_async() will always
return false. Do you think it is worth implementing? Other than that,
LGTM.
0004: read_stream: Only increase distance when waiting for IO
LGTM.
0005: WIP: read_stream: Move logic about IO combining & issuing to helpers
/* never pin more buffers than allowed */
if (stream->pinned_buffers + stream->pending_read_nblocks >=
stream->max_pinned_buffers)
return false;
/*
* Don't start more read-ahead if that'd put us over the distance limit
* for doing read-ahead.
*/
if (stream->pinned_buffers + stream->pending_read_nblocks >=
stream->distance)
return false;
AFAIK, stream->distance can't be higher than
stream->max_pinned_buffers [1], so I think we can remove the first if
block. If we are not sure about [1], stream->max_pinned_buffers will
most likely still be higher than stream->distance, so it would make
sense to swap the order of these checks.
Aha, I understood this change after looking at 0006. It is a nitpick,
but the 'if (stream->pinned_buffers + stream->pending_read_nblocks >=
stream->max_pinned_buffers)' block could be added in 0006 instead.
Other than these, LGTM.
0006: WIP: read stream: Split decision about look ahead for AIO and combining
I liked the idea of being more aggressive about IO combining. What is
the reason for gradually increasing combine_distance? Is it to avoid
doing unnecessary IOs at the start?
+ /*
+ * XXX: Should we actually reduce this at any time other than
+ * a reset? For now we have to, as this is also a condition
+ * for re-enabling fast_path.
+ */
+ if (stream->combine_distance > 1)
+ stream->combine_distance--;
I don't think we need to reduce this other than reset.
+ /*
+ * Allow looking further ahead if we are in the process of building a
+ * larger IO, the IO is not yet big enough, and we don't yet have IO in
+ * flight. Note that this is allowed even if we are reaching the
+ * read-ahead limit (but not the buffer pin limit).
+ *
+ * This is important for cases where either effective_io_concurrency is
+ * low or we never need to wait for IO and thus are not increasing the
+ * distance. Without this we would end up with lots of small IOs.
+ */
+ if (stream->pending_read_nblocks > 0 &&
+ stream->pinned_buffers == 0 &&
+ stream->pending_read_nblocks < stream->combine_distance)
+ return true;
Do we need to check 'stream->pinned_buffers == 0' here? I think we can
continue to look ahead even though there are pinned buffers, as we
already know 'stream->pinned_buffers + stream->pending_read_nblocks <
stream->max_pinned_buffers'.
+ /* same if capped not by io_combine_limit but combine_distance */
+ if (stream->combine_distance > 0 &&
+ pending_read_nblocks >= stream->combine_distance)
+ return true;
I think we don't need to check 'stream->combine_distance > 0'; it
shouldn't be less than or equal to zero when
'stream->readahead_distance' is non-zero, right?
+ {
+ stream->readahead_distance = Min(max_pinned_buffers, stream->io_combine_limit);
+ stream->combine_distance = stream->io_combine_limit;
+ }
I think stream->combine_distance should be equal to
stream->readahead_distance here, as we don't want to surpass
max_pinned_buffers.
0007: Hacky implementation of making read_stream_reset()/end() not wait for IO
Read stream parts LGTM. I don't have a comment on the FIXME part, though.
--
Regards,
Nazir Bilal Yavuz
Microsoft