Re: AIO / read stream heuristics adjustments for index prefetching - Mailing list pgsql-hackers

From Nazir Bilal Yavuz
Subject Re: AIO / read stream heuristics adjustments for index prefetching
Date
Msg-id CAN55FZ3GT+o545U9JLR0AC5JtBznqJP4Mf9Mi4k3axqqAXTxPg@mail.gmail.com
In response to Re: AIO / read stream heuristics adjustments for index prefetching  (Andres Freund <andres@anarazel.de>)
Responses Re: AIO / read stream heuristics adjustments for index prefetching
List pgsql-hackers
Hi,

On Thu, 2 Apr 2026 at 02:20, Andres Freund <andres@anarazel.de> wrote:
>
> I've pushed what was 0001 and 0002.  Will push the former 0003 shortly.

0001 LGTM.


0002:  read_stream: Prevent distance from decaying too quickly

+        /*
+         * As we needed IO, prevent distance from being reduced within our
+         * maximum look-ahead window. This avoids having distance collapse too
+         * quickly in workloads where most of the required blocks are cached,
+         * but where the remaining IOs are a sufficient enough factor to cause
+         * a substantial slowdown if executed synchronously.
+         *
+         * There are valid arguments for preventing decay for max_ios or for
+         * max_pinned_buffers.  But the argument for max_pinned_buffers seems
+         * clearer - if we can't see any misses within the maximum look-ahead
+         * distance, we can't do any useful read-ahead.
+         */
+        stream->distance_decay_holdoff = stream->max_pinned_buffers;

That is already committed, but I have a question: did you consider
setting stream->distance_decay_holdoff to the current stream->distance?
That would also handle a 'miss followed by a hit' (though not a double
miss followed by a hit).
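To make the comparison concrete, here is a minimal sketch of the two holdoff choices. The struct and function names are hypothetical stand-ins (only the fields discussed above are modeled); the actual ReadStream code in PostgreSQL is more involved.

```c
#include <assert.h>

/* Hypothetical, minimal stand-in for the read stream state; only the
 * fields relevant to the decay holdoff are modeled here. */
typedef struct MockStream
{
    int distance;               /* current look-ahead distance */
    int max_pinned_buffers;     /* maximum look-ahead window */
    int distance_decay_holdoff; /* blocks before distance may decay */
} MockStream;

/* Committed behavior: hold off decay for the whole look-ahead window. */
static void hold_off_decay_committed(MockStream *stream)
{
    stream->distance_decay_holdoff = stream->max_pinned_buffers;
}

/* Suggested alternative: hold off decay only for the current distance,
 * which still covers a single miss followed by hits within it. */
static void hold_off_decay_suggested(MockStream *stream)
{
    stream->distance_decay_holdoff = stream->distance;
}
```

With a small current distance the suggested variant allows decay to resume much sooner, which is the trade-off being asked about.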


0003:  aio: io_uring: Trigger async processing for large IOs

There is a small optimization opportunity: we don't need to calculate
io_size for DIO, since pgaio_uring_should_use_async() will always
return false in that case. Do you think it is worth implementing? Other
than that, LGTM.


0004:  read_stream: Only increase distance when waiting for IO

LGTM.


0005:  WIP: read_stream: Move logic about IO combining & issuing to helpers

    /* never pin more buffers than allowed */
    if (stream->pinned_buffers + stream->pending_read_nblocks >=
stream->max_pinned_buffers)
        return false;

    /*
     * Don't start more read-ahead if that'd put us over the distance limit
     * for doing read-ahead.
     */
    if (stream->pinned_buffers + stream->pending_read_nblocks >=
stream->distance)
        return false;

AFAIK, stream->distance can't be higher than
stream->max_pinned_buffers [1], so I think we can remove the first if
block. Even if we are not sure about [1], stream->max_pinned_buffers
will most likely be higher than stream->distance, so it would make
sense to swap the order of these checks.

Aha, I understood this change after looking at 0006. It is a nitpick,
but the 'if (stream->pinned_buffers + stream->pending_read_nblocks >=
stream->max_pinned_buffers)' block could be added in 0006 instead.
Other than these, LGTM.
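The reordering suggested above could look like the following sketch. The struct and function are hypothetical mocks of the quoted checks, not the actual read_stream.c code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical minimal stand-in for the stream state used by the two
 * quoted checks; only these fields are modeled. */
typedef struct MockStream
{
    int pinned_buffers;
    int pending_read_nblocks;
    int distance;
    int max_pinned_buffers;
} MockStream;

/*
 * Checks in the suggested order: test the distance limit first, since
 * distance is expected to be the tighter (or equal) bound.  If distance
 * truly can never exceed max_pinned_buffers, the second check is
 * redundant and could be dropped entirely.
 */
static bool can_start_readahead(MockStream *stream)
{
    /* don't start more read-ahead past the distance limit */
    if (stream->pinned_buffers + stream->pending_read_nblocks >=
        stream->distance)
        return false;

    /* never pin more buffers than allowed */
    if (stream->pinned_buffers + stream->pending_read_nblocks >=
        stream->max_pinned_buffers)
        return false;

    return true;
}
```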


0006:  WIP: read stream: Split decision about look ahead for AIO and combining

I liked the idea of being more aggressive about IO combining. What is
the reason for gradually increasing combine_distance? Is it to avoid
issuing unnecessary IOs at the start?

+                /*
+                 * XXX: Should we actually reduce this at any time other than
+                 * a reset? For now we have to, as this is also a condition
+                 * for re-enabling fast_path.
+                 */
+                if (stream->combine_distance > 1)
+                    stream->combine_distance--;

I don't think we need to reduce this other than at reset.

+    /*
+     * Allow looking further ahead if we have an the process of building a
+     * larger IO, the IO is not yet big enough and we don't yet have IO in
+     * flight.  Note that this is allowed even if we are reaching the
+     * read-ahead limit (but not the buffer pin limit).
+     *
+     * This is important for cases where either effective_io_concurrency is
+     * low or we never need to wait for IO and thus are not increasing the
+     * distance. Without this we would end up with lots of small IOs.
+     */
+    if (stream->pending_read_nblocks > 0 &&
+        stream->pinned_buffers == 0 &&
+        stream->pending_read_nblocks < stream->combine_distance)
+        return true;

Do we need to check 'stream->pinned_buffers == 0' here? I think we can
continue to look ahead even when there are pinned buffers, since we
already know 'stream->pinned_buffers + stream->pending_read_nblocks <
stream->max_pinned_buffers'.
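The suggested relaxation, dropping the 'pinned_buffers == 0' test, could be sketched like this. The mock struct and function name are hypothetical; the caller is assumed to have already enforced the pin limit, as argued above:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical minimal stand-in for the fields used by this check. */
typedef struct MockStream
{
    int pending_read_nblocks;
    int combine_distance;
} MockStream;

/*
 * Suggested variant: keep building the pending IO even while other
 * buffers are pinned, as long as it hasn't reached combine_distance.
 * Assumes the caller already verified pinned_buffers +
 * pending_read_nblocks < max_pinned_buffers.
 */
static bool keep_building_combined_io(MockStream *stream)
{
    return stream->pending_read_nblocks > 0 &&
           stream->pending_read_nblocks < stream->combine_distance;
}
```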

+    /* same if capped not by io_combine_limit but combine_distance */
+    if (stream->combine_distance > 0 &&
+        pending_read_nblocks >= stream->combine_distance)
+        return true;

I think we don't need to check 'stream->combine_distance > 0'; it
shouldn't be less than or equal to zero when
'stream->readahead_distance' is not zero, right?

+    {
+        stream->readahead_distance = Min(max_pinned_buffers,
stream->io_combine_limit);
+        stream->combine_distance = stream->io_combine_limit;
+    }

I think stream->combine_distance should be equal to
stream->readahead_distance, as we don't want to exceed
max_pinned_buffers.
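In other words, the initialization could be written as the following sketch. The mock struct and function are hypothetical; the Min macro mirrors the one in PostgreSQL's c.h:

```c
#include <assert.h>

#define Min(a, b) ((a) < (b) ? (a) : (b)) /* as in PostgreSQL's c.h */

/* Hypothetical minimal stand-in for the two distance fields. */
typedef struct MockStream
{
    int readahead_distance;
    int combine_distance;
} MockStream;

/*
 * Suggested initialization: cap combine_distance the same way as
 * readahead_distance, so a combined IO can never exceed the pin limit.
 */
static void init_distances(MockStream *stream, int max_pinned_buffers,
                           int io_combine_limit)
{
    stream->readahead_distance = Min(max_pinned_buffers, io_combine_limit);
    stream->combine_distance = stream->readahead_distance;
}
```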

0007:  Hacky implementation of making read_stream_reset()/end() not wait for IO

Read stream parts LGTM. I don't have a comment on the FIXME part, though.


-- 
Regards,
Nazir Bilal Yavuz
Microsoft


