From 4fb43ea17f6f34e9eb3ec7d5f740eb9ceb2c231e Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Tue, 3 Mar 2026 18:00:53 -0500 Subject: [PATCH v4 04/14] read_stream: Only increase distance when waiting for IO This avoids increasing the distance to the maximum in cases where the IO subsystem is already keeping up. This turns out to be important for performance for two reasons: - Pinning a lot of buffers is not cheap. If additional pins allow us to avoid IO waits, it's definitely worth it, but if we can already do all the necessary readahead at a distance of 16, reading ahead 512 buffers can increase the CPU overhead substantially. This is particularly noticeable when the to-be-read blocks are already in the kernel page cache. - If the read stream is read to completion, reading in data earlier than needed is of limited consequences, leaving aside the CPU costs mentioned above. But if the read stream will not be fully consumed, e.g. because it is on the inner side of a nested loop join, the additional IO can be a serious performance issue. This is not that commonly a problem for current read stream users, but the upcoming work, to use a read stream to fetch table pages as part of an index scan, frequently encounters this. Note that this commit would have substantial performance downsides without earlier commits. In particular the earlier commit to avoid decreasing the readahead distance when there was recent IO is crucial, as otherwise we very often would end up not reading ahead aggressively enough anymore with this commit, due to increasing the distance less often. Author: Reviewed-by: Discussion: https://postgr.es/m/ Backpatch: --- src/backend/storage/aio/read_stream.c | 39 +++++++++++++++++++++++---- 1 file changed, 34 insertions(+), 5 deletions(-) diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c index 31f9e35dee3..4b7db4ab372 100644 --- a/src/backend/storage/aio/read_stream.c +++ b/src/backend/storage/aio/read_stream.c @@ -952,22 +952,51 @@ read_stream_next_buffer(ReadStream *stream, void **per_buffer_data) { int16 io_index = stream->oldest_io_index; int32 distance; /* wider temporary value, clamped below */ + bool needed_wait; /* Sanity check that we still agree on the buffers. */ Assert(stream->ios[io_index].op.buffers == &stream->buffers[oldest_buffer_index]); - WaitReadBuffers(&stream->ios[io_index].op); + needed_wait = WaitReadBuffers(&stream->ios[io_index].op); Assert(stream->ios_in_progress > 0); stream->ios_in_progress--; if (++stream->oldest_io_index == stream->max_ios) stream->oldest_io_index = 0; - /* Look-ahead distance ramps up rapidly after we do I/O. */ - distance = stream->distance * 2; - distance = Min(distance, stream->max_pinned_buffers); - stream->distance = distance; + /* + * If the IO was executed synchronously, we will never see + * WaitReadBuffers() block. This is particularly crucial when + * effective_io_concurrency=0 is used, as all IO will be + * synchronous. Without treating synchronous IO as having waited, we'd + * never allow the distance to get large enough to allow for IO + * combining, resulting in bad performance. + */ + if (stream->ios[io_index].op.flags & READ_BUFFERS_SYNCHRONOUSLY) + needed_wait = true; + + /* + * Have the look-ahead distance ramp up rapidly after we needed to + * wait for IO. We only increase the distance when we needed to wait, + * to avoid increasing the distance further than necessary, as looking + * ahead too far can be costly, both due to the cost of unnecessarily + * pinning many buffers and due to doing IOs that may never be + * consumed if the stream is ended/reset before completion. + * + * If we did not need to wait, the current distance was evidently + * sufficient. + * + * NB: May not increase the distance if we already reached the end of + * the stream, as stream->distance == 0 is used to keep track of + * having reached the end. + */ + if (stream->distance > 0 && needed_wait) + { + distance = stream->distance * 2; + distance = Min(distance, stream->max_pinned_buffers); + stream->distance = distance; + } /* * As we needed IO, prevent distance from being reduced within our -- 2.53.0.1.gb2826b52eb