On Wed, Aug 13, 2025 at 5:29 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > 1. The leading block is BM_VALID, so we choose to give it to you
> > immediately and not look further (we could look further and return
> > more than one consecutive BM_VALID block at a time, but this isn't
> > implemented)
>
> I am curious about why this isn't implemented. It looks helpful.
> Is there any blocking issue or trade-off for not doing so?
stream->distance tends to be 1 for well cached data (it grows when I/O
is needed and it is worth looking far ahead, and shrinks when data is
found in cache). Since we won't currently have a distance > 1 for well
cached data, we wouldn't benefit much. That may not always be
true: if the buffer mapping table is changed to a tree data structure
(like the way kernels manage virtual memory pages associated with a
file or other VM object), then we might have an incentive to look up
lots of cached blocks at the same time, and then we might want to
change the distance tuning algorithm, and then we might want
StartReadBuffers() to be able to return more than one cached block at
a time. We've focused mainly on I/O so far, and just tried not to
make the well cached cases worse.
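
To make that more concrete, the tuning is roughly this shape (a
simplified sketch, not the exact read_stream.c code, so treat the
variable names and clamping as approximate):

    /* Sketch: adjust look-ahead distance after each read (illustrative). */
    if (!io_needed)
    {
        /* Block was already in shared buffers: decay towards 1. */
        if (stream->distance > 1)
            stream->distance--;
    }
    else
    {
        /* I/O was needed: ramp up quickly, capped by the pin limit. */
        stream->distance *= 2;
        if (stream->distance > stream->max_pinned_buffers)
            stream->distance = stream->max_pinned_buffers;
    }

So a stream that keeps hitting cache settles at distance = 1 and never
has more than one cached block to hand back per call anyway.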
> The format of this part is not aligned well in Gmail, so I copied it into VS Code.
> Is this layout right?
Yes.
> I found the second illustration somewhat hard to follow, especially
> the "do nothing" trick and the movement of next_buffer_index in the second queue.
> Maybe I need to read the corresponding code.
Suppose you have pending_read_blocknum = 100, pending_read_nblocks = 5,
next_buffer_index = 200. Now you decide to start that read, because
the next block the caller wants can't be combined, so you call
StartReadBuffers(blocknum = 100, *nblocks = 5, buffers =
&buffers[200]). StartReadBuffers() pins 5 buffers and starts an IO,
but finds a reason to split it at size 2. It sets *nblocks to 2, and
returns. Now read_stream.c adds 2 to pending_read_blocknum, subtracts
2 from pending_read_nblocks, so that it represents the continuation of
the previous read. It also adds 2 to next_buffer_index with modulo
arithmetic (so it wraps around the queue), because that is the
location of the buffers for the next read, and sets forwarded_buffers
to 3, because there are 3 buffers already pinned (in that patch ^ it
is renamed to pending_read_npinned). The three buffers themselves are
already in the queue at the right position for StartReadBuffers() to
receive them and know that it doesn't have to pin them again. What I
was referring to with "doing nothing" is the way that if you keep
sliding the StartReadBuffers() call along the queue, the extra buffers it
spits out become its input next time, without having to be copied
anywhere. In that picture there are two forwarded buffers, labeled 9
and A.
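
Written out as a sketch (illustrative, with approximate field names;
the numbers are the ones from the example above):

    /* After StartReadBuffers() pinned 5 buffers but set *nblocks to 2: */
    nblocks_done = 2;                   /* what the started IO actually covers */

    /* The rest of the pending read continues where the IO stopped. */
    stream->pending_read_blocknum += nblocks_done;      /* 100 -> 102 */
    stream->pending_read_nblocks -= nblocks_done;       /*   5 ->   3 */

    /* Advance to the queue slot holding the buffers for the next read,
     * wrapping around the circular queue. */
    stream->next_buffer_index =
        (stream->next_buffer_index + nblocks_done) %
        stream->queue_size;                             /* 200 -> 202 */

    /* Buffers pinned beyond the IO are forwarded to the next call. */
    stream->forwarded_buffers = 5 - nblocks_done;       /* 3 */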