Re: Streaming I/O, vectored I/O (WIP) - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Streaming I/O, vectored I/O (WIP)
Date
Msg-id CA+hUKGKRb7LHA7Y8O=CxJhBqjKSLWVZDkjyBaM6WLSAM5v4UKg@mail.gmail.com
Whole thread Raw
In response to Re: Streaming I/O, vectored I/O (WIP)  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: Streaming I/O, vectored I/O (WIP)
List pgsql-hackers
On Mon, Mar 25, 2024 at 2:02 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Wed, Mar 20, 2024 at 4:04 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > >       /*
> > >        * Skip the initial ramp-up phase if the caller says we're going to be
> > >        * reading the whole relation.  This way we start out doing full-sized
> > >        * reads.
> > >        */
> > >       if (flags & PGSR_FLAG_FULL)
> > >               pgsr->distance = Min(MAX_BUFFERS_PER_TRANSFER, pgsr->max_pinned_buffers);
> > >       else
> > >               pgsr->distance = 1;
> >
> > Should this be "Max(MAX_BUFFERS_PER_TRANSFER,
> > pgsr->max_pinned_buffers)"? max_pinned_buffers cannot be smaller than
> > MAX_BUFFERS_PER_TRANSFER though, given how it's initialized earlier. So
> > perhaps just 'pgsr->distance = pgsr->max_pinned_buffers' ?
>
> Right, done.

BTW I forgot to mention that in v10 I changed my mind and debugged my
way back to the original coding, which now looks like this:

    /*
     * Skip the initial ramp-up phase if the caller says we're going to be
     * reading the whole relation.  This way we start out assuming we'll be
     * doing full io_combine_limit sized reads (behavior B).
     */
    if (flags & READ_STREAM_FULL)
        stream->distance = Min(max_pinned_buffers, io_combine_limit);
    else
        stream->distance = 1;

It's not OK for distance to exceed max_pinned_buffers.  But if
max_pinned_buffers is huge, remember that the goal here is to access
'behavior B' meaning wide read calls but no unnecessary extra
look-ahead beyond what is needed for that, so we also don't want to
exceed io_combine_limit.  Therefore we want the minimum of those two
numbers.  In practice on a non-toy system, that's always going to be
io_combine_limit.  But I'm not sure how many users of READ_STREAM_FULL
there will be, and I am starting to wonder if it's a good name for the
flag, or even generally very useful.  It's sort of saying "I expect to
do I/O, and it'll be sequential, and I won't give up until the end".
But how many users can really make those claims?  pg_prewarm is unsual
in that it contains an explicit assumption that the cache is cold and
we want to warm it up.  But maybe we should just let the adaptive
algorithm do its thing.  It only takes a few reads to go from 1 ->
io_combine_limit.

Thinking harder, if we're going to keep this and not just be fully
adaptive, perhaps there should be a flag READ_STREAM_COLD, where you
hint that the data is not expected to be cached, and you'd combine
that with the _SEQUENTIAL hint.  pg_prewarm hints _COLD | _SEQUENTIAL.
Then the initial distance would be something uses the flag
combinations to select initial behavior A, B, C (and we'll quickly
adjust if you're wrong):

    if (!(flags & READ_STREAM_COLD))
        stream->distance = 1;
    else if (flags & READ_STREAM_SEQUENTIAL)
        stream->distance = Min(max_pinned_buffers, io_combine_limit);
    else
        stream->distance = max_pinned_buffers;

But probably almost all users especially in the executor haven't
really got much of a clue what they're going to do so they'd use the
initial starting position of 1 (A) and we'd soo figure it out.  Maybe
overengineering for pg_prewarm is a waste of time and we should just
delete the flag instead and hard code 1.



pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: Add new error_action COPY ON_ERROR "log"
Next
From: Thomas Munro
Date:
Subject: Re: Streaming I/O, vectored I/O (WIP)