Re: Streaming I/O, vectored I/O (WIP) - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: Streaming I/O, vectored I/O (WIP) |
Date | |
Msg-id | CA+hUKGKRb7LHA7Y8O=CxJhBqjKSLWVZDkjyBaM6WLSAM5v4UKg@mail.gmail.com Whole thread Raw |
In response to | Re: Streaming I/O, vectored I/O (WIP) (Thomas Munro <thomas.munro@gmail.com>) |
Responses |
Re: Streaming I/O, vectored I/O (WIP)
|
List | pgsql-hackers |
On Mon, Mar 25, 2024 at 2:02 AM Thomas Munro <thomas.munro@gmail.com> wrote: > On Wed, Mar 20, 2024 at 4:04 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > > /* > > > * Skip the initial ramp-up phase if the caller says we're going to be > > > * reading the whole relation. This way we start out doing full-sized > > > * reads. > > > */ > > > if (flags & PGSR_FLAG_FULL) > > > pgsr->distance = Min(MAX_BUFFERS_PER_TRANSFER, pgsr->max_pinned_buffers); > > > else > > > pgsr->distance = 1; > > > > Should this be "Max(MAX_BUFFERS_PER_TRANSFER, > > pgsr->max_pinned_buffers)"? max_pinned_buffers cannot be smaller than > > MAX_BUFFERS_PER_TRANSFER though, given how it's initialized earlier. So > > perhaps just 'pgsr->distance = pgsr->max_pinned_buffers' ? > > Right, done. BTW I forgot to mention that in v10 I changed my mind and debugged my way back to the original coding, which now looks like this: /* * Skip the initial ramp-up phase if the caller says we're going to be * reading the whole relation. This way we start out assuming we'll be * doing full io_combine_limit sized reads (behavior B). */ if (flags & READ_STREAM_FULL) stream->distance = Min(max_pinned_buffers, io_combine_limit); else stream->distance = 1; It's not OK for distance to exceed max_pinned_buffers. But if max_pinned_buffers is huge, remember that the goal here is to access 'behavior B' meaning wide read calls but no unnecessary extra look-ahead beyond what is needed for that, so we also don't want to exceed io_combine_limit. Therefore we want the minimum of those two numbers. In practice on a non-toy system, that's always going to be io_combine_limit. But I'm not sure how many users of READ_STREAM_FULL there will be, and I am starting to wonder if it's a good name for the flag, or even generally very useful. It's sort of saying "I expect to do I/O, and it'll be sequential, and I won't give up until the end". But how many users can really make those claims? pg_prewarm is unsual in that it contains an explicit assumption that the cache is cold and we want to warm it up. But maybe we should just let the adaptive algorithm do its thing. It only takes a few reads to go from 1 -> io_combine_limit. Thinking harder, if we're going to keep this and not just be fully adaptive, perhaps there should be a flag READ_STREAM_COLD, where you hint that the data is not expected to be cached, and you'd combine that with the _SEQUENTIAL hint. pg_prewarm hints _COLD | _SEQUENTIAL. Then the initial distance would be something uses the flag combinations to select initial behavior A, B, C (and we'll quickly adjust if you're wrong): if (!(flags & READ_STREAM_COLD)) stream->distance = 1; else if (flags & READ_STREAM_SEQUENTIAL) stream->distance = Min(max_pinned_buffers, io_combine_limit); else stream->distance = max_pinned_buffers; But probably almost all users especially in the executor haven't really got much of a clue what they're going to do so they'd use the initial starting position of 1 (A) and we'd soo figure it out. Maybe overengineering for pg_prewarm is a waste of time and we should just delete the flag instead and hard code 1.
pgsql-hackers by date: