Re: Streaming I/O, vectored I/O (WIP) - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Streaming I/O, vectored I/O (WIP)
Date
Msg-id CA+hUKGKkCSb0zLxhLYEyZ4vk4HMqf6GgoKsbWr=qniQVuQ_xUw@mail.gmail.com
In response to Re: Streaming I/O, vectored I/O (WIP)  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Thu, Sep 28, 2023 at 9:13 AM Andres Freund <andres@anarazel.de> wrote:
> On 2023-09-27 21:33:15 +0300, Heikki Linnakangas wrote:
> > Looking at the later patch that introduces the streaming read API, seems
> > that it finishes all the reads, so I suppose we don't need an abort
> > function. Does it all get cleaned up correctly on error?
>
> I think it should.  The buffer error handling is one of the areas where I
> really would like to have some way of testing the various cases, it's easy to
> get things wrong, and basically impossible to write reliable tests for with
> our current infrastructure.

One thing to highlight is that this patch doesn't create a new state
in that case.  In master we already have the concept of a buffer with
BM_TAG_VALID but not BM_VALID and not BM_IO_IN_PROGRESS, reachable if
there is an I/O error.  Eventually another reader will try the I/O
again, or the buffer will fall out of the pool.  With this patch it's
the same; the window is just wider, because more kinds of errors might
be thrown in code between Prepare() and Complete(), before we even
have BM_IO_IN_PROGRESS.  So there is nothing extra to clean up.
Right?
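To make that state concrete, here is a minimal stand-alone sketch, not the real buffer manager code: the flag names mirror the ones discussed above (the real definitions live in src/include/storage/buf_internals.h and are packed differently), and the helper function is invented purely for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for the buffer state flags; invented values,
 * real names. */
#define BM_TAG_VALID       (1U << 0)    /* buffer has a tag assigned */
#define BM_VALID           (1U << 1)    /* page contents are valid */
#define BM_IO_IN_PROGRESS  (1U << 2)    /* a read or write is in flight */

/*
 * The post-error state described above: the buffer keeps its tag but has
 * neither valid contents nor an I/O in progress.  A later reader seeing
 * this simply starts the read again, so nothing needs explicit cleanup.
 * (buffer_needs_retry is a hypothetical helper, not a real function.)
 */
static bool
buffer_needs_retry(uint32_t state)
{
    return (state & BM_TAG_VALID) != 0 &&
           (state & BM_VALID) == 0 &&
           (state & BM_IO_IN_PROGRESS) == 0;
}
```

The point is that an error between Prepare() and Complete() lands the buffer in exactly this already-reachable state, rather than a new one.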

Yeah, it would be nice to test buffer pool logic directly.  Perhaps
with a C unit test framework[1] and pluggable smgr[2] we could mock up
cases like I/O errors...

> > > typedef bool (*PgStreamingReadBufferDetermineNextCB) (PgStreamingRead *pgsr,
> > >         uintptr_t pgsr_private,
> > >         void *per_io_private,
> > >         BufferManagerRelation *bmr,
> > >         ForkNumber *forkNum,
> > >         BlockNumber *blockNum,
> > >         ReadBufferMode *mode);
> >
> > I was surprised that 'bmr', 'forkNum' and 'mode' are given separately on
> > each read. I see that you used that in the WAL replay prefetching, so I
> > guess that makes sense.
>
> Yea, that's the origin - I don't like it, but I don't really have a better
> idea.
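For what it's worth, a toy callback with that shape might look like the sketch below.  The stub typedefs exist only so it compiles outside the tree; SeqScanState and seq_scan_next are invented for illustration, and only the parameter list mimics the quoted typedef.  It hands out blocks 0..nblocks-1 of one fork in order, which also illustrates the complaint: for a user like this, bmr/forkNum/mode are identical on every call, and only recovery would vary them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in typedefs so this sketch compiles on its own; in the patch
 * these come from the buffer manager and streaming read headers. */
typedef uint32_t BlockNumber;
typedef int ForkNumber;
#define MAIN_FORKNUM 0
typedef struct PgStreamingRead PgStreamingRead;
typedef struct BufferManagerRelation { int relnumber; } BufferManagerRelation;
typedef enum { RBM_NORMAL } ReadBufferMode;

/* Hypothetical per-stream state for a simple sequential scan. */
typedef struct SeqScanState
{
    BlockNumber next;
    BlockNumber nblocks;
} SeqScanState;

/* Shaped like PgStreamingReadBufferDetermineNextCB from the quoted
 * typedef: fill in the next block to read, return false at end of stream. */
static bool
seq_scan_next(PgStreamingRead *pgsr, uintptr_t pgsr_private,
              void *per_io_private, BufferManagerRelation *bmr,
              ForkNumber *forkNum, BlockNumber *blockNum,
              ReadBufferMode *mode)
{
    SeqScanState *scan = (SeqScanState *) pgsr_private;

    if (scan->next >= scan->nblocks)
        return false;            /* no more blocks: end of stream */

    bmr->relnumber = 0;          /* would identify the relation; elided here */
    *forkNum = MAIN_FORKNUM;     /* same fork and mode every time, which is */
    *mode = RBM_NORMAL;          /* why returning them per-read feels heavy */
    *blockNum = scan->next++;
    return true;
}
```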

Another idea I considered was that streams could be associated with a
single relation, but recovery could somehow manage a set of them.
From a certain point of view that makes sense: we could be redoing
work that was created by multiple concurrent streams at 'do' time, and
with the approach shown here some clustering opportunities available
at 'do' time are lost at 'redo' time.  But it's not at all clear that
it's worth the overheads or complexity, and I couldn't immediately
figure out how to do it.  I also doubt there would ever be any other
users of a single stream with multiple relations, and I agree that
this is somehow not quite satisfying...  Perhaps we should think about
that some more...

[1] https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BajSQ_8eu2AogTncOnZ5me2D-Cn66iN_-wZnRjLN%2Bicg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/flat/CAEze2WgMySu2suO_TLvFyGY3URa4mAx22WeoEicnK=PCNWEMrA@mail.gmail.com


