Re: Streaming I/O, vectored I/O (WIP) - Mailing list pgsql-hackers
From: Thomas Munro
Subject: Re: Streaming I/O, vectored I/O (WIP)
Msg-id: CA+hUKGKkCSb0zLxhLYEyZ4vk4HMqf6GgoKsbWr=qniQVuQ_xUw@mail.gmail.com
In response to: Re: Streaming I/O, vectored I/O (WIP) (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On Thu, Sep 28, 2023 at 9:13 AM Andres Freund <andres@anarazel.de> wrote:
> On 2023-09-27 21:33:15 +0300, Heikki Linnakangas wrote:
> > Looking at the later patch that introduces the streaming read API, seems
> > that it finishes all the reads, so I suppose we don't need an abort
> > function. Does it all get cleaned up correctly on error?
>
> I think it should. The buffer error handling is one of the areas where I
> really would like to have some way of testing the various cases, it's easy
> to get things wrong, and basically impossible to write reliable tests for
> with our current infrastructure.

One thing to highlight is that this patch doesn't create a new state in that
case.  In master, we already have the concept of a buffer with BM_TAG_VALID
but not BM_VALID and not BM_IO_IN_PROGRESS, reachable if there is an I/O
error.  Eventually another reader will try the I/O again, or the buffer will
fall out of the pool.  With this patch it's the same, it's just a wider
window: more kinds of errors might be thrown in code between Prepare() and
Complete() before we even have BM_IO_IN_PROGRESS.  So there is nothing extra
to clean up.  Right?

Yeah, it would be nice to test buffer pool logic directly.  Perhaps with a C
unit test framework[1] and pluggable smgr[2] we could mock up cases like I/O
errors...

> > > typedef bool (*PgStreamingReadBufferDetermineNextCB) (PgStreamingRead *pgsr,
> > >                                                       uintptr_t pgsr_private,
> > >                                                       void *per_io_private,
> > >                                                       BufferManagerRelation *bmr,
> > >                                                       ForkNumber *forkNum,
> > >                                                       BlockNumber *blockNum,
> > >                                                       ReadBufferMode *mode);
> >
> > I was surprised that 'bmr', 'forkNum' and 'mode' are given separately on
> > each read. I see that you used that in the WAL replay prefetching, so I
> > guess that makes sense.
>
> Yea, that's the origin - I don't like it, but I don't really have a better
> idea.

Another idea I considered was that streams could be associated with a single
relation, but recovery could somehow manage a set of them.  From a certain
point of view, that makes sense (we could be redoing work that was created
by multiple concurrent streams at 'do' time, and with the approach shown
here some clustering opportunities available at do time are lost at redo
time), but it's not at all clear that it's worth the overheads or
complexity, and I couldn't immediately figure out how to do it.  But I doubt
there would ever be any other users of a single stream with multiple
relations, and I agree that this is somehow not quite satisfying...  Perhaps
we should think about that some more...

[1] https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BajSQ_8eu2AogTncOnZ5me2D-Cn66iN_-wZnRjLN%2Bicg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/flat/CAEze2WgMySu2suO_TLvFyGY3URa4mAx22WeoEicnK=PCNWEMrA@mail.gmail.com