Re: Streaming I/O, vectored I/O (WIP) - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Streaming I/O, vectored I/O (WIP)
Date
Msg-id CA+hUKGJywvC5J+rcLs3+whK8YPBnKXVLn5o0JFJbNeb4CEdSFA@mail.gmail.com
Whole thread Raw
In response to Re: Streaming I/O, vectored I/O (WIP)  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: Streaming I/O, vectored I/O (WIP)
List pgsql-hackers
On Wed, Nov 29, 2023 at 1:44 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> LGTM. I think this 0001 patch is ready for commit, independently of the
> rest of the patches.

Done.

> In v2-0002-Provide-vectored-variants-of-FileRead-and-FileWri-1.patch, fd.h:
>
> > +/* Filename components */
> > +#define PG_TEMP_FILES_DIR "pgsql_tmp"
> > +#define PG_TEMP_FILE_PREFIX "pgsql_tmp"
> > +
>
> These seem out of place, we already have them in common/file_utils.h.

Yeah, they moved from there in f39b2658 and I messed up the rebase.  Fixed.

> Other than that,
> v2-0002-Provide-vectored-variants-of-FileRead-and-FileWri-1.patch and
> v2-0003-Provide-vectored-variants-of-smgrread-and-smgrwri.patch look
> good to me.

One thing I wasn't 100% happy with was the treatment of ENOSPC.  A few
callers believe that short writes set errno: they error out with a
message including %m.  We have historically set errno = ENOSPC inside
FileWrite() if the write size was unexpectedly small AND the kernel
didn't set errno to a non-zero value (having set it to zero ourselves
earlier).  In FileWriteV(), I didn't want to do that because it is
expensive to compute the total write size from the vector array and we
managed to measure an effect due to that in some workloads.

Note that the smgr patch actually handles short writes by continuing,
instead of raising an error.  Short writes do already occur in the
wild on various systems for various rare technical reasons other than
ENOSPC I have heard (imagine transient failure to acquire some
temporary memory that the kernel chooses not to wait for, stuff like
that, though certainly many people and programs believe they should
not happen[1]), and it seems like a good idea to actually handle them
as our write sizes increase and the probability of short writes might
presumably increase.

With the previous version of the patch, we'd have to change a couple
of other callers not to believe that short writes are errors and set
errno (callers are inconsistent on this point).  I don't really love
that we have "fake" system errors but I also want to stay focused
here, so in this new version V3 I tried a new approach: I realised I
can just always set errno without needing the total size, so that
(undocumented) aspect of the interface doesn't change.  The point
being that it doesn't matter if you clobber errno with a bogus value
when the write was non-short.  Thoughts?

[1] https://utcc.utoronto.ca/~cks/space/blog/unix/WritesNotShortOften

Attachment

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Fix some memory leaks in ecpg.addons
Next
From: Andres Freund
Date:
Subject: Re: remaining sql/json patches