Re: Direct I/O - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Direct I/O
Msg-id CA+hUKGL1L3DccbNRCfmwYOx=WO58sMpdZn5x=jY0NUW56dPHuw@mail.gmail.com
In response to Re: Direct I/O  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Tue, Apr 18, 2023 at 4:06 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Sat, Apr 15, 2023 at 2:19 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> I get the impression that we are going to need an actual runtime
> >> test if we want to defend against this.  Not entirely convinced
> >> it's worth the trouble.  Who, other than our deliberately rear-guard
> >> buildfarm animals, is going to be building modern PG with such old
> >> compilers?  (And more especially to the point, on platforms new
> >> enough to have working O_DIRECT?)
>
> > I don't think that I fully understand everything under discussion
> > here, but I would just like to throw in a vote for trying to make
> > failures as comprehensible as we reasonably can.
>
> I'm not hugely concerned about this yet.  I think the reason for
> slipping this into v16 as developer-only code is exactly that we need
> to get a feeling for where the portability dragons live.  When (and
> if) we try to make O_DIRECT mainstream, yes we'd better be sure that
> any known failure cases are reported well.  But we need the data
> about which those are, first.

+1

A couple more things I wanted to note:

* We have no plans to turn this on by default even when the later
asynchronous machinery is proposed, and direct I/O starts to make more
economic sense (think: your stream of small reads and writes will be
converted to larger preadv/pwritev or moral equivalent and performed
ahead of time in the background).  Reasons: (1) There will always be a
few file systems that refuse O_DIRECT (Linux tmpfs is one such, as we
learned in this thread; it fails with EINVAL at open() time), and (2)
without a page cache, you really need to size your shared_buffers
adequately and we can't do that automatically.  It's something you'd
opt into for a dedicated database server along with other carefully
considered settings.  It seems acceptable to me that if you set
io_direct to a non-default setting on an unusual-for-a-database-server
filesystem you might get errors screaming about inability to open
files -- you'll just have to turn it back off again if it doesn't work
for you.

* For the alignment part, C11 has "alignas(x)" in <stdalign.h>, so I
very much doubt that a hypothetical new Deathstation C compiler would
not know how to align stack objects arbitrarily, even though for now
as a C99 program we have to use the non-standard incantations defined
in our c.h.  I assume we'll eventually switch to that.  In the
meantime, if someone manages to build PostgreSQL on a hypothetical C
compiler that our c.h doesn't recognise, we just won't let you turn
the io_direct GUC on (because we set PG_O_DIRECT to 0 if we don't have
an alignment macro, see commit faeedbce's message for rationale).  If
the alignment trick from c.h appears to be available but is actually
broken (GCC 4.2.1), then those assertions I added into smgrread() et
al will fail as Tom showed (yay! they did their job), or in a
non-assert build you'll probably get EINVAL when you try to read into or
write from your badly aligned buffers, depending on how picky your OS
is, but that's just an old bug in a defunct compiler that we have by
now written more about than they ever did in their bug tracker.

* I guess it's unlikely at this point that POSIX will ever standardise
O_DIRECT if they didn't already in the 90s (I didn't find any
discussion of it in their issue tracker).  There is really only one OS
on our target list that truly can't do direct I/O at all: OpenBSD.  It
seems a reasonable bet that if they or a hypothetical totally new
Unixoid system ever implemented it they'd spell it the same IRIX way
for practical reasons, but if not we just won't use it until someone
writes a patch *shrug*.  There is also one system that's been rocking
direct I/O since the 90s for Oracle etc, but PostgreSQL still doesn't
know how to turn it on: Solaris has a directio() system call.  I
posted a (trivial) patch for that once in the thread where I added
Apple F_NOCACHE, but there is probably nobody on this list who can
test it successfully (as Tom discovered, wrasse's host is not
configured right for it; you'd need an admin/root to help set up a UFS
file system, or perhaps modern (closed) ZFS can do it, but that system
is old and unpatched), and I have no desire to commit a "blind" patch
for an untested niche setup; I really only considered it because I
realised I was so close to covering the complete set of OSes.  That's
cool, we just won't let you turn the GUC on if we don't know how and
the error message is clear about that if you try.
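To illustrate the first point (direct I/O is strictly opt-in, and a file system that refuses O_DIRECT should produce a loud error rather than a silent fallback), here is a minimal sketch. It is not PostgreSQL's fd.c code: the helper open_data_file() and its io_direct argument are hypothetical, and it assumes a platform that defines O_DIRECT (glibc only exposes it with _GNU_SOURCE). Where O_DIRECT is not available, it simply refuses the request.

```c
#define _GNU_SOURCE  /* glibc only exposes O_DIRECT with this defined */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: open (creating if needed) a data file, adding
 * O_DIRECT only when io_direct is set.  If the file system refuses
 * O_DIRECT with EINVAL, report that loudly rather than silently falling
 * back to buffered I/O; the caller still gets -1 with errno preserved.
 */
static int
open_data_file(const char *path, int io_direct)
{
    int         flags = O_RDWR | O_CREAT;
    int         fd;

    if (io_direct)
    {
#ifdef O_DIRECT
        flags |= O_DIRECT;
#else
        errno = ENOTSUP;    /* no way to ask for direct I/O on this build */
        return -1;
#endif
    }

    fd = open(path, flags, 0600);
    if (fd < 0 && io_direct && errno == EINVAL)
    {
        int         save_errno = errno;

        fprintf(stderr,
                "could not open \"%s\" with direct I/O: %s\n"
                "hint: this file system may not support O_DIRECT; "
                "turn io_direct off\n",
                path, strerror(save_errno));
        errno = save_errno;
    }
    return fd;
}
```

Whether the direct open succeeds depends on the file system and kernel version in use; the design choice being illustrated is only that a refusal is reported to the user, who then turns the setting back off.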
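For the alignment point, a sketch of what C11 alignas buys you: a stack buffer aligned to an assumed 4096-byte boundary, plus an assertion of the kind described for smgrread() et al. IO_ALIGN, BLOCK_SIZE, read_block_into() and stack_buffer_demo() are made-up names for illustration, not PostgreSQL's c.h macros, and the block needs a C11 compiler.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define IO_ALIGN   4096     /* assumed alignment requirement for direct I/O */
#define BLOCK_SIZE 8192     /* assumed block size */

/* True if p sits on an IO_ALIGN boundary. */
static bool
is_io_aligned(const void *p)
{
    return ((uintptr_t) p % IO_ALIGN) == 0;
}

/*
 * Hypothetical stand-in for a read routine: check the buffer's alignment
 * up front, so a broken compiler is caught here in assert-enabled builds
 * instead of by the kernel later.
 */
static void
read_block_into(void *buffer)
{
    assert(is_io_aligned(buffer));
    memset(buffer, 0, BLOCK_SIZE);  /* placeholder for the real pread() */
}

/* Align a stack object with the standard C11 spelling. */
static bool
stack_buffer_demo(void)
{
    alignas(IO_ALIGN) char buf[BLOCK_SIZE];

    read_block_into(buf);
    return is_io_aligned(buf);
}
```

With a compiler whose alignment support is silently broken, the assert() in read_block_into() is what fires; in a build without assertions, the misaligned buffer would instead reach the kernel, which may reject it with EINVAL.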
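The "we just won't let you turn the GUC on" behaviour from the last two points could look roughly like this: a compile-time capability gate plus a check on the setting. The macro and function names are hypothetical. Treating C11 as a stand-in for "we have an alignment macro" is an assumption (PostgreSQL itself relies on the incantations in c.h), and the F_NOCACHE branch assumes macOS-style fcntl() support, as mentioned for Apple.

```c
#define _GNU_SOURCE  /* glibc hides O_DIRECT without this */
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: C11 or later means alignas() is available for I/O buffers. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#define CAN_ALIGN_IO_BUFFERS 1
#else
#define CAN_ALIGN_IO_BUFFERS 0
#endif

/*
 * Only offer direct I/O if we can both align buffers and ask the kernel
 * to bypass the page cache: O_DIRECT on most Unixes, or fcntl(F_NOCACHE)
 * on macOS.  Everything else (OpenBSD, or Solaris, whose directio() call
 * this sketch does not use) gets 0.
 */
#if CAN_ALIGN_IO_BUFFERS && (defined(O_DIRECT) || defined(F_NOCACHE))
#define HAVE_DIRECT_IO 1
#else
#define HAVE_DIRECT_IO 0
#endif

/* Hypothetical GUC-style check: reject the setting with a clear message. */
static bool
check_io_direct_setting(bool requested, const char **detail)
{
    *detail = NULL;
    if (requested && !HAVE_DIRECT_IO)
    {
        *detail = "direct I/O is not supported on this platform";
        return false;
    }
    return true;
}
```

Under this scheme a user on a platform without any known mechanism gets a clear rejection when trying to enable the setting, rather than a silent fallback to buffered I/O.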
