Re: Direct I/O - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: Direct I/O |
Date | |
Msg-id | CA+hUKGL1L3DccbNRCfmwYOx=WO58sMpdZn5x=jY0NUW56dPHuw@mail.gmail.com |
In response to | Re: Direct I/O (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Direct I/O
Re: Direct I/O |
List | pgsql-hackers |
On Tue, Apr 18, 2023 at 4:06 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Sat, Apr 15, 2023 at 2:19 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> I get the impression that we are going to need an actual runtime
> >> test if we want to defend against this. Not entirely convinced
> >> it's worth the trouble. Who, other than our deliberately rear-guard
> >> buildfarm animals, is going to be building modern PG with such old
> >> compilers? (And more especially to the point, on platforms new
> >> enough to have working O_DIRECT?)
>
> > I don't think that I fully understand everything under discussion
> > here, but I would just like to throw in a vote for trying to make
> > failures as comprehensible as we reasonably can.
>
> I'm not hugely concerned about this yet. I think the reason for
> slipping this into v16 as developer-only code is exactly that we need
> to get a feeling for where the portability dragons live. When (and
> if) we try to make O_DIRECT mainstream, yes we'd better be sure that
> any known failure cases are reported well. But we need the data
> about which those are, first.

+1

A couple more things I wanted to note:

* We have no plans to turn this on by default even when the later asynchronous machinery is proposed, and direct I/O starts to make more economic sense (think: your stream of small reads and writes will be converted to larger preadv/pwritev or moral equivalent and performed ahead of time in the background). Reasons: (1) There will always be a few file systems that refuse O_DIRECT (Linux tmpfs is one such, as we learned in this thread; it fails with EINVAL at open() time), and (2) without a page cache, you really need to size your shared_buffers adequately and we can't do that automatically. It's something you'd opt into for a dedicated database server along with other carefully considered settings.
It seems acceptable to me that if you set io_direct to a non-default setting on an unusual-for-a-database-server filesystem you might get errors screaming about inability to open files -- you'll just have to turn it back off again if it doesn't work for you.

* For the alignment part, C11 has "alignas(x)" in <stdalign.h>, so I very much doubt that a hypothetical new Deathstation C compiler would not know how to align stack objects arbitrarily, even though for now as a C99 program we have to use the non-standard incantations defined in our c.h. I assume we'll eventually switch to that. In the meantime, if someone manages to build PostgreSQL on a hypothetical C compiler that our c.h doesn't recognise, we just won't let you turn the io_direct GUC on (because we set PG_O_DIRECT to 0 if we don't have an alignment macro, see commit faeedbce's message for rationale). If the alignment trick from c.h appears to be available but is actually broken (GCC 4.2.1), then those assertions I added into smgrread() et al will fail as Tom showed (yay! they did their job), or in a non-assert build you'll probably get EINVAL when you try to read into or write from your badly aligned buffers, depending on how picky your OS is, but that's just an old bug in a defunct compiler that we have by now written more about than they ever did in their bug tracker.

* I guess it's unlikely at this point that POSIX will ever standardise O_DIRECT if they didn't already in the 90s (I didn't find any discussion of it in their issue tracker). There is really only one OS on our target list that truly can't do direct I/O at all: OpenBSD. It seems a reasonable bet that if they or a hypothetical totally new Unixoid system ever implemented it they'd spell it the same IRIX way for practical reasons, but if not we just won't use it until someone writes a patch *shrug*.
There is also one system that's been rocking direct I/O since the 90s for Oracle etc, but PostgreSQL still doesn't know how to turn it on: Solaris has a directio() system call. I posted a (trivial) patch for that once in the thread where I added Apple F_NOCACHE, but there is probably nobody on this list who can test it successfully (as Tom discovered, wrasse's host is not configured right for it, you'd need an admin/root to help set up a UFS file system, or perhaps modern (closed) ZFS can do it but that system is old and unpatched), and I have no desire to commit a "blind" patch for an untested niche setup; I really only considered it because I realised I was so close to covering the complete set of OSes. That's cool, we just won't let you turn the GUC on if we don't know how and the error message is clear about that if you try.
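To make the points above concrete, here is a rough standalone sketch (not PostgreSQL code) of the three pieces being discussed: a C11 alignas() stack buffer, an assertion that the buffer really is aligned, and a direct open that falls back to "not supported" when the platform has no known spelling. Every sketch_* name and the 4096-byte alignment are assumptions made up for illustration; the real tree uses its own macros from c.h and PG_O_DIRECT.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE				/* exposes O_DIRECT on Linux/glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdalign.h>
#include <stdint.h>
#include <unistd.h>

/* Assumed I/O alignment for this sketch only; not PostgreSQL's setting. */
#define SKETCH_IO_ALIGN 4096

/* Hypothetical helper: returns 1 if p is aligned to SKETCH_IO_ALIGN. */
static int
sketch_is_io_aligned(const void *p)
{
	return ((uintptr_t) p % SKETCH_IO_ALIGN) == 0;
}

/*
 * Hypothetical helper: open a file read-only for direct I/O using whichever
 * spelling the platform offers.  Returns an fd, or -1 with errno set.  On a
 * file system that refuses O_DIRECT, open() itself fails (EINVAL on Linux
 * tmpfs, per this thread).
 */
static int
sketch_open_direct(const char *path)
{
#if defined(O_DIRECT)
	return open(path, O_RDONLY | O_DIRECT);
#elif defined(F_NOCACHE)
	/* macOS spelling: open normally, then disable caching on the fd. */
	int			fd = open(path, O_RDONLY);

	if (fd >= 0 && fcntl(fd, F_NOCACHE, 1) < 0)
	{
		int			save_errno = errno;

		close(fd);
		errno = save_errno;
		return -1;
	}
	return fd;
#else
	/* No known way to ask for direct I/O here: refuse cleanly. */
	(void) path;
	errno = ENOTSUP;
	return -1;
#endif
}

/*
 * Hypothetical helper: read the first block of a file with direct I/O into
 * an aligned stack buffer.  Returns the byte count, or -1 with errno set.
 */
static ssize_t
sketch_direct_read(const char *path)
{
	alignas(SKETCH_IO_ALIGN) char buf[SKETCH_IO_ALIGN];
	ssize_t		nread;
	int			save_errno;
	int			fd;

	/* Catches a compiler that accepts the alignment request but ignores it. */
	assert(sketch_is_io_aligned(buf));

	fd = sketch_open_direct(path);
	if (fd < 0)
		return -1;

	nread = read(fd, buf, sizeof(buf));
	save_errno = errno;
	close(fd);
	errno = save_errno;
	return nread;
}
```

Calling sketch_direct_read() on a file that lives on tmpfs should show the open-time EINVAL described earlier, while the assertion plays the same role as the ones in smgrread() et al: it fails loudly in an assert-enabled build if the compiler's alignment support is broken.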