Re: BUG #18009: Postgres Recovery not happening - Mailing list pgsql-bugs

From Thomas Munro
Subject Re: BUG #18009: Postgres Recovery not happening
Date
Msg-id CA+hUKG+omp-D2p11rG+_h3eAWG3sO1TJvxcoh=_8reXDAM_O0g@mail.gmail.com
In response to Re: BUG #18009: Postgres Recovery not happening  (Vamshikrishna T <tvk1271@gmail.com>)
List pgsql-bugs
On Mon, Jul 10, 2023 at 11:27 PM Vamshikrishna T <tvk1271@gmail.com> wrote:
> Other thing i am interested in is, Looks like Postgres 15.2 doesn't
> support Direct or Concurrent I/O on AIX ?. I don't see any
> setting where i can tune that ?.  I feel DIO or CIO would be more
> safer than cached I/O. I see fsync error behaviors are
> pretty interesting across different OS. Thanks for the info.

There is nothing in v15.  In v16 (in beta now), we have just the
tentative beginnings of direct I/O support.  So far we just provided
the switch to turn it on for WAL and/or data files, which involved
getting various memory buffers to be correctly aligned and figuring
out various system programming details for ~10 OSes.  For now the
setting is called "debug_io_direct", with a temporary "debug_" prefix
because we don't want people turning it on for real production
databases yet.  But that's the easy bit.  You can't use it naively
without taking a massive performance hit.  We also need to replace the
readahead/writeback and automatic clustering that PostgreSQL relies on
today, which is probably why previous proposals over the past decades
to just add a simple direct I/O switch went nowhere -- that's not even
the real problem!  That just gives you a lot of painfully slow
unclustered 8KB read and write calls.  We have a prototype and
forthcoming proposal in the pipeline to address the real problem, but
we wanted debug_io_direct to be released sooner to give us more
time to learn about weird problems on various OSes and filesystems.
We've already learned a few surprising things in the thread that added
the setting[1] (like how to make btrfs corrupt its own checksums).
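
To make the "correctly aligned" part concrete, here is a minimal sketch
in C of a single 8KB block written and read back with O_DIRECT.  It is
not PostgreSQL's code: the function names, the 4096-byte alignment and
the fallback to buffered I/O are all assumptions for illustration, and
the real alignment requirement differs between OSes and filesystems.

```c
#define _GNU_SOURCE             /* exposes O_DIRECT on Linux/glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLOCK_SIZE 8192
#define IO_ALIGNMENT 4096       /* assumed; varies by OS and filesystem */

/* Write one block at offset 0 and read it back.  Returns 0 on success. */
static int round_trip(const char *path, int extra_flags,
                      void *wbuf, void *rbuf)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC | extra_flags, 0600);
    int ok;

    if (fd < 0)
        return -1;
    ok = pwrite(fd, wbuf, BLOCK_SIZE, 0) == BLOCK_SIZE &&
         pread(fd, rbuf, BLOCK_SIZE, 0) == BLOCK_SIZE &&
         memcmp(wbuf, rbuf, BLOCK_SIZE) == 0;
    close(fd);
    return ok ? 0 : -1;
}

/*
 * Hypothetical helper: returns 1 if the block round-tripped with
 * O_DIRECT, 0 if we had to fall back to buffered I/O, -1 on error.
 */
int direct_io_round_trip(const char *path)
{
    void *wbuf = NULL;
    void *rbuf = NULL;
    int result = -1;

    /* Direct I/O generally needs the memory buffer itself to be aligned. */
    if (posix_memalign(&wbuf, IO_ALIGNMENT, BLOCK_SIZE) != 0)
        return -1;
    if (posix_memalign(&rbuf, IO_ALIGNMENT, BLOCK_SIZE) != 0) {
        free(wbuf);
        return -1;
    }
    assert((uintptr_t) wbuf % IO_ALIGNMENT == 0);
    memset(wbuf, 0xAB, BLOCK_SIZE);
    memset(rbuf, 0, BLOCK_SIZE);

#ifdef O_DIRECT
    if (round_trip(path, O_DIRECT, wbuf, rbuf) == 0)
        result = 1;
#endif
    if (result < 0) {
        /* No O_DIRECT here, or the filesystem refused it: use the cache. */
        memset(rbuf, 0, BLOCK_SIZE);
        if (round_trip(path, 0, wbuf, rbuf) == 0)
            result = 0;
    }

    free(wbuf);
    free(rbuf);
    return result;
}
```

Calling direct_io_round_trip("some_file") issues exactly the kind of
lone, unclustered 8KB transfer described above; nothing in it provides
readahead or write combining.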

I remember from a previous life that different proprietary Unixes
seemed to take different views on inode-level serialisation with
O_DIRECT.  (Maybe something to do with interpreting POSIX's
requirements differently?)  AIX and HP-UX seemed to take the view that
you should have to opt in to concurrency separately, unlike IRIX/XFS.
AIX seems to be the last OS standing that might want us to do that.
Hence my question on https://wiki.postgresql.org/wiki/AIX which I left
there for someone who cares about PostgreSQL-on-AIX to research, along
with the other patches there (patches that will *definitely* improve
performance, I should add, I just can't test them/take them forward
myself).  Maybe later though, once we have more async machinery.
(People joke about these old dinosaur systems but they were doing
direct concurrent writes in the 90s or even late 80s while ext4 has
literally just learned how to do that in 2023 and can't even make a
file bigger than 16TB.  But I digress.)

As for what comes next, here's what we have prototyped:  My colleague
Andres Freund designed an architecture for driving asynchronous I/O
ahead of time, initially with Linux's io_uring.  I worked on extending
it to use pluggable backends to support the obvious options available
on the ~10 OSes we target, including a fully portable fallback.  For
all OSes, we have the option to use a pool of IO worker processes
doing plain old synchronous preadv()/pwritev()/fsync()/... calls in
the background, but since we're talking about AIX, I can report that
AIX turned out to be one of the few OSes where the cursed POSIX AIO
API actually worked as expected and well enough for our purposes[2],
and from light reading, it seems like it might be truly asynchronous
down to the drivers in some cases.  I wrote a little talk about that
portability work at an OS conference, primarily for therapeutic value,
because asynchronous I/O truly is a portability nightmare.
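
The portable fallback can be illustrated with a rough sketch in C: a
child process stands in for an I/O worker, does one synchronous
preadv() into shared memory, and the parent collects the result.  This
is not the prototype's design.  The function name, the use of fork()
per request and the anonymous shared mapping are assumptions made to
keep the example short; a real pool would keep long-lived workers and
a request queue.

```c
#define _GNU_SOURCE             /* for preadv() and MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/wait.h>
#include <unistd.h>

#define BLOCK_SIZE 8192
#define NBLOCKS 4

/*
 * Hypothetical helper: read NBLOCKS consecutive blocks starting at
 * `offset` in a child "I/O worker".  Returns a shared mapping of
 * NBLOCKS * BLOCK_SIZE bytes (caller must munmap it), or NULL on error.
 */
unsigned char *read_blocks_in_worker(const char *path, off_t offset)
{
    size_t len = (size_t) NBLOCKS * BLOCK_SIZE;
    unsigned char *shared;
    pid_t pid;
    int status;

    /* Memory visible to both the requesting process and the worker. */
    shared = mmap(NULL, len, PROT_READ | PROT_WRITE,
                  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return NULL;

    pid = fork();
    if (pid < 0) {
        munmap(shared, len);
        return NULL;
    }

    if (pid == 0) {
        /* Worker: one vectored read covering all the blocks. */
        struct iovec iov[NBLOCKS];
        ssize_t n;
        int fd = open(path, O_RDONLY);
        int i;

        if (fd < 0)
            _exit(1);
        for (i = 0; i < NBLOCKS; i++) {
            iov[i].iov_base = shared + (size_t) i * BLOCK_SIZE;
            iov[i].iov_len = BLOCK_SIZE;
        }
        n = preadv(fd, iov, NBLOCKS, offset);
        close(fd);
        /* A short read is treated as failure in this sketch. */
        _exit(n == (ssize_t) len ? 0 : 1);
    }

    /* Requester: free to do other work here before waiting. */
    if (waitpid(pid, &status, 0) != pid ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        munmap(shared, len);
        return NULL;
    }
    return shared;
}
```

The point of the shape is that the blocking system call happens in
another process, so the requester only blocks when it finally needs
the data.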

[1] https://www.postgresql.org/message-id/flat/CA%2BhUKGK1X532hYqJ_MzFWt0n1zt8trz980D79WbjwnT-yYLZpg%40mail.gmail.com
[2] https://speakerdeck.com/macdice/aio-and-dio-for-postgresql-on-freebsd?slide=20


