Re: fdatasync(2) on macOS - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: fdatasync(2) on macOS
Date
Msg-id CA+hUKGJWhELJNTxLYhMmUb5go2jnkoR0AoN2e4pJUfyqxy7jHQ@mail.gmail.com
In response to Re: fdatasync(2) on macOS  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Mon, Jan 18, 2021 at 5:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> (1) other platforms weren't safe-by-default either.  Perhaps the
> state of the art is better now, though?

Generally the answer seems to be yes, but there are still some systems
out there that don't send flushes when volatile write cache is
enabled.  Probably still including Macs, by the admission of their man
page.  The numbers I saw would put a little M1 Air at the upper range
of super expensive server storage if they included or didn't need a
flush to survive power loss, but then that's a consumer device with a
battery so it doesn't really fit into the usual way we think about
database server storage and power loss...

> (2) we don't want to force exceedingly-expensive defaults on people
> who may be uninterested in reliable storage.  That seemed like a
> shaky argument then and it still does now.  Still, I see the point
> that suddenly degrading performance by orders of magnitude would
> be a PR disaster.

(Purely as a matter of curiosity, I wonder why the latency is so high
for F_FULLFSYNC.  Wild speculation: APFS is said to be a bit like ZFS,
but it's also said to avoid the data journaling of HFS+... so perhaps
it lacks an equivalent of ZFS's ZIL (a thing like WAL) that allows
synchronous writes to avoid having to flush out a new tree and
uberblock (in ZFS lingo "spa_sync()").  It might be possible to see this
with tools like iosnoop (or the underlying io:::start dtrace probe),
if you overwrite a single block and then fcntl(F_FULLFSYNC).  Your 12
ops/sec on spinning rust would have to be explained by something like
that, and is significantly slower than the speeds I see on my spinning
rust ZFS system that manages something like disk rotation speed.)
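The single-block experiment described above can also be run without dtrace, if all you want is the throughput number. The C probe below is a minimal sketch under stated assumptions, not code from PostgreSQL. It times repeated overwrites of one block, each followed by fcntl(F_FULLFSYNC) where that is defined. The names measure_sync_rate and probe_sync, the 8192-byte block size, and the fallback to plain fsync() when F_FULLFSYNC is missing or fails are hypothetical choices made for illustration. It reports only synced writes per second. Seeing what APFS actually sends to the device would still need iosnoop or the io:::start probe.

```c
#if defined(__APPLE__)
#define _DARWIN_C_SOURCE          /* keep F_FULLFSYNC visible on macOS */
#else
#define _XOPEN_SOURCE 700         /* pwrite(), fsync(), clock_gettime() elsewhere */
#endif

#include <fcntl.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define PROBE_BLOCK_SIZE 8192     /* hypothetical: one PostgreSQL-sized page */

/* Ask the OS to push fd's data to stable storage.  On macOS, plain fsync()
 * does not ask the drive to flush its volatile cache; F_FULLFSYNC does.
 * If F_FULLFSYNC is not defined or fails, fall back to fsync()
 * (an assumption that is good enough for a throughput probe). */
static int probe_sync(int fd)
{
#ifdef F_FULLFSYNC
    if (fcntl(fd, F_FULLFSYNC) == 0)
        return 0;
#endif
    return fsync(fd);
}

/* Overwrite block 0 of `path` `iterations` times, syncing after each write.
 * Returns observed synced writes per second, or -1.0 on error. */
double measure_sync_rate(const char *path, int iterations)
{
    char buf[PROBE_BLOCK_SIZE];
    struct timespec start, end;
    int fd;

    if (iterations <= 0)
        return -1.0;
    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1.0;
    memset(buf, 'x', sizeof(buf));

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < iterations; i++)
    {
        /* Rewrite the same block, then wait for it to be flushed. */
        if (pwrite(fd, buf, sizeof(buf), 0) != (ssize_t) sizeof(buf) ||
            probe_sync(fd) != 0)
        {
            close(fd);
            return -1.0;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    close(fd);

    double elapsed = (double) (end.tv_sec - start.tv_sec) +
                     (double) (end.tv_nsec - start.tv_nsec) / 1e9;
    return elapsed > 0.0 ? iterations / elapsed : -1.0;
}
```

Calling measure_sync_rate on a file on the volume under test, with a few hundred iterations, should give a figure comparable to the ops/sec numbers quoted in this thread. On non-Apple systems F_FULLFSYNC is not defined, so the same code simply times fsync().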

Anyway, my purpose in this thread was to flag our usage of the
undocumented system call and open flags; that is, "how we talk to the
OS", not "how the OS talks to the disk".  That turned out to be
already well known and not as new as I first thought, so I'm not
planning to pursue this Mac stuff any further, despite my curiosity...


