Thread: Windows now has fdatasync()

Windows now has fdatasync()

From
Thomas Munro
Date:
Hi,

While porting some new IO code to lots of OSes I noticed in passing
that there is now a way to do synchronous fdatasync() on Windows.
This mechanism doesn't have an async variant, which is what I was
actually looking for (which turns out to doable with bleeding edge
IoRings, more on that later), but I figured this might be useful
anyway.  I see that at least one other open source database has
discovered it and seen speedups.  Like some other file API
improvements discussed recently, it's Windows 10+ and NTFS only.  I
tried out a quick POC patch and it runs a bit faster than fsync(), as
expected.  I'm not sure if it's worth bothering with or not given the
other options, but figured it was worth sharing.

While testing that I also couldn't resist adding an extra output line
to pg_test_fsync to run open_datasync in buffered I/O mode, like
PostgreSQL actually does in real life.  I guess I should really change
it to duplicate less code, though...

[1] https://www.postgresql.org/message-id/flat/1527846213.2475.31.camel%40cybertec.at

Attachment

Re: Windows now has fdatasync()

From
Sascha Kuhl
Date:
Great. File sync is a Nice extension for me, as I don't know all file structures.

Thomas Munro <thomas.munro@gmail.com> schrieb am So., 12. Dez. 2021, 03:48:
Hi,

While porting some new IO code to lots of OSes I noticed in passing
that there is now a way to do synchronous fdatasync() on Windows.
This mechanism doesn't have an async variant, which is what I was
actually looking for (which turns out to doable with bleeding edge
IoRings, more on that later), but I figured this might be useful
anyway.  I see that at least one other open source database has
discovered it and seen speedups.  Like some other file API
improvements discussed recently, it's Windows 10+ and NTFS only.  I
tried out a quick POC patch and it runs a bit faster than fsync(), as
expected.  I'm not sure if it's worth bothering with or not given the
other options, but figured it was worth sharing.

While testing that I also couldn't resist adding an extra output line
to pg_test_fsync to run open_datasync in buffered I/O mode, like
PostgreSQL actually does in real life.  I guess I should really change
it to duplicate less code, though...

[1] https://www.postgresql.org/message-id/flat/1527846213.2475.31.camel%40cybertec.at

Re: Windows now has fdatasync()

From
Thomas Munro
Date:
On Sun, Dec 12, 2021 at 3:48 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> [...] I
> tried out a quick POC patch and it runs a bit faster than fsync(), as
> expected.  I'm not sure if it's worth bothering with or not given the
> other options, but figured it was worth sharing.

One reason to consider developing this further is the problem
discussed in the aptly named thread "Loaded footgun open_datasync on
Windows"[1] (not the problem that was fixed in pg_test_fsync, but the
problem with cache control, or lack thereof).  I saw 10x more TPS with
open_datasync than with this experimental fdatasync on my little test
VM, which was more than a little fishy, so I turned off the device
write caching in "Device Manager" and got about the same number from
open_datasync and fdatasync.  Clearly you can lose committed
transactions on power loss[2] with the default OS settings and default
PostgreSQL settings, though perhaps only if you're on SATA storage due
to lack of FUA pass-through[3] (?).  I didn't try an NVMe stack.

That suggests that fdatasync would actually be a better default ...
except for the problems already mentioned with versions and not
working on non-NTFS (not sure how it fails on non-NTFS, though, maybe
it does a full flush, [4] doesn't say).

(The patch is a little rough; I couldn't figure out the headers to get
that macro.  Insert my usual disclaimer that I'm not a Windows guy,
this is stuff I'm just figuring out, all clues welcome...)

[1] https://www.postgresql.org/message-id/flat/1527846213.2475.31.camel%40cybertec.at
[2] https://github.com/MicrosoftDocs/feedback/issues/2747
[3] https://devblogs.microsoft.com/oldnewthing/20170510-00/?p=95505
[4] https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-ntflushbuffersfileex



Re: Windows now has fdatasync()

From
Thomas Munro
Date:
I added a commitfest entry for this to try to attract Windows-hacker
reviews.  I wondered about adjusting it to run on older systems, but I
think we're about ready to drop support for Windows < 10 anyway, so
maybe I'll go and propose that separately, instead.

https://commitfest.postgresql.org/37/3530/



Re: Windows now has fdatasync()

From
Robert Haas
Date:
On Sun, Dec 12, 2021 at 4:32 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> One reason to consider developing this further is the problem
> discussed in the aptly named thread "Loaded footgun open_datasync on
> Windows"[1] (not the problem that was fixed in pg_test_fsync, but the
> problem with cache control, or lack thereof).  I saw 10x more TPS with
> open_datasync than with this experimental fdatasync on my little test
> VM, which was more than a little fishy, so I turned off the device
> write caching in "Device Manager" and got about the same number from
> open_datasync and fdatasync.  Clearly you can lose committed
> transactions on power loss[2] with the default OS settings and default
> PostgreSQL settings, though perhaps only if you're on SATA storage due
> to lack of FUA pass-through[3] (?).  I didn't try an NVMe stack.
>
> That suggests that fdatasync would actually be a better default ...
> except for the problems already mentioned with versions and not
> working on non-NTFS (not sure how it fails on non-NTFS, though, maybe
> it does a full flush, [4] doesn't say).

So my impression is that today we ship defaults that are unsafe on
Windows. I don't really understand much of what you are saying here,
but if there's a way we can stop doing that, +1 from me, especially if
it allows us to retain reasonable performance.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Windows now has fdatasync()

From
Thomas Munro
Date:
On Sat, Feb 5, 2022 at 2:10 AM Robert Haas <robertmhaas@gmail.com> wrote:
> So my impression is that today we ship defaults that are unsafe on
> Windows. I don't really understand much of what you are saying here,
> but if there's a way we can stop doing that, +1 from me, especially if
> it allows us to retain reasonable performance.

The PostgreSQL default in combination with the Windows default is
unsafe on SATA drives, but <get-out-clause>if you read our
documentation carefully you might figure out that you need to disable
write caching on your disk</>.
https://www.postgresql.org/docs/14/wal-reliability.html says:

"Consumer-grade IDE and SATA drives are particularly likely to have
write-back caches that will not survive a power failure. Many
solid-state drives (SSD) also have volatile write-back caches. [...]
On Windows, if wal_sync_method is open_datasync (the default), write
caching can be disabled by unchecking My Computer\Open\disk
drive\Properties\Hardware\Properties\Policies\Enable write caching on
the disk. Alternatively, set wal_sync_method to fsync or
fsync_writethrough, which prevent write caching."

I'm not proposing we change our default to this new level, because it
doesn't work on non-NTFS, an annoying complication.  This patch would
just provide something faster to put after "Alternatively".

(Actually that whole page needs a refresh.  IDE is gone.  The
discussion about "recent" support for flushing caches is a bit out of
date, and in fact the problem with open_datasync on this OS is because
of problems with drivers and
https://en.wikipedia.org/wiki/Disk_buffer#Force_Unit_Access_(FUA), not
FLUSH CACHE EXT.)

Here's an updated patch that fixes some silly problems seen on CI.
There's something a bit clunkly/weird about this HAVE_FDATASYNC stuff,
maybe I can find a tidier way, but it's enough for experimentation:

For Mingw, I unconditionally add src/port/fdatasync.o to LIBOBJS, and
I unconditionally #define HAVE_FDATASYNC in win32_port.h, and I also
changed c.h's declaration of fdatasync() because it comes before
port.h is included (I guess I could move it down instead?).

For MSVC, I unconditionally add fdatasync.o to @pgportfiles, and
HAVE_FDATASYNC is defined in Solution.pm.

It'd be interesting to see pg_test_fsync.exe output on real hardware.
Here's what a little Windows 10 VM with a virtual SATA drive says:

C:\Users\thmunro>c:\pg\bin\pg_test_fsync.exe
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync (direct)             7914.872 ops/sec     126 usecs/op
        open_datasync (buffered)           6593.056 ops/sec     152 usecs/op
        fdatasync                           650.317 ops/sec    1538 usecs/op
        fsync                               512.423 ops/sec    1952 usecs/op
        fsync_writethrough                  550.881 ops/sec    1815 usecs/op
        open_sync (direct)                              n/a
        open_sync (buffered)                            n/a

Attachment

Re: Windows now has fdatasync()

From
Robert Haas
Date:
On Fri, Feb 4, 2022 at 4:24 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> I'm not proposing we change our default to this new level, because it
> doesn't work on non-NTFS, an annoying complication.  This patch would
> just provide something faster to put after "Alternatively".

Hmm. I thought NTFS had kind of won the filesystem war on the Windows
side of things. No?

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Windows now has fdatasync()

From
Thomas Munro
Date:
On Sat, Feb 5, 2022 at 12:54 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Feb 4, 2022 at 4:24 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > I'm not proposing we change our default to this new level, because it
> > doesn't work on non-NTFS, an annoying complication.  This patch would
> > just provide something faster to put after "Alternatively".
>
> Hmm. I thought NTFS had kind of won the filesystem war on the Windows
> side of things. No?

Against FAT, yes, but there are also SMB/CIFS (network) and the new
ReFS (which we recently broke and then unbroke[1]).  I haven't tried
those things, lacking general Windows-fu, but I suppose they'd reject
this and we'd panic, because the docs say "file systems supported:
NTFS"[2].

[1] https://www.postgresql.org/message-id/flat/16854-905604506e23d5c0%40postgresql.org
[2] https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-ntflushbuffersfileex



Re: Windows now has fdatasync()

From
Michael Paquier
Date:
On Sun, Dec 12, 2021 at 03:48:10PM +1300, Thomas Munro wrote:
> I tried out a quick POC patch and it runs a bit faster than fsync(), as
> expected.

Good news, as a too high difference would be suspect :)

How much difference does it make in % and are the numbers rather
reproducible?  Just wondering..
--
Michael

Attachment

Re: Windows now has fdatasync()

From
Thomas Munro
Date:
On Sun, Feb 6, 2022 at 7:20 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Sun, Dec 12, 2021 at 03:48:10PM +1300, Thomas Munro wrote:
> > I tried out a quick POC patch and it runs a bit faster than fsync(), as
> > expected.
>
> Good news, as a too high difference would be suspect :)
>
> How much difference does it make in % and are the numbers rather
> reproducible?  Just wondering..

I've only tested on a qemu/kvm virtual machine with a virtual SATA
disk device, so take this with a bucket of salt, but I think that's
enough to see the impact of 'slow' SATA commands hitting the device
and being waited for, and what I see is that wal_sync_method=fdatasync
does about 25% more TPS than wal_sync_method=fsync, and
wal_sync_method=open_datasync is a wildly higher number that I don't
believe (ie I don't believe it waited for power loss durability and
the links above support that understanding), but tumbles back to earth
and almost exactly matches the wal_sync_method=fdatasync number when
the write cache is disabled.



Re: Windows now has fdatasync()

From
Thomas Munro
Date:
I've bumped this to the next cycle, so I can hopefully skip the
missing version detection stuff that I have no way to test (no CI, no
build farm, and I have zero interest in dumpster diving for Windows 7
or whatever installations).

I propose that we drop support for Windows versions older than
10/Server 2016 in the PostgreSQL 16 cycle, because the OS patches for
everything older come to an end in October next year[1], and we have a
lot of patches relating to modern Windows features that stall on
details about old systems that no one actually has.

[1] https://en.wikipedia.org/wiki/List_of_Microsoft_Windows_versions



Re: Windows now has fdatasync()

From
Michael Paquier
Date:
On Fri, Apr 08, 2022 at 02:56:15PM +1200, Thomas Munro wrote:
> I propose that we drop support for Windows versions older than
> 10/Server 2016 in the PostgreSQL 16 cycle, because the OS patches for
> everything older come to an end in October next year[1], and we have a
> lot of patches relating to modern Windows features that stall on
> details about old systems that no one actually has.
>
> [1] https://en.wikipedia.org/wiki/List_of_Microsoft_Windows_versions

Do you think that we could raise the minimum C standard on WIN32 to
C11, at least for MSVC?  There is a patch floating around to add
pg_attribute_aligned() and perhaps pg_attribute_noreturn() for MSVC:
https://www.postgresql.org/message-id/Yk6UgCGlZKuxRr4n@paquier.xyz

noreturn() needed at least C11:
https://docs.microsoft.com/en-us/cpp/c-language/noreturn?view=msvc-140

Perhaps we'd better also bump up the minimum version of MSVC
supported..
--
Michael

Attachment

Re: Windows now has fdatasync()

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> On Fri, Apr 08, 2022 at 02:56:15PM +1200, Thomas Munro wrote:
>> I propose that we drop support for Windows versions older than
>> 10/Server 2016 in the PostgreSQL 16 cycle,

Do we have any data on what people are actually using?

> Do you think that we could raise the minimum C standard on WIN32 to
> C11, at least for MSVC?

As long as the C11-isms are in MSVC-only code, it seems like this is
exactly equivalent to setting a minimum MSVC version.  I don't see
an objection-in-principle there, it's just a practical question of
how far back is reasonable to support MSVC versions.  (That's very
distinct from how far back we need the built code to run.)

            regards, tom lane



Re: Windows now has fdatasync()

From
Michael Paquier
Date:
On Fri, Apr 08, 2022 at 12:40:55AM -0400, Tom Lane wrote:
> As long as the C11-isms are in MSVC-only code, it seems like this is
> exactly equivalent to setting a minimum MSVC version.  I don't see
> an objection-in-principle there, it's just a practical question of
> how far back is reasonable to support MSVC versions.  (That's very
> distinct from how far back we need the built code to run.)

Good question.  Older versions of VS are available, so this is not a
problem:
https://visualstudio.microsoft.com/vs/older-downloads/

I think that we should at least drop 2013, as there is a bunch of
stuff related to _MSC_VER < 1900 that could be removed with that,
particularly for locales.
--
Michael

Attachment

Re: Windows now has fdatasync()

From
Dave Page
Date:


On Fri, 8 Apr 2022 at 05:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Michael Paquier <michael@paquier.xyz> writes:
> On Fri, Apr 08, 2022 at 02:56:15PM +1200, Thomas Munro wrote:
>> I propose that we drop support for Windows versions older than
>> 10/Server 2016 in the PostgreSQL 16 cycle,

Do we have any data on what people are actually using?

None that I know of. Anecdotally, we dropped support for pgAdmin on Windows < 8 (2012 for the server edition), and had a single complaint - and the user happily acknowledged they were on an old release and expected support to be dropped sooner or later. Windows 8 was a pretty unpopular release, so I would expect shifting to 10/2016+ for PG 16 would be unlikely to be a major problem.

FWIW, Python dropped support for < 8/2012 with v3.9.
 

> Do you think that we could raise the minimum C standard on WIN32 to
> C11, at least for MSVC?

As long as the C11-isms are in MSVC-only code, it seems like this is
exactly equivalent to setting a minimum MSVC version.  I don't see
an objection-in-principle there, it's just a practical question of
how far back is reasonable to support MSVC versions.  (That's very
distinct from how far back we need the built code to run.)

                        regards, tom lane




--

Re: Windows now has fdatasync()

From
Thomas Munro
Date:
On Fri, Apr 8, 2022 at 7:56 PM Dave Page <dpage@pgadmin.org> wrote:
> Windows 8 was a pretty unpopular release, so I would expect shifting to 10/2016+ for PG 16 would be unlikely to be a
majorproblem.
 

Thanks to Michael for making that happen.  That removes the main thing
I didn't know how to deal with in this patch.  Here's a rebase with
some cleanup.

With my garbage collector hat on, I see that all systems we target
have fdatasync(), except:

1.  Windows, but this patch supplies src/port/fdatasync.c.
2.  DragonflyBSD before 6.1.  We have 6.0 in the build farm.
3.  Ancient macOS.  Current releases have it, though we have to cope
with a missing declaration.

From a standards point of view, fdatasync() is issue 5 POSIX like
fsync().  Both are optional, but, being a database, we require
fsync(), and they're both covered by the same POSIX option
"Synchronized Input and Output".

My plan now is to commit this patch so that problem #1 is solved, prod
conchuela's owner to upgrade to solve #2, and wait until Tom shuts
down prairiedog to solve #3.  Then we could consider removing the
HAVE_FDATASYNC probe and associated #ifdefs when convenient.  For that
reason, I'm not too bothered about the slight weirdness of defining
HAVE_FDATASYNC on Windows even though that doesn't come from
configure; it'd hopefully be short-lived.  Better ideas welcome,
though.  Does that make sense?

Attachment

Re: Windows now has fdatasync()

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> With my garbage collector hat on, I see that all systems we target
> have fdatasync(), except:

> 1.  Windows, but this patch supplies src/port/fdatasync.c.
> 2.  DragonflyBSD before 6.1.  We have 6.0 in the build farm.
> 3.  Ancient macOS.  Current releases have it, though we have to cope
> with a missing declaration.

Hmmm ... according to [1], while current macOS has an undocumented
fdatasync function, it doesn't seem to do anything as useful as,
say, sync data to disk.  I'm not sure what direction you're headed
in here, but it probably shouldn't include assuming that fdatasync
is actually useful on macOS.  But maybe that's not your point?

> My plan now is to commit this patch so that problem #1 is solved, prod
> conchuela's owner to upgrade to solve #2, and wait until Tom shuts
> down prairiedog to solve #3.

You could force my hand by pushing something that requires this ;-).
I'm not feeling particularly wedded to prairiedog per se.  As with
my once-and-future HPPA machine, I'd prefer to wait until NetBSD 10
is a thing before spinning up an official buildfarm animal, but
I suppose that that's not far away.

            regards, tom lane

[1] https://www.postgresql.org/message-id/1673109.1610733352%40sss.pgh.pa.us



Re: Windows now has fdatasync()

From
Thomas Munro
Date:
On Mon, Jul 18, 2022 at 3:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > With my garbage collector hat on, I see that all systems we target
> > have fdatasync(), except:
>
> > 1.  Windows, but this patch supplies src/port/fdatasync.c.
> > 2.  DragonflyBSD before 6.1.  We have 6.0 in the build farm.
> > 3.  Ancient macOS.  Current releases have it, though we have to cope
> > with a missing declaration.
>
> Hmmm ... according to [1], while current macOS has an undocumented
> fdatasync function, it doesn't seem to do anything as useful as,
> say, sync data to disk.  I'm not sure what direction you're headed
> in here, but it probably shouldn't include assuming that fdatasync
> is actually useful on macOS.  But maybe that's not your point?

Oh, I'm not planning to change the default choice on macOS (or
Windows).  I *am* assuming we're not going to take it away as an
option, that cat being out of the bag ever since configure found
Apple's secret fdatasync (note that O_DSYNC, our default, is also
undocumented and also known not to flush caches, but at least it's
present in an Apple header!).  I was just noting an upcoming
opportunity to remove the configure/meson probes for fdatasync, which
made me feel better about the slightly kludgy way this patch is
defining HAVE_FDATASYNC explicitly on Windows.

> > My plan now is to commit this patch so that problem #1 is solved, prod
> > conchuela's owner to upgrade to solve #2, and wait until Tom shuts
> > down prairiedog to solve #3.
>
> You could force my hand by pushing something that requires this ;-).

Heh.  Let me ask about the DragonFlyBSD thing first.



Re: Windows now has fdatasync()

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> ... I was just noting an upcoming
> opportunity to remove the configure/meson probes for fdatasync, which
> made me feel better about the slightly kludgy way this patch is
> defining HAVE_FDATASYNC explicitly on Windows.

Hm.  There is certainly not any harm in the meson infrastructure
skipping that test, because prairiedog is not able to run meson
anyway.  Can we do that and still leave it in place on the autoconf
side?  Maybe not, because I suppose you want to remove #ifdefs in
the code itself.

I see that fdatasync goes back as far as SUS v2, which we've long
taken as our minimum POSIX infrastructure.  So there's not a lot
of room to insist that we should support allegedly-Unix platforms
without it.

            regards, tom lane



Re: Windows now has fdatasync()

From
Michael Paquier
Date:
On Mon, Jul 18, 2022 at 03:26:36PM +1200, Thomas Munro wrote:
> My plan now is to commit this patch so that problem #1 is solved, prod
> conchuela's owner to upgrade to solve #2, and wait until Tom shuts
> down prairiedog to solve #3.  Then we could consider removing the
> HAVE_FDATASYNC probe and associated #ifdefs when convenient.  For that
> reason, I'm not too bothered about the slight weirdness of defining
> HAVE_FDATASYNC on Windows even though that doesn't come from
> configure; it'd hopefully be short-lived.  Better ideas welcome,
> though.  Does that make sense?

Do you still need HAVE_DECL_FDATASYNC?
--
Michael

Attachment

Re: Windows now has fdatasync()

From
Thomas Munro
Date:
On Tue, Jul 19, 2022 at 4:54 PM Michael Paquier <michael@paquier.xyz> wrote:
> Do you still need HAVE_DECL_FDATASYNC?

I guess so, because that is currently used for macOS, and with this
patch would also be used to control the declaration for Windows.  The
alternative would be to explicitly test for WIN32 or __darwin__.

The reason we need it for macOS is that they have had fdatasync
function for many years now, and configure detects it, but they
haven't ever declared it in a header, so we (accidentally?) do it in
c.h.  We didn't set that up for Apple!  The commit that added it was
33cc5d8a, which was about a month before Apple shipped the first
version of OS X (and long before they defined the function).  So there
must have been another Unix with that problem, lost in the mists of
time.



Re: Windows now has fdatasync()

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> The reason we need it for macOS is that they have had fdatasync
> function for many years now, and configure detects it, but they
> haven't ever declared it in a header, so we (accidentally?) do it in
> c.h.  We didn't set that up for Apple!  The commit that added it was
> 33cc5d8a, which was about a month before Apple shipped the first
> version of OS X (and long before they defined the function).  So there
> must have been another Unix with that problem, lost in the mists of
> time.

It might have just been paranoia, but I doubt it.  Back then we
were still dealing with lots of systems that didn't have every
function described in SUS v2.

If you poked around in the mail archives you could likely find some
associated discussion, but I'm too lazy for that ...

            regards, tom lane



Re: Windows now has fdatasync()

From
Thomas Munro
Date:
Ok, I've pushed the Windows patch.  I'll watch the build farm to see
if I've broken any of the frankentoolchain Windows animals.

Mikael kindly upgraded conchuela, so that leaves just prairiedog
without fdatasync.  I've attached a patch to drop the configure probe
for that once prairiedog's host is reassigned to new duties, if we're
agreed on that.

While in this part of the code I noticed another anachronism that
could be cleaned up: our handling of the old pre-standard BSD O_FSYNC
flag.  Pulling on that I noticed I could remove a bunch of associated
macrology.

Attachment

Re: Windows now has fdatasync()

From
David Rowley
Date:
On Wed, 20 Jul 2022 at 15:22, Thomas Munro <thomas.munro@gmail.com> wrote:
> Ok, I've pushed the Windows patch.  I'll watch the build farm to see
> if I've broken any of the frankentoolchain Windows animals.

Just to get in there before the farm does... I just got a boatload of
redefinition of HAVE_FDATASYNC warnings.  I see it already gets
defined in pg_config.h

All compiles cleanly with the attached.

David

Attachment

Re: Windows now has fdatasync()

From
Thomas Munro
Date:
On Wed, Jul 20, 2022 at 4:08 PM David Rowley <dgrowleyml@gmail.com> wrote:
> On Wed, 20 Jul 2022 at 15:22, Thomas Munro <thomas.munro@gmail.com> wrote:
> > Ok, I've pushed the Windows patch.  I'll watch the build farm to see
> > if I've broken any of the frankentoolchain Windows animals.
>
> Just to get in there before the farm does... I just got a boatload of
> redefinition of HAVE_FDATASYNC warnings.  I see it already gets
> defined in pg_config.h
>
> All compiles cleanly with the attached.

Oops.  Thanks, pushed.



Re: Windows now has fdatasync()

From
Thomas Munro
Date:
On Wed, Jul 20, 2022 at 4:14 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Wed, Jul 20, 2022 at 4:08 PM David Rowley <dgrowleyml@gmail.com> wrote:
> > On Wed, 20 Jul 2022 at 15:22, Thomas Munro <thomas.munro@gmail.com> wrote:
> > > Ok, I've pushed the Windows patch.  I'll watch the build farm to see
> > > if I've broken any of the frankentoolchain Windows animals.
> >
> > Just to get in there before the farm does... I just got a boatload of
> > redefinition of HAVE_FDATASYNC warnings.  I see it already gets
> > defined in pg_config.h
> >
> > All compiles cleanly with the attached.
>
> Oops.  Thanks, pushed.

... and here's a rebase of the patch that removes that macro stuff,
since cfbot is watching this thread.

Attachment

Re: Windows now has fdatasync()

From
Thomas Munro
Date:
Hearing no objection, I committed the patch to remove O_FSYNC.  The
next cleanup one I'll just leave here for now.

Attachment

Re: Windows now has fdatasync()

From
Thomas Munro
Date:
David kindly ran some tests of this thing on real hardware.  The
results were mostly in line with expectations, but we learned some new
things.

TL;DR  We probably should consider this as a safer default, but it'd
be good for someone more hands-on with this OS and knowledgeable about
storage to investigate and propose that.  My original goal here was
primarily Unix/Windows harmonisation and cleanup since I'm doing a
bunch of hacking on I/O, but I can't unsee an
unsafe-at-least-on-consumer-gear default now that I've seen it.  The
main thing I'm aware of that we don't know yet is what happens if you
try it on a non-NTFS file system (ReFS? SMB?) -- hopefully it falls
back to fsync behaviour.

Observations from an old Windows 8.1 system with a SATA drive:

1.  So far you can apparently still actually compile and run on 8.1,
despite recent commits to de-support it.
2.  You can use the new wal_sync_method=fdatasync, without error, and
timings are consistent with falling back to full fsync behaviour.
That makes sense, I guess, because the function existed.  It's just a
new flag bit, and the default behaviour for flags == 0 was already
their fsync.  That seems like a good outcome even though 8.1 isn't a
target anymore.

Observations from a current Windows 11 system with an NVMe drive:

1.  fdatasync is faster than fsync, as expected.  Twice as fast with
write cache disabled, a bit faster with write cache enabled.
2.  Timings seem to suggest that open_datasync (the current default)
is not really writing through the drive cache.  I'd previously thought
that was a SATA-only problem based on [1], which said that EIDE/SATA
drivers did not pass through the FUA flag that NTFS sends for
FILE_FLAG_WRITE_THROUGH (= O_DSYNC) on the basis that many drives
ignored it anyway, but these numbers seem to suggest that David's
recent-ish NVMe system has the same problem as the old SATA system.

Generally, Windows' approach seems to be that NTFS
FILE_FLAG_WRITE_THROUGH fires an FUA flag into the storage stack, and
either the driver or the drive is free to fling it out the window, and
it's the user's problem to worry about that, whereas Linux at least
asks nicely if the drive understands FUA and falls back to flushing
the whole cache if not[2].  I also know that Linux has been flaky
around this in the past too, especially on consumer storage, and macOS
and at least some of the older BSD/UFS systems just don't do this
stuff at all for user data (yet) so it's not like there is anything
universal about this topic.  Note that drive caches are enabled by
default in Windows, and our manual does already tell you about this
problem[3].

One thing to note about the numbers below: pg_test_fsync.c's
open_datasync test is also using FILE_FLAG_NO_BUFFERING (= O_DIRECT),
unlike PostgreSQL, which muddies the waters slightly.  (There was a
patch upthread to fix that and report both numbers, I may come back to
that.)

Windows 11, NVMe, write cache enabled:

        open_datasync                     27306.286 ops/sec      37 usecs/op
        fdatasync                          3065.428 ops/sec     326 usecs/op
        fsync                              2577.498 ops/sec     388 usecs/op

Windows 11, NVMe, write cache disabled:

        open_datasync                      3477.258 ops/sec     288 usecs/op
        fdatasync                          3263.418 ops/sec     306 usecs/op
        fsync                              1641.502 ops/sec     609 usecs/op

Windows 8.1, SATA:

        open_datasync                     19934.532 ops/sec      50 usecs/op
        fdatasync                           231.429 ops/sec    4321 usecs/op
        fsync                               240.050 ops/sec    4166 usecs/op

(We couldn't figure out how to disable the write cache on the 8.1
machine -- the usual checkbox had no effect -- but we didn't waste
time investigating that old system beyond the curiosity of checking if
it'd work at all.)

[1] https://devblogs.microsoft.com/oldnewthing/20170510-00/?p=95505
[2]
https://techcommunity.microsoft.com/t5/sql-server-blog/sql-server-on-linux-forced-unit-access-fua-internals/ba-p/3199102
[3] https://www.postgresql.org/docs/devel/wal-reliability.html