Re: wal_sync_method=fsync_writethrough - Mailing list pgsql-hackers

From Magnus Hagander
Subject Re: wal_sync_method=fsync_writethrough
Msg-id CABUevEz2_HuNJP+gTjaJ9uDvAaC4RUr35Q1+-6L2VX6RDB_AGw@mail.gmail.com
In response to Re: wal_sync_method=fsync_writethrough  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: wal_sync_method=fsync_writethrough
List pgsql-hackers
On Fri, Aug 26, 2022 at 11:29 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Sat, Aug 27, 2022 at 12:17 AM Magnus Hagander <magnus@hagander.net> wrote:
> > So, I don't know how it works now, but the history at least was this:
> > it was not about the disk caches, it was about raid controller caches.
> > Basically, we determined that windows didn't fsync it all the way. But
> > it would with  But if we changed wal_sync_method=fsync to actually
> > *do* that, then people who had paid big money for raid controllers
> > with flash or battery backed cache would lose a ton of performance. So
> > we needed one level that would sync out of the OS but not through the
> > RAID cache, and another one that would sync it out of the RAID cache
> > as well. Which would/could be different from the drive caches
> > themselves, and they often behaved differently. And I think it may
> > have even been dependent on the individual RAID drivers what the
> > default would be.
>
> Thanks for the background.  Yeah, that makes sense to motivate
> open_datasync for Windows.  Not sure what you meant about fsync or
> meant to write after "would with".

That's a good question indeed :) I think I meant it would with
FILE_FLAG_WRITE_THROUGH.


> It seems like the 2005 discussions were primarily about open_datasync
> but also had the by-product of introducing the name
> fsync_writethrough.  If I'm reading between the lines[1] correctly,
> perhaps the logic went like this:
>
> 1.  We noticed that _commit() AKA FlushFileBuffers() issued
> SYNCHRONIZE CACHE (or equivalent) on Windows.
>
> 2.  At that time in history, Linux (and other Unixes) probably did not
> issue SYNCHRONIZE CACHE when you called fsync()/fdatasync().

I think it may have been driver dependent there (as well), at the time.


> 3.  We concluded therefore that Windows was strange and we needed to
> use a different level name for the setting to reflect this extra
> effect.

It was certainly strange to us :)


> Now it looks strange: we have both "fsync" and "fsync_writethrough"
> doing exactly the same thing while vaguely implying otherwise, and the
> contrast with other operating systems (if I divined that aspect
> correctly) mostly doesn't apply.  How flush commands affect various
> caches in modern storage stacks is also not really OS-specific AFAIK.
>
> (Obviously macOS is a different story...)

Given that it does vary (because macOS is actually an OS :D), we might
need to start from a matrix of exactly what happens in different
states, and then try to map that to a set of settings? I fully agree
that if things actually behave the same, they should be called the same.
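
As a rough illustration of that idea, here is a minimal C sketch of such a
matrix and a check for settings that share a mechanism on one platform. The
struct, table, and function names are hypothetical and not from the
PostgreSQL source. The Windows rows follow what is said in this thread; the
macOS and Linux rows are my assumption about current behavior and would need
to be verified before anyone relies on them.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One cell of the matrix: which mechanism a setting uses on a platform. */
struct sync_row {
    const char *os;         /* platform */
    const char *method;     /* wal_sync_method value */
    const char *mechanism;  /* underlying call or flag */
};

/*
 * Hypothetical, incomplete table. Windows rows are taken from the thread;
 * macOS and Linux rows are assumptions that need checking.
 */
static const struct sync_row sync_matrix[] = {
    {"Windows", "open_datasync",      "FILE_FLAG_WRITE_THROUGH"},
    {"Windows", "fsync",              "FlushFileBuffers"},
    {"Windows", "fsync_writethrough", "FlushFileBuffers"},
    {"macOS",   "fsync",              "fsync"},
    {"macOS",   "fsync_writethrough", "fcntl(F_FULLFSYNC)"},
    {"Linux",   "fdatasync",          "fdatasync"},
    {"Linux",   "fsync",              "fsync"},
};

static const size_t sync_matrix_len =
    sizeof(sync_matrix) / sizeof(sync_matrix[0]);

/* Two rows behave the same if they share platform and mechanism. */
static int same_behavior(const struct sync_row *a, const struct sync_row *b)
{
    return strcmp(a->os, b->os) == 0 &&
           strcmp(a->mechanism, b->mechanism) == 0;
}

/*
 * Count pairs of differently named settings that behave the same on one
 * platform. If out is not NULL, also print one line per pair.
 */
static size_t report_redundant_names(const struct sync_row *rows, size_t n,
                                     FILE *out)
{
    size_t found = 0;

    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (!same_behavior(&rows[i], &rows[j]))
                continue;
            if (out != NULL)
                fprintf(out, "%s: %s and %s both use %s\n",
                        rows[i].os, rows[i].method, rows[j].method,
                        rows[i].mechanism);
            found++;
        }
    }
    return found;
}
```

Calling report_redundant_names(sync_matrix, sync_matrix_len, stdout) on this
table flags only the Windows fsync/fsync_writethrough pair. A real matrix
would also need a column for what each mechanism does to drive and
controller caches, which is the part that is hard to pin down.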

And it may also be that there is no longer a difference between
direct-drive and RAID-with-battery-or-flash, which used to be the huge
difference back then, where you had to tune for it. In many cases
that has become moot because people simply no longer use such
controllers (and use NVMe and possibly software RAID instead), but
there are certainly still people using such systems...

//Magnus


