Thread: wal_sync_method=fsync_writethrough
Hi,

We allow $SUBJECT on Windows. I'm not sure exactly how we finished up with that, maybe a historical mistake, but I find it misleading today. Modern Windows flushes drive write caches for fsync (= _commit()) and fdatasync (= FLUSH_FLAGS_FILE_DATA_SYNC_ONLY). In fact it is possible to tell Windows to write out file data without flushing the drive cache (= FLUSH_FLAGS_NO_SYNC), but I don't believe anyone is interested in new weaker levels. Any reason not to just get rid of it?

On macOS, our fsync and fdatasync levels *don't* flush drive caches, because those system calls don't on that OS, and they offer a weird special fcntl, so there we offer $SUBJECT for a good reason. Now that macOS 10.2 systems are thoroughly extinct, I think we might as well drop the configure probe, though, while we're doing a lot of that sort of thing.

The documentation also says a couple of things that aren't quite correct about wal_sync_method. (I would also like to revise other nearby outdated paragraphs about volatile write caches, sector sizes, etc., but that'll take some more research.)
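
For readers unfamiliar with the macOS case described above, here is a minimal sketch of the idea, not PostgreSQL's actual code. The helper name flush_through_drive_cache is hypothetical. It assumes that F_FULLFSYNC is the "special fcntl" in question and that, where the macro is not defined, plain fsync() is the best available fallback.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * Flush a file's data and, where the platform exposes a way to do so,
 * also ask the drive to flush its write cache.
 * Returns 0 on success, -1 on failure (errno is set by the failing call).
 */
int
flush_through_drive_cache(int fd)
{
#ifdef F_FULLFSYNC
	/* macOS: fsync() alone does not flush the drive cache; F_FULLFSYNC does. */
	return (fcntl(fd, F_FULLFSYNC, 0) == -1) ? -1 : 0;
#else
	/* Elsewhere: plain fsync(); what happens to the drive cache is up to the OS. */
	return fsync(fd);
#endif
}
```

Keeping the platform check on the macro rather than on the OS name means the same source builds unchanged wherever F_FULLFSYNC is absent.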
On Fri, Aug 26, 2022 at 6:55 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> Hi,
>
> We allow $SUBJECT on Windows. I'm not sure exactly how we finished up
> with that, maybe a historical mistake, but I find it misleading today.
> Modern Windows flushes drive write caches for fsync (= _commit()) and
> fdatasync (= FLUSH_FLAGS_FILE_DATA_SYNC_ONLY). In fact it is possible
> to tell Windows to write out file data without flushing the drive
> cache (= FLUSH_FLAGS_NO_SYNC), but I don't believe anyone is
> interested in new weaker levels. Any reason not to just get rid of
> it?

So, I don't know how it works now, but the history at least was this: it was not about the disk caches, it was about raid controller caches.

Basically, we determined that windows didn't fsync it all the way. But it would with  But if we changed wal_sync_method=fsync to actually *do* that, then people who had paid big money for raid controllers with flash or battery backed cache would lose a ton of performance. So we needed one level that would sync out of the OS but not through the RAID cache, and another one that would sync it out of the RAID cache as well. Which would/could be different from the drive caches themselves, and they often behaved differently. And I think it may have even been dependent on the individual RAID drivers what the default would be.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

On Sat, Aug 27, 2022 at 12:17 AM Magnus Hagander <magnus@hagander.net> wrote:
> So, I don't know how it works now, but the history at least was this:
> it was not about the disk caches, it was about raid controller caches.
> Basically, we determined that windows didn't fsync it all the way. But
> it would with  But if we changed wal_sync_method=fsync to actually
> *do* that, then people who had paid big money for raid controllers
> with flash or battery backed cache would lose a ton of performance. So
> we needed one level that would sync out of the OS but not through the
> RAID cache, and another one that would sync it out of the RAID cache
> as well. Which would/could be different from the drive caches
> themselves, and they often behaved differently. And I think it may
> have even been dependent on the individual RAID drivers what the
> default would be.

Thanks for the background. Yeah, that makes sense to motivate open_datasync for Windows. Not sure what you meant about fsync or meant to write after "would with".

It seems like the 2005 discussions were primarily about open_datasync but also had the by-product of introducing the name fsync_writethrough. If I'm reading between the lines[1] correctly, perhaps the logic went like this:

1. We noticed that _commit() AKA FlushFileBuffers() issued SYNCHRONIZE CACHE (or equivalent) on Windows.

2. At that time in history, Linux (and other Unixes) probably did not issue SYNCHRONIZE CACHE when you called fsync()/fdatasync().

3. We concluded therefore that Windows was strange and we needed to use a different level name for the setting to reflect this extra effect.

Now it looks strange: we have both "fsync" and "fsync_writethrough" doing exactly the same thing while vaguely implying otherwise, and the contrast with other operating systems (if I divined that aspect correctly) mostly doesn't apply. How flush commands affect various caches in modern storage stacks is also not really OS-specific AFAIK.

(Obviously macOS is a different story...)

[1] https://www.postgresql.org/message-id/flat/26109.1111084860%40sss.pgh.pa.us#e7f8c2e14d76cad76b1857e89c8a6314

On Fri, Aug 26, 2022 at 11:29 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Sat, Aug 27, 2022 at 12:17 AM Magnus Hagander <magnus@hagander.net> wrote:
> > So, I don't know how it works now, but the history at least was this:
> > it was not about the disk caches, it was about raid controller caches.
> > Basically, we determined that windows didn't fsync it all the way. But
> > it would with  But if we changed wal_sync_method=fsync to actually
> > *do* that, then people who had paid big money for raid controllers
> > with flash or battery backed cache would lose a ton of performance. So
> > we needed one level that would sync out of the OS but not through the
> > RAID cache, and another one that would sync it out of the RAID cache
> > as well. Which would/could be different from the drive caches
> > themselves, and they often behaved differently. And I think it may
> > have even been dependent on the individual RAID drivers what the
> > default would be.
>
> Thanks for the background. Yeah, that makes sense to motivate
> open_datasync for Windows. Not sure what you meant about fsync or
> meant to write after "would with".

That's a good question indeed :) I think I meant it would with FILE_FLAG_WRITE_THROUGH.

> It seems like the 2005 discussions were primarily about open_datasync
> but also had the by-product of introducing the name
> fsync_writethrough. If I'm reading between the lines[1] correctly,
> perhaps the logic went like this:
>
> 1. We noticed that _commit() AKA FlushFileBuffers() issued
> SYNCHRONIZE CACHE (or equivalent) on Windows.
>
> 2. At that time in history, Linux (and other Unixes) probably did not
> issue SYNCHRONIZE CACHE when you called fsync()/fdatasync().

I think it may have been driver dependent there (as well), at the time.

> 3. We concluded therefore that Windows was strange and we needed to
> use a different level name for the setting to reflect this extra
> effect.

It was certainly strange to us :)

> Now it looks strange: we have both "fsync" and "fsync_writethrough"
> doing exactly the same thing while vaguely implying otherwise, and the
> contrast with other operating systems (if I divined that aspect
> correctly) mostly doesn't apply. How flush commands affect various
> caches in modern storage stacks is also not really OS-specific AFAIK.
>
> (Obviously macOS is a different story...)

Given that it does vary (because macOS is actually an OS :D), we might need to start from a matrix of exactly what happens in different states, and then try to map that to a set? I fully agree that if things actually behave the same, they should be called the same.

And it may also be that there is no longer a difference between direct-drive and RAID-with-battery-or-flash, which used to be the huge difference back then, where you had to tune for it. For many cases that has been negated by just not using that (and using NVME and possibly software raid instead), but there are certainly still people using such systems...

//Magnus

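
One way to start the matrix suggested above is a small lookup table. This is only a sketch: the struct, the table, and both function names are hypothetical, and the rows record nothing beyond what has been claimed so far in this thread for Windows and macOS. They are not verified behavior, and other platforms are deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record: one wal_sync_method level on one OS. */
typedef struct SyncBehavior
{
	const char *os;
	const char *method;			/* wal_sync_method value */
	const char *call;			/* interface named in this thread */
	bool		flushes_drive_cache;	/* as claimed in this thread */
} SyncBehavior;

/* Rows reflect statements made in this thread only. */
static const SyncBehavior sync_matrix[] = {
	{"windows", "fsync", "_commit()", true},
	{"windows", "fdatasync", "FLUSH_FLAGS_FILE_DATA_SYNC_ONLY", true},
	{"windows", "fsync_writethrough", "_commit()", true},
	{"macos", "fsync", "fsync()", false},
	{"macos", "fdatasync", "fdatasync()", false},
	{"macos", "fsync_writethrough", "fcntl(F_FULLFSYNC)", true},
};

#define SYNC_MATRIX_LEN (sizeof(sync_matrix) / sizeof(sync_matrix[0]))

/* Return the row for (os, method), or NULL if the table has none. */
const SyncBehavior *
lookup_sync_behavior(const char *os, const char *method)
{
	for (size_t i = 0; i < SYNC_MATRIX_LEN; i++)
	{
		if (strcmp(sync_matrix[i].os, os) == 0 &&
			strcmp(sync_matrix[i].method, method) == 0)
			return &sync_matrix[i];
	}
	return NULL;
}

/* True if another level on the same OS ends up in the same call. */
bool
duplicates_another_method(const char *os, const char *method)
{
	const SyncBehavior *self = lookup_sync_behavior(os, method);

	if (self == NULL)
		return false;
	for (size_t i = 0; i < SYNC_MATRIX_LEN; i++)
	{
		const SyncBehavior *other = &sync_matrix[i];

		if (other != self &&
			strcmp(other->os, os) == 0 &&
			strcmp(other->call, self->call) == 0)
			return true;
	}
	return false;
}
```

With these rows, duplicates_another_method("windows", "fsync_writethrough") is true while the macOS equivalent is false, which is the asymmetry being discussed.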
On Tue, Aug 30, 2022 at 3:44 AM Magnus Hagander <magnus@hagander.net> wrote:
> On Fri, Aug 26, 2022 at 11:29 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > Now it looks strange: we have both "fsync" and "fsync_writethrough"
> > doing exactly the same thing while vaguely implying otherwise, and the
> > contrast with other operating systems (if I divined that aspect
> > correctly) mostly doesn't apply. How flush commands affect various
> > caches in modern storage stacks is also not really OS-specific AFAIK.
> >
> > (Obviously macOS is a different story...)
>
> Given that it does vary (because macOS is actually an OS :D), we might
> need to start from a matrix of exactly what happens in different
> states, and then try to map that to a set? I fully agree that if
> things actually behave the same, they should be called the same.

Thanks, I'll take that as a +1 for dropping the redundant level for Windows. (Of course it stays for macOS).

I like that our current levels are the literal names of standard interfaces we call, since the rest is out of our hands. I'm not sure what you could actually *do* with the information that some OS doesn't flush write caches, other than document it and suggest a remedy (e.g. turn it off). I would even prefer it if fsync_writethrough were called F_FULLFSYNC, following that just-say-what-it-does-directly philosophy, but that horse is already over the horizon.

> And it may also be that there is no longer a difference between
> direct-drive and RAID-with-battery-or-flash, which used to be the huge
> difference back then, where you had to tune for it. For many cases
> that has been negated by just not using that (and using NVME and
> possibly software raid instead), but there are certainly still people
> using such systems...

I believe modern systems are a lot better at negotiating the need for flushes (i.e. for *volatile* caches). In contrast, the FUA situation (as used for FILE_FLAG_WRITE_THROUGH) seems like a multi-level dumpster fire on anything but high-end gear, from what I've been able to figure out so far, though I'm no expert.