Thread: fsync vs open_sync
I did a little test on the various options of fsync. I'm not sure my tests
are scientific enough for general publication or evaluation; all I am doing
is performing a loop that inserts a value into a table 1 million times.

create table testndx (value integer, name varchar);
create index testndx_val on testndx (value);

for (int i = 0; i < 1000000; i++)
{
    printf_query("insert into testndx (value, name) values ('%d', 'test')",
                 random());

    // report here
}

Anyway, with fsync enabled using standard fsync(), I get roughly 300-400
inserts per second. With fsync disabled, I get about 7000 inserts per
second. When I re-enable fsync but use the open_sync option, I can get
about 2500 inserts per second.

(This is on Linux 2.4 kernel, ext2 file system)

(1) Is there any drawback to using open_sync? It appears to be a happy
medium between leaving fsync on and turning it off.

(2) Does anyone know if the "open_sync" option performs this well across
most platforms, or only on Linux?

(3) If "open_sync" works well across many platforms, and there are no
drawbacks, shouldn't it be the default WAL sync method? The performance
boost is incredible.
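In case anyone wants to reproduce this: printf_query above is just
shorthand. A minimal standalone equivalent of the loop using libpq would
look something like this (the conninfo string and reporting interval are
placeholders, not part of the original test):

    /* Minimal sketch of the benchmark loop using libpq.
     * "dbname=test" is a placeholder conninfo string. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <libpq-fe.h>

    int main(void)
    {
        PGconn *conn = PQconnectdb("dbname=test");
        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            return 1;
        }

        for (int i = 0; i < 1000000; i++)
        {
            char query[128];
            PGresult *res;

            snprintf(query, sizeof(query),
                     "insert into testndx (value, name) values ('%ld', 'test')",
                     random());
            res = PQexec(conn, query);
            if (PQresultStatus(res) != PGRES_COMMAND_OK)
                fprintf(stderr, "insert failed: %s", PQerrorMessage(conn));
            PQclear(res);

            /* report inserts/sec here, e.g. every 10000 rows */
        }

        PQfinish(conn);
        return 0;
    }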
pgsql@mohawksoft.com writes:
> I did a little test on the various options of fsync.

There were considerably more extensive tests back when we created the
different WAL options, and the conclusions seemed to be that the best
choice is platform-dependent and also usage-dependent. (In particular,
it makes a huge difference whether WAL has its own drive or not.)

I don't really recall why open_sync didn't end up among the set of
choices considered for the default setting. It may be that we need to
reconsider based on the behavior of newer Linux versions ...

In any case, comparing open_sync to fsync is irrelevant, seeing that
the current default choice on Linux is fdatasync. What you ought to
be telling us about is the performance relative to that.

			regards, tom lane
> pgsql@mohawksoft.com writes:
>> I did a little test on the various options of fsync.
>
> There were considerably more extensive tests back when we created the
> different WAL options, and the conclusions seemed to be that the best
> choice is platform-dependent and also usage-dependent. (In particular,
> it makes a huge difference whether WAL has its own drive or not.)
>
> I don't really recall why open_sync didn't end up among the set of
> choices considered for the default setting. It may be that we need to
> reconsider based on the behavior of newer Linux versions ...
>
> In any case, comparing open_sync to fsync is irrelevant, seeing that
> the current default choice on Linux is fdatasync. What you ought to
> be telling us about is the performance relative to that.

I can tell you, and I'll send all the results if you like, but fsync and
fdatasync are, as far as I can tell, identical. In fact, I can't find any
documentation saying that fdatasync on Linux is implemented as anything
other than fsync.

I tested fsync and fdatasync first, and in my tests their performance was
the same. I never went beyond these, as it looked like the fsync options
were all basically the same. I hadn't read anywhere that open_sync could
make such a difference. It is only because of some idle chatter (over a
few years) in a couple of Linux kernel mailing list threads about O_SYNC
being improved that I thought I'd try it.

The improvements were REALLY astounding, and I would like to know if other
Linux users see this performance increase; I mean, it is almost 8~10 times
faster than using fsync. Furthermore, it seems to also have the added
benefit of reducing the I/O storm at checkpoints compared to a system
running with fsync off.

I'm really serious about this: changing this one parameter had dramatic
results on performance. We should have a general call to users to test
this setting with their OS of choice. If not that, then if we can be sure
that there are no cases where using O_SYNC is worse than fsync() or
fdatasync(), it should be considered as the default.
pgsql@mohawksoft.com wrote:
> Furthermore, it seems to also have the added benefit of reducing the I/O
> storm at checkpoints compared to a system running with fsync off.
>
> I'm really serious about this: changing this one parameter had dramatic
> results on performance. We should have a general call to users to test
> this setting with their OS of choice. If not that, then if we can be sure
> that there are no cases where using O_SYNC is worse than fsync() or
> fdatasync(), it should be considered as the default.

Agreed.  Have you looked at src/tools/fsync?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
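[For reference: src/tools/fsync in the source tree contains a small
standalone program, test_fsync, that times the various sync methods
directly against a test file, without the rest of the backend in the way.
From memory, running it is something like:

    cd src/tools/fsync
    make
    ./test_fsync

but the exact invocation may differ between versions, so check the source
in that directory.]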
pgsql@mohawksoft.com writes:
> The improvements were REALLY astounding, and I would like to know if other
> Linux users see this performance increase; I mean, it is almost 8~10 times
> faster than using fsync.
> Furthermore, it seems to also have the added benefit of reducing the I/O
> storm at checkpoints compared to a system running with fsync off.

What size transactions are you using in your tests?

For a system with small transactions (not much more than 1 page worth of
WAL traffic per transaction) I'd be pretty surprised if there was any
real difference at all. There certainly should not be any difference in
terms of the number of physical writes. We have seen some platforms
where fsync() is inefficiently implemented and requires more kernel
overhead than is reasonable --- not for I/O, but just to look through
the kernel buffers and confirm that none of them need flushing. But I
didn't think Linux was one of these.

			regards, tom lane
Just out of interest, what happens to the difference if you use *ext3*
(perhaps with data=writeback)?

regards

Mark

pgsql@mohawksoft.com wrote:
> I did a little test on the various options of fsync.
> ...
> create table testndx (value integer, name varchar);
> create index testndx_val on testndx (value);
>
> for (int i = 0; i < 1000000; i++)
> {
>     printf_query("insert into testndx (value, name) values ('%d', 'test')",
>                  random());
>
>     // report here
> }
>
> Anyway, with fsync enabled using standard fsync(), I get roughly 300-400
> inserts per second. With fsync disabled, I get about 7000 inserts per
> second. When I re-enable fsync but use the open_sync option, I can get
> about 2500 inserts per second.
>
> (This is on Linux 2.4 kernel, ext2 file system)
> Just out of interest, what happens to the difference if you use *ext3*
> (perhaps with data=writeback)?

Actually, I was working for a client, so it wasn't a general exploratory
test, but I can say that early on we discovered that ext3 was about the
worst file system for PostgreSQL. We gave up on it and decided to use ext2.

I have been considering a full sweep in my test lab, off client time,
later on: ext2, ext3, jfs, xfs, and ReiserFS; fsync on with fdatasync or
open_sync, and fsync off. One million inserts with auto commit.
pgsql@mohawksoft.com writes:
>> Just out of interest, what happens to the difference if you use *ext3*
>> (perhaps with data=writeback)?
>
> Actually, I was working for a client, so it wasn't a general exploratory
> test, but I can say that early on we discovered that ext3 was about the
> worst file system for PostgreSQL. We gave up on it and decided to use ext2.

I'd be interested in which ext3 mount options you used--I can see how
anything other than 'data=writeback' could be a performance killer.
I've been meaning to run a few tests myself, but haven't had the time...

-Doug
--
Let us cross over the river, and rest under the shade of the trees.
   --T. J. Jackson, 1863
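[For anyone repeating the ext3 comparison: the journaling mode is a mount
option, so a test mount would look something like

    mount -t ext3 -o data=writeback,noatime /dev/hdb1 /mnt/pgdata

where the device and mount point are placeholders. data=ordered is the
ext3 default; data=writeback journals only metadata, so ordinary data
writes are not forced through the journal.]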
Tom Lane wrote:
> pgsql@mohawksoft.com writes:
>> The improvements were REALLY astounding, and I would like to know if other
>> Linux users see this performance increase; I mean, it is almost 8~10 times
>> faster than using fsync.
>> Furthermore, it seems to also have the added benefit of reducing the I/O
>> storm at checkpoints compared to a system running with fsync off.
>
> What size transactions are you using in your tests?
>
> For a system with small transactions (not much more than 1 page worth of
> WAL traffic per transaction) I'd be pretty surprised if there was any
> real difference at all. There certainly should not be any difference in
> terms of the number of physical writes. We have seen some platforms
> where fsync() is inefficiently implemented and requires more kernel
> overhead than is reasonable --- not for I/O, but just to look through
> the kernel buffers and confirm that none of them need flushing. But I
> didn't think Linux was one of these.

IDE or SCSI? If IDE: write cache on or off? Which 2.4 kernel?

The numbers are very high - it could be a side effect of write caching by
the disks. I think some Suse 2.4 kernels have partial support for reliable
fsync even if the write cache is on (i.e. fsync issues a cache flush
command to the disk), but not all code paths are handled. Perhaps fsync is
handled and O_SYNC is not handled. I could try to find the details.

--
   Manfred
Some more information:

I started to perform the tests on one of the machines in my lab, and guess
what: almost no difference between fsync and open_sync, either on jfs or
ext2. The difference? Linux 2.6.3. My original tests were on Linux 2.4.25.
The good part is that open_sync wasn't worse.

Now a conceptual question: "What is the right thing to do?"

I started to think about this. To me, the O_SYNC flag is to ensure that
what you write is on the disk at the time of the write; in SQL terms it is
like "auto commit." Calling fsync or fdatasync is so that one can batch
write calls and flush them out to disk in one shot; conceptually, it is
like a transaction.

Does it make sense, then, to say that the WAL sync method should be
O_SYNC? If there are no reasons not to, doesn't it make sense to make this
the default? It will give a boost to any 2.4 Linux machines and won't seem
to hurt anyone else.
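To put that analogy in concrete SQL terms (purely illustrative):

    -- O_SYNC: every write is made durable by itself ("auto commit")
    insert into testndx (value, name) values (1, 'test');
    insert into testndx (value, name) values (2, 'test');

    -- fsync/fdatasync: batch the work, make it durable in one shot
    -- (a "transaction")
    begin;
    insert into testndx (value, name) values (1, 'test');
    insert into testndx (value, name) values (2, 'test');
    commit;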
> In particular, you need to offer some evidence for that completely
> undocumented assertion that "it won't hurt anyone else".

It should be easy enough to prove whether or not O_SYNC hurts anyone.

OK, let me ask a few questions:

(1) What is a good sample set of platforms on which to run? Linux,
FreeBSD, Macintosh?

(2) What sort of tests would be definitive? Auto commit and some
transactional load?

After delving into this a little, it seems to me that if you are going to
do this:

    write(file, buffer, size);
    f[data]sync(file);

opening with O_SYNC seems to be an optimization specifically for this
methodology. At the very least, it will save one user/kernel transition.

If we can prove beyond a reasonable doubt that using O_SYNC does not hurt
any platform, then what reason would there be not to make it the default?
Again, conceptually, O_SYNC does what you want it to do, and should be
able to do it more efficiently than fdatasync().
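As for (2), a throwaway microbenchmark along these lines would give a
definitive per-platform answer without going through the whole backend.
This is only a sketch - the path, block size, and iteration count are
arbitrary, and error checking is omitted:

    /* Time write+fsync vs write+fdatasync vs O_SYNC writes. */
    #include <stdio.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/time.h>

    #define COUNT 1000
    #define BLKSZ 8192

    static double run_test(int open_flags, int do_fsync, int do_fdatasync)
    {
        char buf[BLKSZ];
        struct timeval t0, t1;
        int i;
        int fd = open("/tmp/sync_test.out",
                      O_WRONLY | O_CREAT | open_flags, 0600);

        memset(buf, 'x', sizeof(buf));
        gettimeofday(&t0, NULL);
        for (i = 0; i < COUNT; i++)
        {
            lseek(fd, 0, SEEK_SET);        /* rewrite the same block */
            write(fd, buf, BLKSZ);
            if (do_fsync)
                fsync(fd);
            if (do_fdatasync)
                fdatasync(fd);
        }
        gettimeofday(&t1, NULL);
        close(fd);
        return (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    }

    int main(void)
    {
        printf("write + fsync:     %.3f s\n", run_test(0, 1, 0));
        printf("write + fdatasync: %.3f s\n", run_test(0, 0, 1));
        printf("O_SYNC write:      %.3f s\n", run_test(O_SYNC, 0, 0));
        return 0;
    }

The relative times tell you whether O_SYNC actually saves anything on a
given platform.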
pgsql@mohawksoft.com writes:
> Does it make sense, then, to say that the WAL sync method should be
> O_SYNC? If there are no reasons not to, doesn't it make sense to make
> this the default? It will give a boost to any 2.4 Linux machines and
> won't seem to hurt anyone else.

You have got the terms of debate backwards here. These decisions were
already made once, on the basis of more testing than you have done
(okay, it wasn't months worth of work, but we at least exercised a
number of scenarios on a number of platforms). The question is not "why
shouldn't we make this the default" but "why should we make this the
default, and what are we likely to break if we do so?"

Showing that one release series of one platform wins in one particular
set of tests is not sufficient grounds for changing the default. In
particular, you need to offer some evidence for that completely
undocumented assertion that "it won't hurt anyone else".

			regards, tom lane
> On Tue, 2004-08-10 at 07:48, pgsql@mohawksoft.com wrote:
>> Some more information:
>>
>> I started to perform the tests on one of the machines in my lab, and
>> guess what: almost no difference between fsync and open_sync, either on
>> jfs or ext2.
>>
>> The difference? Linux 2.6.3. My original tests were on Linux 2.4.25.
>
> Very hazy memory recalls something about O_SYNC not really doing
> anything in early kernel versions.
>
>> The good part is that open_sync wasn't worse.

In early Linux kernels, O_SYNC was implemented using fsync(), and there
was some debate about whether people using O_SYNC would see performance
degradation.

>> Now a conceptual question: "What is the right thing to do?" I started to
>> think about this. To me, the O_SYNC flag is to ensure that what you
>> write is on the disk at the time of the write; in SQL terms it is like
>> "auto commit." Calling fsync or fdatasync is so that one can batch write
>> calls and flush them out to disk in one shot; conceptually, it is like a
>> transaction.
>
> With the caveat that the kernel can start flushing your data to disk
> in the background, not just when you call fdatasync/fsync.

I was speaking in conceptual terms, not exact ones - just a general
analogy. In theory, theory and practice are the same thing; in practice,
they are not.
pgsql@mohawksoft.com wrote:
> I have been considering a full sweep in my test lab, off client time,
> later on: ext2, ext3, jfs, xfs, and ReiserFS; fsync on with fdatasync or
> open_sync, and fsync off.

Before you start: double check that the disks are not lying. At least the
Suse 2.4 kernel sends cache flush commands to IDE disks on fsync(), but
not with O_SYNC:

http://marc.theaimsgroup.com/?l=linux-kernel&m=107964507113585

--
   Manfred
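[One way to rule the drive's write cache in or out on IDE is to rerun the
test with the cache disabled, e.g.

    hdparm -W0 /dev/hda    (write caching off)
    hdparm -W1 /dev/hda    (back on afterwards)

with the device name as a placeholder. If the open_sync numbers collapse
to roughly the fsync numbers with the cache off, the earlier results were
the drive acknowledging writes before they reached the platter.]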
pgsql@mohawksoft.com writes:
> After delving into this a little, it seems to me that if you are going to
> do this:
>
>     write(file, buffer, size);
>     f[data]sync(file);
>
> opening with O_SYNC seems to be an optimization specifically for this
> methodology.

What you are missing is that we don't necessarily do that. Writes and
flushes of xlog don't always occur together: we may write out a buffer
to make room in shared memory even though we do not yet need it flushed
to disk. In this situation it is better *not* to have O_SYNC on,
because we don't need to force (and wait for) a write just then. With a
little luck the kernel will write the buffer before we actually need a
flush to occur, and so there will be no actual delaying for it at all.

In particular this scenario applies for bulk-update transactions that
create vast amounts of WAL traffic but don't need an fsync till the
very end.

			regards, tom lane
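[A sketch of the decoupling described above - illustrative only, not the
actual xlog code; the function names are invented:

    #include <unistd.h>

    /* Evicting a WAL buffer to make room in shared memory:
     * hand the data to the kernel, but do not wait for the disk.
     * This is why the WAL file is opened *without* O_SYNC here. */
    void wal_write_buffer(int wal_fd, const char *buf, int len)
    {
        write(wal_fd, buf, len);    /* page cache only; returns at once */
    }

    /* Transaction commit: now the data really must be durable. */
    void wal_flush_to_commit(int wal_fd)
    {
        fdatasync(wal_fd);          /* may be cheap if the kernel already
                                       wrote the evicted buffers in the
                                       background */
    }

With O_SYNC on the descriptor, the eviction path would block on the disk
every time, even when nothing needs durability yet.]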