Thread: fsync vs open_sync

fsync vs open_sync

From
pgsql@mohawksoft.com
Date:
I did a little test on the various options of fsync.

I'm not sure my tests are scientific enough for general publication or
evaluation; all I am doing is performing a loop that inserts a value into
a table 1 million times.
create table testndx (value integer, name varchar);
create index testndx_val on testndx (value);

for (int i = 0; i < 1000000; i++)
{
    printf_query("insert into testndx (value, name) values ('%d', 'test')",
                 random());
    // report here
}
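
(printf_query above is a little wrapper of mine; for anyone wanting to
reproduce this, a stripped-down standalone equivalent in plain libpq, with
the connection string and error handling pared to a minimum, might look
something like:

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn *conn = PQconnectdb("dbname=test");   /* adjust as needed */

    if (PQstatus(conn) != CONNECTION_OK)
        return 1;

    for (int i = 0; i < 1000000; i++)
    {
        char sql[128];

        snprintf(sql, sizeof(sql),
                 "insert into testndx (value, name) values ('%ld', 'test')",
                 random());
        PQclear(PQexec(conn, sql));   /* one insert per transaction */
        /* report progress here */
    }

    PQfinish(conn);
    return 0;
}

Compile with something like cc -o testndx testndx.c -lpq.)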


Anyway, with fsync enabled using standard fsync(), I get roughly 300-400
inserts per second. With fsync disabled, I get about 7000 inserts per
second. When I re-enable fsync but use the open_sync option, I can get
about 2500 inserts per second.

(This is on Linux 2.4 kernel, ext2 file system)

(1) Is there any drawback to using open_sync? It appears to be a happy
medium compared to turning fsync off.
(2) Does anyone know if the "open_sync" option performs this well across
most platforms, or only on Linux?
(3) If "open_sync" works well across many platforms, and there are no
drawbacks, shouldn't it be the default WAL sync method? The performance
boost is incredible.


Re: fsync vs open_sync

From
Tom Lane
Date:
pgsql@mohawksoft.com writes:
> I did a little test on the various options of fsync.

There were considerably more extensive tests back when we created the
different WAL options, and the conclusions seemed to be that the best
choice is platform-dependent and also usage-dependent.  (In particular,
it makes a huge difference whether WAL has its own drive or not.)

I don't really recall why open_sync didn't end up among the set of
choices considered for the default setting.  It may be that we need to
reconsider based on the behavior of newer Linux versions ...

In any case, comparing open_sync to fsync is irrelevant, seeing that
the current default choice on Linux is fdatasync.  What you ought to
be telling us about is the performance relative to that.
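
(For reference, the knob being discussed is wal_sync_method in
postgresql.conf; which values are available is platform-dependent, but on
Linux you should at least be able to compare

    wal_sync_method = fdatasync     # current default on Linux
    wal_sync_method = fsync
    wal_sync_method = open_sync

against one another, one run per setting.)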
        regards, tom lane


Re: fsync vs open_sync

From
pgsql@mohawksoft.com
Date:
> pgsql@mohawksoft.com writes:
>> I did a little test on the various options of fsync.
>
> There were considerably more extensive tests back when we created the
> different WAL options, and the conclusions seemed to be that the best
> choice is platform-dependent and also usage-dependent.  (In particular,
> it makes a huge difference whether WAL has its own drive or not.)
>
> I don't really recall why open_sync didn't end up among the set of
> choices considered for the default setting.  It may be that we need to
> reconsider based on the behavior of newer Linux versions ...
>
> In any case, comparing open_sync to fsync is irrelevant, seeing that
> the current default choice on Linux is fdatasync.  What you ought to
> be telling us about is the performance relative to that.

I can tell you, and I'll send all the results if you like, but fsync and
fdatasync are, as far as I can tell, identical. In fact, I can't find any
documentation saying that fdatasync on Linux is implemented as anything
other than a call to fsync.

I tested fsync and fdatasync first, and in my tests their performance was
the same. I never went beyond these, as it looked like the fsync options
were all basically the same, and I hadn't read anywhere that open_sync
could make such a difference. It was only because of some idle chatter
(over a few years) in a couple of Linux kernel mailing list threads about
O_SYNC being improved that I thought I'd try it.

The improvements were REALLY astounding, and I would like to know if other
Linux users see this performance increase, I mean, it is almost 8~10 times
faster than using fsync.

Furthermore, it seems to also have the added benefit of reducing the I/O
storm at checkpoints over a system running with fsync off.

I'm really serious about this: changing this one parameter had dramatic
results on performance. We should have a general call to users to test
this setting with their OS of choice. Failing that, if we can be sure that
there are no cases where using O_SYNC is worse than fsync() or
fdatasync(), it should be considered as the default.




Re: fsync vs open_sync

From
Bruce Momjian
Date:
pgsql@mohawksoft.com wrote:
> Furthermore, it seems to also have the added benefit of reducing the I/O
> storm at checkpoints over a system running with fsync off.
> 
> I'm really serious about this: changing this one parameter had dramatic
> results on performance. We should have a general call to users to test
> this setting with their OS of choice. Failing that, if we can be sure that
> there are no cases where using O_SYNC is worse than fsync() or
> fdatasync(), it should be considered as the default.

Agreed.  Have you looked at src/tools/fsync?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: fsync vs open_sync

From
Tom Lane
Date:
pgsql@mohawksoft.com writes:
> The improvements were REALLY astounding, and I would like to know if other
> Linux users see this performance increase, I mean, it is almost 8~10 times
> faster than using fsync.
> Furthermore, it seems to also have the added benefit of reducing the I/O
> storm at checkpoints over a system running with fsync off.

What size transactions are you using in your tests?

For a system with small transactions (not much more than 1 page worth of
WAL traffic per transaction) I'd be pretty surprised if there was any
real difference at all.  There certainly should not be any difference in
terms of the number of physical writes.  We have seen some platforms
where fsync() is inefficiently implemented and requires more kernel
overhead than is reasonable --- not for I/O, but just to look through
the kernel buffers and confirm that none of them need flushing.  But I
didn't think Linux was one of these.
        regards, tom lane


Re: fsync vs open_sync

From
Mark Kirkwood
Date:
Just out of interest, what happens to the difference if you use *ext3*
(perhaps with data=writeback)?
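
For reference, data=writeback is a mount option; an fstab line along these
lines would do it, with device and mount point being whatever applies
locally:

/dev/hdb1  /var/lib/pgsql  ext3  defaults,data=writeback  0 0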

regards

Mark
pgsql@mohawksoft.com wrote:

>I did a little test on the various options of fsync.
>...
>create table testndx (value integer, name varchar);
>create index testndx_val on testndx (value);
>
>for(int i=0; i < 1000000; i++)
>{
>  printf_query( "insert into testndx (value, name) values ('%d', 'test')",
>random());
>
>   // report here
>}
>
>
>Anyway, with fsync enabled using standard fsync(), I get roughly 300-400
>inserts per second. With fsync disabled, I get about 7000 inserts per
>second. When I re-enable fsync but use the open_sync option, I can get
>about 2500 inserts per second.
>
>(This is on Linux 2.4 kernel, ext2 file system)
>


Re: fsync vs open_sync

From
pgsql@mohawksoft.com
Date:
> Just out of interest, what happens to the difference if you use *ext3*
> (perhaps with data=writeback)

Actually, I was working for a client, so it wasn't a general exploratory
exercise, but I can say that early on we discovered that ext3 was about the
worst file system for PostgreSQL. We gave up on it and decided to use ext2.

I have been considering doing a full sweep in my test lab later on, off client time.

ext2, ext3, jfs, xfs, and ReiserFS, fsync on with fdatasync or open_sync,
and fsync off.

One million inserts with auto commit.





Re: fsync vs open_sync

From
Doug McNaught
Date:
pgsql@mohawksoft.com writes:

>> Just out of interest, what happens to the difference if you use *ext3*
>> (perhaps with data=writeback)
>
> Actually, I was working for a client, so it wasn't a general exploritory,
> but I can say that early on we discovered that ext3 was about the worst
> file system for PostgreSQL. We gave up on it and decided to use ext2.

I'd be interested in which ext3 mount options you used--I can see how
anything other than 'data=writeback' could be a performance killer.
I've been meaning to run a few tests myself, but haven't had the
time...

-Doug
-- 
Let us cross over the river, and rest under the shade of the trees.  --T. J. Jackson, 1863


Re: fsync vs open_sync

From
Manfred Spraul
Date:
Tom Lane wrote:

>pgsql@mohawksoft.com writes:
>>The improvements were REALLY astounding, and I would like to know if other
>>Linux users see this performance increase, I mean, it is almost 8~10 times
>>faster than using fsync.
>>Furthermore, it seems to also have the added benefit of reducing the I/O
>>storm at checkpoints over a system running with fsync off.
>
>What size transactions are you using in your tests?
>
>For a system with small transactions (not much more than 1 page worth of
>WAL traffic per transaction) I'd be pretty surprised if there was any
>real difference at all.  There certainly should not be any difference in
>terms of the number of physical writes.  We have seen some platforms
>where fsync() is inefficiently implemented and requires more kernel
>overhead than is reasonable --- not for I/O, but just to look through
>the kernel buffers and confirm that none of them need flushing.  But I
>didn't think Linux was one of these.
IDE or SCSI? If IDE: write cache on or off? Which 2.4 kernel?
The numbers are very high - they could be a side effect of write caching
by the disks. I think some SuSE 2.4 kernels have partial support for
reliable fsync even with the write cache on (i.e. fsync issues a cache
flush command to the disk), but not all code paths are handled. Perhaps
fsync is handled and O_SYNC is not.
I could try to find the details.

--   Manfred


Re: fsync vs open_sync (more info)

From
pgsql@mohawksoft.com
Date:
Some more information:

I started to perform the tests on one of the machines in my lab, and guess
what, almost no difference between fsync and open_sync. Either on jfs or
ext2.

The difference? Linux 2.6.3. My original tests were on Linux 2.4.25.

The good part is that open_sync wasn't worse.

Just a conceptual question: what is the right thing to do? I started to
think about this. To me, the O_SYNC flag is there to ensure that what you
write is on the disk at the time of the write. In SQL terms it is like
"auto commit." Calling fsync or fdatasync is so that one can batch write
calls and flush them out to disk in one shot; conceptually, it is like a
transaction.
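
In code terms, the two idioms I have in mind look roughly like this
(paths and buffers are purely illustrative, error handling omitted):

#include <fcntl.h>
#include <unistd.h>

/* "auto commit": every write() returns only after the data is on disk */
void o_sync_style(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_SYNC);

    write(fd, buf, len);          /* durable when it returns */
    close(fd);
}

/* "transaction": buffered writes, then one flush for the whole batch */
void fdatasync_style(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY);

    write(fd, buf, len);          /* just queued in the kernel */
    write(fd, buf, len);
    fdatasync(fd);                /* the whole batch hits the disk here */
    close(fd);
}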

Does it make sense, then, to say that the WAL sync method should be O_SYNC?
If there are no reasons not to, doesn't it make sense to make this the
default? It will give a boost for any 2.4 Linux machines and doesn't seem to
hurt anyone else.



Re: fsync vs open_sync (more info)

From
pgsql@mohawksoft.com
Date:
>
> In particular, you need to offer some evidence for that completely
> undocumented assertion that "it won't hurt anyone else".

It should be easy enough to prove whether or not O_SYNC hurts anyone.

OK, let me ask a few questions:

(1) What is a good sample set on which to run? Linux, FreeBSD, Macintosh?
(2) What sort of tests would be definitive? Auto commit and some
transactional load?


After delving into this a little, it seems to me that if you are going to
do this:

write(file, buffer, size);
f[data]sync(file);

Opening with O_SYNC seems to be an optimization specifically to this
methodology. At the very least, it will save one user/kernel transition.
If we can prove beyond a reasonable doubt that using O_SYNC does not hurt
any platform, then what reason would there be for not making it the
default?

Again, conceptually, O_SYNC does what you want it to do, and should be
able to do it more efficiently than fdatasync().
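
In code, that is the difference between the two calls above and a single
call on a descriptor opened with O_SYNC (same illustrative names):

file = open(path, O_WRONLY | O_SYNC);
write(file, buffer, size);   /* durable on return; no separate f[data]sync(file) */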


Re: fsync vs open_sync (more info)

From
Tom Lane
Date:
pgsql@mohawksoft.com writes:
> Does it make sense, then, to say that the WAL sync method should be O_SYNC?
> If there are no reasons not to, doesn't it make sense to make this the
> default? It will give a boost for any 2.4 Linux machines and doesn't seem
> to hurt anyone else.

You have got the terms of debate backwards here.  These decisions were
already made once, on the basis of more testing than you have done
(okay, it wasn't months worth of work, but we at least exercised a
number of scenarios on a number of platforms).  The question is not "why
shouldn't we make this the default" but "why should we make this the
default, and what are we likely to break if we do so?"  Showing that one
release series of one platform wins in one particular set of tests is
not sufficient grounds for changing the default.

In particular, you need to offer some evidence for that completely
undocumented assertion that "it won't hurt anyone else".
        regards, tom lane


Re: fsync vs open_sync (more info)

From
pgsql@mohawksoft.com
Date:
> On Tue, 2004-08-10 at 07:48, pgsql@mohawksoft.com wrote:
>> Some more information:
>>
>> I started to perform the tests on one of the machines in my lab, and
>> guess
>> what, almost no difference between fsync and open_sync. Either on jfs or
>> ext2.
>>
>> The difference? Linux 2.6.3. My original tests were on Linux 2.4.25.
> Very hazy memory recalls something about O_SYNC not really doing
> anything in early kernel versions.
>
>>
>> The good part is that open_sync wasn't worse.

In early Linux kernels, O_SYNC was implemented using fsync(), and there was
a fair amount of debate about whether people using O_SYNC would see
performance degradation.

>>
>> Just a conceptual question: what is the right thing to do? I started to
>> think about this. To me, the O_SYNC flag is there to ensure that what you
>> write is on the disk at the time of the write. In SQL terms it is like
>> "auto commit." Calling fsync or fdatasync is so that one can batch write
>> calls and flush them out to disk in one shot; conceptually, it is like a
>> transaction.
> With the caveat that the kernel can start flushing your data to disk
> in the background, not just when you call fdatasync/fsync.

I was speaking in conceptual terms, not exact ones. Just a general analogy.

In theory, theory and practice are the same thing; in practice, they are not.


Re: fsync vs open_sync

From
Manfred Spraul
Date:
pgsql@mohawksoft.com wrote:

>I have been considering doing a full sweep in my test lab later on, off client time.
>
>ext2, ext3, jfs, xfs, and ReiserFS, fsync on with fdatasync or open_sync,
>and fsync off.
Before you start: double-check that the disks are not lying. At least the
SuSE 2.4 kernel sends cache flush commands to IDE disks on fsync(), but not
with O_SYNC:

http://marc.theaimsgroup.com/?l=linux-kernel&m=107964507113585
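
On IDE you can check the cache setting, and disable it for a worst-case
baseline, with hdparm (device name being whatever applies on your box):

hdparm -W /dev/hda       # show the current write-caching setting
hdparm -W0 /dev/hda      # turn the write cache off
hdparm -W1 /dev/hda      # turn it back on

With the cache off, an honest fsync and a lying one should converge on the
same, much lower, insert rate.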


--   Manfred


Re: fsync vs open_sync (more info)

From
Tom Lane
Date:
pgsql@mohawksoft.com writes:
> After delving into this a little, it seems to me that if you are going to
> do this:

> write(file, buffer, size);
> f[data]sync(file);

> Opening with O_SYNC seems to be an optimization specifically to this
> methodology.

What you are missing is that we don't necessarily do that.  Writes and
flushes of xlog don't always occur together: we may write out a buffer
to make room in shared memory even though we do not yet need it flushed
to disk.  In this situation it is better *not* to have O_SYNC on because
we don't need to force (and wait for) a write just then.  With a little
luck the kernel will write the buffer before we actually need a flush
to occur, and so there will be no actual delaying for it at all.

In particular this scenario applies for bulk-update transactions that
create vast amounts of WAL traffic but don't need an fsync till the very
end.
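
Schematically (the names here are invented for illustration, not the
actual xlog code):

/* xlog buffers are full; write one out just to make room */
write(wal_fd, page, XLOG_BLCKSZ);    /* with O_SYNC we would stall here */

/* much later, when a commit actually requires durability */
fdatasync(wal_fd);                   /* the only point we must wait at */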
        regards, tom lane