Thread: fsync, fdatasync, open_sync, and open_datasync, -- Linux insanity

fsync, fdatasync, open_sync, and open_datasync, -- Linux insanity

From
pgsql@mohawksoft.com
Date:
Maybe I'm losing it, but I feel forced to apologize for trying to push for
"open_sync" as the default.

I have spent the last few days trying to reach some solid, documented,
and verifiable conclusions about which fsync method is best or, at a
minimum, which fsync method is best for Linux. I can't find an answer.

On one system (RedHat 9, kernel 2.4.25), fsync and fdatasync perform the
same and open_sync is a clear winner; on another (Mandrake 10.0
community, also 2.4.25), fdatasync is clearly better than fsync and
performs the same as open_sync. (Both use the same file system, ext2.)
I'm also pretty sure the results will differ on different hardware, and
open_datasync doesn't even work.
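
To make the four settings concrete, here is a minimal Python sketch of what each wal_sync_method value roughly amounts to at the system-call level. It assumes a Linux/POSIX system where os.O_SYNC, os.O_DSYNC and os.fdatasync are all available; the helper name sync_write and the 8 kB block size are illustrative assumptions, not PostgreSQL's actual code. The point it shows: fsync and fdatasync issue a separate flush call after write(), while open_sync and open_datasync open the file with O_SYNC or O_DSYNC so that each write() itself waits for the disk.

```python
import os
import tempfile

# Assumption: 8192 bytes, chosen to mirror a typical WAL block; any size works.
BLOCK = b"\0" * 8192
METHODS = ("fsync", "fdatasync", "open_sync", "open_datasync")


def sync_write(path, method, data=BLOCK):
    """Hypothetical helper: write `data` to `path`, made durable per `method`."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if method == "open_sync":
        # O_SYNC: write() returns only after data and metadata are on disk.
        flags |= os.O_SYNC
    elif method == "open_datasync":
        # O_DSYNC: like O_SYNC, but metadata not needed to read the data
        # back (such as the modification time) need not be flushed.
        flags |= os.O_DSYNC
    fd = os.open(path, flags, 0o600)
    try:
        os.write(fd, data)
        if method == "fsync":
            os.fsync(fd)      # flush the data plus all of the file's metadata
        elif method == "fdatasync":
            os.fdatasync(fd)  # flush the data plus only metadata needed to read it
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for m in METHODS:
            sync_write(os.path.join(d, m), m)
            print(m, "ok")
```

On a platform that lacks O_DSYNC, os.O_DSYNC does not exist and the open_datasync branch of this sketch raises AttributeError.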

The problem I have with this is that we are not talking about a couple of
percentage points of performance that can be ignored for simplicity.
Sometimes it is twice the performance, sometimes it is almost 10 times the
performance, and there is no clear rule of the form "If you use X, you
should set Y."

For my part, I didn't think wal_sync_method would make much difference
until I tried all of them. (I bet most people never adjust the default.)
Maybe it is just a documentation issue: a short discussion of how various
operating systems make different choices, and of how administrators
should try each setting with their own application, since there is *no*
definitive answer and it could make a huge difference. I don't know. It
is just disturbing that it can make such a HUGE difference and yet is
hardly discussed anywhere.

What would be a good strategy for addressing this issue? Is it an issue at
all? Is it simply a documentation issue? Do we craft some sort of test
that can characterize the behavior? What would that test need to do?
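
One possible shape for such a test is a micro-benchmark that repeatedly overwrites one preallocated block and forces it to disk with each strategy, roughly like a stream of WAL commits. The following is a minimal Python sketch under stated assumptions (Linux/POSIX, an 8 kB block, a fixed iteration count); the names writes_per_second and compare are hypothetical, and the numbers it produces only compare the strategies at the system-call level, not PostgreSQL itself.

```python
import os
import tempfile
import time

# Assumptions (illustrative only): POSIX system, 8 kB block, fixed iterations.
BLOCK = b"x" * 8192
METHODS = ("fsync", "fdatasync", "open_sync", "open_datasync")


def writes_per_second(path, method, iterations=200):
    """Time `iterations` durable overwrites of one block using `method`."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    flags = os.O_WRONLY
    if method == "open_sync":
        flags |= os.O_SYNC       # the write itself waits for the disk
    elif method == "open_datasync":
        flags |= os.O_DSYNC
    fd = os.open(path, flags)
    try:
        start = time.perf_counter()
        for _ in range(iterations):
            os.pwrite(fd, BLOCK, 0)  # overwrite in place at offset 0
            if method == "fsync":
                os.fsync(fd)         # separate flush call after the write
            elif method == "fdatasync":
                os.fdatasync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return iterations / max(elapsed, 1e-9)


def compare(directory, iterations=200):
    """Return {method: writes per second} measured on a file in `directory`."""
    path = os.path.join(directory, "walsim")
    # Preallocate and sync once, so the timed loop never changes the file size.
    with open(path, "wb") as f:
        f.write(b"\0" * len(BLOCK))
        f.flush()
        os.fsync(f.fileno())
    return {m: writes_per_second(path, m, iterations) for m in METHODS}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        results = compare(d)
        for method, rate in sorted(results.items(), key=lambda kv: -kv[1]):
            print(f"{method:14s} {rate:10.1f} writes/s")
```

The default temporary directory is often on tmpfs, where syncing is nearly free, so a meaningful run would pass dir=... to TemporaryDirectory pointing at the file system that holds the WAL. A drive's write cache can also make every method look faster than the hardware's real durability allows.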

For what it's worth, using open_sync on a specific client's hardware and
OS made the difference between running with fsync "off" and being able
to choose "on." Had we not tried it, performance would have been too slow
to enable fsync.



Re: fsync, fdatasync, open_sync, and open_datasync, -- Linux insanity

From
Tom Lane
Date:
pgsql@mohawksoft.com writes:
> What would be a good strategy for addressing this issue? Is it an issue at
> all? Is it simply a documentation issue? Do we craft some sort of test
> that can characterize the behavior? What would that test need to do?

It seems to me that it's a documentation issue.  Maybe we could add a
section to the "Performance Tips" chapter advising that people
experiment with the different settings.

I don't think any test that we could build would be as useful as simply
trying the different settings with an installation's real workload.
        regards, tom lane


Re: fsync, fdatasync, open_sync, and open_datasync, --

From
Andreas Pflug
Date:
Tom Lane wrote:
> Andreas Pflug <pgadmin@pse-consulting.de> writes:
> 
>>Tom Lane wrote:
>>
>>>I don't think any test that we could build would be as useful as simply
>>>trying the different settings with an installation's real workload.
> 
> 
>>Benchmarking the real workload isn't always so easy, and might be quite 
>>time consuming to obtain meaningful values.
> 
> 
> The concern was about whether people might be missing an easy speedup of
> 2x or more.  I don't think it'd be that hard to tell ;-) if one setting
> is an order of magnitude better than another for your workload.  If
> there's not an obvious difference then you haven't wasted much effort
> checking.

This is probably more obvious with a test app that does 100% writes than
with an average app that does only 5-10% writes. The other 90% of reads
will make your benchmark unreliable unless you run it for a longer period
to get better statistics. Improving the signal/noise ratio (i.e. avoiding
reads) makes it simpler.

Regards,
Andreas
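
The point about noise can be made quantitative with a small sketch: run the same benchmark several times per setting, then compare the difference in means against the combined standard error of those means. The function names and the threshold of two standard errors are assumptions chosen for illustration, a rough rule of thumb rather than a formal significance test. Because the standard error shrinks in proportion to 1/sqrt(n), more runs make smaller differences detectable, and a write-only test with less noise per run gets there sooner.

```python
import math
import statistics


def mean_and_stderr(samples):
    """Return (mean, standard error of the mean) for per-run results."""
    if len(samples) < 2:
        raise ValueError("need at least two runs to estimate noise")
    m = statistics.mean(samples)
    se = statistics.stdev(samples) / math.sqrt(len(samples))
    return m, se


def clearly_different(a, b, k=2.0):
    """True if the mean results of two settings differ by more than
    k combined standard errors (a rough heuristic, not a formal test)."""
    ma, sa = mean_and_stderr(a)
    mb, sb = mean_and_stderr(b)
    return abs(ma - mb) > k * math.sqrt(sa ** 2 + sb ** 2)
```

For example, per-run throughputs of [100, 120, 80] versus [105, 125, 85] are not clearly different under this rule, while [100, 101, 99, 100] versus [200, 201, 199, 200] are.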


Re: fsync, fdatasync, open_sync, and open_datasync, -- Linux insanity

From
Tom Lane
Date:
Andreas Pflug <pgadmin@pse-consulting.de> writes:
> Tom Lane wrote:
>> I don't think any test that we could build would be as useful as simply
>> trying the different settings with an installation's real workload.

> Benchmarking the real workload isn't always so easy, and might be quite 
> time consuming to obtain meaningful values.

The concern was about whether people might be missing an easy speedup of
2x or more.  I don't think it'd be that hard to tell ;-) if one setting
is an order of magnitude better than another for your workload.  If
there's not an obvious difference then you haven't wasted much effort
checking.
        regards, tom lane


Re: fsync, fdatasync, open_sync, and open_datasync, --

From
Andreas Pflug
Date:
Tom Lane wrote:

> 
> I don't think any test that we could build would be as useful as simply
> trying the different settings with an installation's real workload.

Benchmarking the real workload isn't always so easy, and might be quite 
time consuming to obtain meaningful values. Don't you think that some 
test app doing heavy inserts (maybe with multiple processes) would be 
sufficient to decide which option is best on that particular machine?

Regards,
Andreas
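
A file-system-level stand-in for a multi-client insert test might look like the following sketch. It is an assumption-laden proxy, not a PostgreSQL benchmark: each simulated client appends a fixed-size record and makes it durable with one of the four strategies, and the names client, concurrent_rate and RECORD are hypothetical. Threads stand in for processes here; CPython releases the GIL during os.write, os.fsync and os.fdatasync, so the clients' sync calls can still overlap in the kernel. It does not model PostgreSQL's shared WAL file or commit batching, and because appending grows each file, every sync also has to flush a size change.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

METHODS = ("fsync", "fdatasync", "open_sync", "open_datasync")
RECORD = b"r" * 512  # hypothetical size of one simulated "insert"


def client(path, method, commits):
    """One simulated client: append a record and make it durable, `commits` times."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND
    if method == "open_sync":
        flags |= os.O_SYNC
    elif method == "open_datasync":
        flags |= os.O_DSYNC
    fd = os.open(path, flags, 0o600)
    try:
        for _ in range(commits):
            os.write(fd, RECORD)
            if method == "fsync":
                os.fsync(fd)
            elif method == "fdatasync":
                os.fdatasync(fd)
    finally:
        os.close(fd)
    return commits


def concurrent_rate(directory, method, clients=4, commits=100):
    """Aggregate durable commits per second with `clients` concurrent writers,
    each writing to its own file in `directory`."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    paths = [os.path.join(directory, f"{method}-{i}") for i in range(clients)]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=clients) as pool:
        futures = [pool.submit(client, p, method, commits) for p in paths]
        total = sum(f.result() for f in futures)
    elapsed = time.perf_counter() - start
    return total / max(elapsed, 1e-9)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for m in METHODS:
            print(f"{m:14s} {concurrent_rate(d, m):10.1f} commits/s")
```

As with any single-file-system micro-benchmark, the directory should live on the same disk as the WAL, and a real insert workload run once per wal_sync_method setting remains the final check.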