Thread: Can we trust fsync?

Can we trust fsync?

From
Craig Ringer
Date:
I'm really concerned by this post on Linux's fsync and disk flush behaviour:

http://milek.blogspot.com.au/2010/12/linux-osync-and-write-barriers.html

and seeking opinions from folks here who've been deeply involved in
write reliability work.

The amount of change in write reliability behaviour in Linux across
kernel versions, file systems and storage abstraction layers is worrying
- different results for LVM vs !LVM, md vs !md, ext3 vs other, etc.

If this isn't something that's already been seen and dealt with then
I'll see if I can take a look into it once the RLS work is dealt with.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Can we trust fsync?

From
Craig Ringer
Date:
On 11/21/2013 07:45 AM, Craig Ringer wrote:
> I'm really concerned by this post on Linux's fsync and disk flush behaviour:
> 
> http://milek.blogspot.com.au/2010/12/linux-osync-and-write-barriers.html

... and yes, I realise that's partly why we have the "fsync" param to
control different sync modes. Just concerned it's even more variable
than I thought.


-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Can we trust fsync?

From
Tatsuo Ishii
Date:
> On 11/21/2013 07:45 AM, Craig Ringer wrote:
>> I'm really concerned by this post on Linux's fsync and disk flush behaviour:
>> 
>> http://milek.blogspot.com.au/2010/12/linux-osync-and-write-barriers.html
> 
> ... and yes, I realise that's partly why we have the "fsync" param to
> control different sync modes. Just concerned it's even more variable
> than I thought.

So on Linux, we don't have any safe option for wal_sync_method?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp



Re: Can we trust fsync?

From
Tom Lane
Date:
Craig Ringer <craig@2ndquadrant.com> writes:
> The amount of change in write reliability behaviour in Linux across
> kernel versions, file systems and storage abstraction layers is worrying
> - different results for LVM vs !LVM, md vs !md, ext3 vs other, etc.

Well, we pretty much *have to* trust fsync --- there's not a lot we can
do if the kernel doesn't get this right.  My takeaway is that you don't
want to be running a production database on bleeding-edge kernels or
filesystem stacks.  If you want to use Linux, use a distro from a vendor
with a track record for caring about stability.  (I'll omit the commercial
for my former employers, but ...)

Also, it's not that hard to do plug-pull testing to verify that your
system is telling the truth about fsync.  This really ought to be part
of acceptance testing for any new DB server.
        regards, tom lane
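
The plug-pull test described above can be sketched as a small writer and verifier pair. This is a minimal Python sketch, not the program from the wiki: the record layout and the names append_record, run_writer and last_durable are made up for illustration. The writer reports a sequence number only after fsync() has returned, and after the power has been cut and restored every reported number must still be on disk.

```python
import os
import struct
import zlib

# Hypothetical record layout: 8-byte sequence number + 4-byte CRC32 of it.
RECORD = struct.Struct("<QI")


def append_record(fd: int, seq: int) -> None:
    """Write one record and return only after fsync() has reported success."""
    payload = struct.pack("<Q", seq)
    os.write(fd, RECORD.pack(seq, zlib.crc32(payload)))
    os.fsync(fd)


def run_writer(path: str, count: int, report=print) -> None:
    """Append `count` records, reporting each sequence number after its fsync."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        for seq in range(count):
            append_record(fd, seq)
            # In a real test the report must reach something that keeps its
            # power, e.g. a terminal or listener on another machine.
            report(seq)
    finally:
        os.close(fd)


def last_durable(path: str) -> int:
    """Return the highest sequence number found intact on disk, or -1."""
    last = -1
    with open(path, "rb") as f:
        while True:
            chunk = f.read(RECORD.size)
            if len(chunk) < RECORD.size:
                break  # end of file or a torn final record
            seq, crc = RECORD.unpack(chunk)
            if seq != last + 1 or crc != zlib.crc32(struct.pack("<Q", seq)):
                break  # gap or corrupted record
            last = seq
    return last
```

After the plug has been pulled and the machine is back up, last_durable(path) should be at least the last number the writer reported. If it is lower, something in the stack acknowledged an fsync that never reached stable storage.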



Re: Can we trust fsync?

From
"Joshua D. Drake"
Date:
On 11/20/2013 03:45 PM, Craig Ringer wrote:
>
> I'm really concerned by this post on Linux's fsync and disk flush behaviour:
>
> http://milek.blogspot.com.au/2010/12/linux-osync-and-write-barriers.html
>
> and seeking opinions from folks here who've been deeply involved in
> write reliability work.
>
> The amount of change in write reliability behaviour in Linux across
> kernel versions, file systems and storage abstraction layers is worrying
> - different results for LVM vs !LVM, md vs !md, ext3 vs other, etc.
>
> If this isn't something that's already been seen and dealt with then
> I'll see if I can take a look into it once the RLS work is dealt with.
>

I thought Greg did some testing on this a while back and determined 
which versions were safe... (/me looks for post)

JD

-- 
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
For my dreams of your image that blossoms   a rose in the deeps of my heart. - W.B. Yeats



Re: Can we trust fsync?

From
Florian Weimer
Date:
On 11/21/2013 12:45 AM, Craig Ringer wrote:
> I'm really concerned by this post on Linux's fsync and disk flush behaviour:
>
> http://milek.blogspot.com.au/2010/12/linux-osync-and-write-barriers.html
>
> and seeking opinions from folks here who've been deeply involved in
> write reliability work.

With ext4 and XFS on plain/LVM/md block devices, this issue should 
really be a thing of the past.  I think the kernel folks would treat 
this as bugs nowadays, too.

-- 
Florian Weimer / Red Hat Product Security Team



Re: Can we trust fsync?

From
Greg Stark
Date:
On Thu, Nov 21, 2013 at 1:43 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Also, it's not that hard to do plug-pull testing to verify that your
> system is telling the truth about fsync.  This really ought to be part
> of acceptance testing for any new DB server.

I've never tried it but I always wondered how easy it was to do. How would
you ever know you had tested it enough?

The original mail was referencing a problem with syncing *meta* data
though. The semantics around meta data syncs are much less clearly
specified, in part because file systems traditionally made nearly all meta
data operations synchronous. Doing plug-pull testing on Postgres would not
test meta data syncing very well since Postgres specifically avoids doing
much meta data operations by overwriting existing files and blocks as much
as possible. You would have to test doing table extensions or pulling the
plug immediately after switching xlog files repeatedly to have any coverage
at all there.

-- 
greg
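
To make the data versus metadata distinction concrete: creating or renaming a file changes the directory as well as the file, so a test that wants to exercise metadata syncing has to create new names and sync both. Below is a minimal Python sketch of the usual POSIX idiom, assuming a Linux-like system where a directory can be opened and fsynced. The helper names fsync_dir and create_durably are hypothetical, and this is not a description of what Postgres itself does.

```python
import os


def fsync_dir(dirpath: str) -> None:
    """Ask the kernel to flush a directory's entries (new names, renames)."""
    fd = os.open(dirpath, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def create_durably(dirpath: str, name: str, data: bytes) -> str:
    """Create dirpath/name, syncing the contents and then the directory entry."""
    final = os.path.join(dirpath, name)
    tmp = final + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()             # push Python's buffer to the kernel
        os.fsync(f.fileno())  # file contents
    os.replace(tmp, final)    # metadata-only change in the directory
    fsync_dir(dirpath)        # make the new name itself durable
    return final
```

A metadata-focused plug-pull run would call create_durably() in a loop with fresh names, note each name once the call returns, and after the power cut check that every noted file exists with the expected contents.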

Re: Can we trust fsync?

From
Tom Lane
Date:
Greg Stark <stark@mit.edu> writes:
> On Thu, Nov 21, 2013 at 1:43 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Also, it's not that hard to do plug-pull testing to verify that your
>> system is telling the truth about fsync.  This really ought to be part
>> of acceptance testing for any new DB server.

> I've never tried it but I always wondered how easy it was to do. How would
> you ever know you had tested it enough?

I used the program Greg Smith recommends on our wiki (can't remember the
name offhand) when I got a new house server this spring.  With the RAID
card configured for writethrough and no battery, it failed all over the
place.  Fixed those configuration bugs, it was okay three or four times
in a row, which was good enough for me.

> The original mail was referencing a problem with syncing *meta* data
> though. The semantics around meta data syncs are much less clearly
> specified, in part because file systems traditionally made nearly all meta
> data operations synchronous. Doing plug-pull testing on Postgres would not
> test meta data syncing very well since Postgres specifically avoids doing
> much meta data operations by overwriting existing files and blocks as much
> as possible.

True.  You're better off with a specialized testing program.  (Though
now you mention it, I wonder whether that program was stressing metadata
or not.)
        regards, tom lane



Re: Can we trust fsync?

From
Claudio Freire
Date:
On Fri, Nov 22, 2013 at 1:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The original mail was referencing a problem with syncing *meta* data
>> though. The semantics around meta data syncs are much less clearly
>> specified, in part because file systems traditionally made nearly all meta
>> data operations synchronous. Doing plug-pull testing on Postgres would not
>> test meta data syncing very well since Postgres specifically avoids doing
>> much meta data operations by overwriting existing files and blocks as much
>> as possible.
>
> True.  You're better off with a specialized testing program.  (Though
> now you mention it, I wonder whether that program was stressing metadata
> or not.)


You can always stress metadata by leaving atime updates in their full
setting (whatever it is for that filesystem).



Re: Can we trust fsync?

From
Bruce Momjian
Date:
On Fri, Nov 22, 2013 at 11:16:06AM -0500, Tom Lane wrote:
> Greg Stark <stark@mit.edu> writes:
> > On Thu, Nov 21, 2013 at 1:43 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Also, it's not that hard to do plug-pull testing to verify that your
> >> system is telling the truth about fsync.  This really ought to be part
> >> of acceptance testing for any new DB server.
> 
> > I've never tried it but I always wondered how easy it was to do. How would
> > you ever know you had tested it enough?
> 
> I used the program Greg Smith recommends on our wiki (can't remember the
> name offhand) when I got a new house server this spring.  With the RAID
> card configured for writethrough and no battery, it failed all over the
> place.  Fixed those configuration bugs, it was okay three or four times
> in a row, which was good enough for me.
> 
> > The original mail was referencing a problem with syncing *meta* data
> > though. The semantics around meta data syncs are much less clearly
> > specified, in part because file systems traditionally made nearly all meta
> > data operations synchronous. Doing plug-pull testing on Postgres would not
> > test meta data syncing very well since Postgres specifically avoids doing
> > much meta data operations by overwriting existing files and blocks as much
> > as possible.
> 
> True.  You're better off with a specialized testing program.  (Though
> now you mention it, I wonder whether that program was stressing metadata
> or not.)

The program is diskchecker:
http://brad.livejournal.com/2116715.html

I got the author to re-host the source code on github a few years ago.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +
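
As described in the linked post, diskchecker keeps its record of acknowledged writes on a second machine, so that record survives when the machine under test loses power. A minimal Python sketch of that idea follows. It is not diskchecker's actual protocol: serve_acks, make_reporter and the one-number-per-line wire format are hypothetical.

```python
import socket
import threading


def serve_acks(host: str, port: int, ready: threading.Event, result: dict) -> None:
    """Accept one connection and remember the highest sequence number received."""
    with socket.create_server((host, port)) as srv:
        result["port"] = srv.getsockname()[1]  # useful when port 0 was requested
        ready.set()
        conn, _ = srv.accept()
        with conn, conn.makefile("r") as lines:
            for line in lines:
                result["last_acked"] = int(line)


def make_reporter(host: str, port: int):
    """Return (report, close); report(seq) sends one acknowledged number."""
    sock = socket.create_connection((host, port))
    # Send each small message immediately rather than batching it.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

    def report(seq: int) -> None:
        sock.sendall(f"{seq}\n".encode())

    return report, sock.close
```

The machine under test would call report(seq) only after fsync() has returned for record seq. An acknowledgement that is lost in transit when the power goes only makes the check more lenient, it cannot produce a false alarm.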



Re: Can we trust fsync?

From
Peter Geoghegan
Date:
On Fri, Nov 22, 2013 at 2:57 PM, Bruce Momjian <bruce@momjian.us> wrote:
> The program is diskchecker:
>
>         http://brad.livejournal.com/2116715.html
>
> I got the author to re-host the source code on github a few years ago.

It might be worth re-implementing this for -contrib. The fact that we
mention diskchecker.pl in the docs, and it is a pretty obscure Perl
script on some guy's personal website doesn't inspire much confidence.

-- 
Peter Geoghegan



Re: Can we trust fsync?

From
Bruce Momjian
Date:
On Fri, Nov 22, 2013 at 03:06:31PM -0800, Peter Geoghegan wrote:
> On Fri, Nov 22, 2013 at 2:57 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > The program is diskchecker:
> >
> >         http://brad.livejournal.com/2116715.html
> >
> > I got the author to re-host the source code on github a few years ago.
> 
> It might be worth re-implementing this for -contrib. The fact that we
> mention diskchecker.pl in the docs, and it is a pretty obscure Perl
> script on some guy's personal website doesn't inspire much confidence.

Well, it was his idea, and quite a good one.  I guess we could
reimplement this in C if someone wants to do the legwork.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +



Re: Can we trust fsync?

From
Josh Berkus
Date:
On 11/22/2013 03:23 PM, Bruce Momjian wrote:
> On Fri, Nov 22, 2013 at 03:06:31PM -0800, Peter Geoghegan wrote:
>> On Fri, Nov 22, 2013 at 2:57 PM, Bruce Momjian <bruce@momjian.us> wrote:
>>> The program is diskchecker:
>>>
>>>         http://brad.livejournal.com/2116715.html
>>>
>>> I got the author to re-host the source code on github a few years ago.
>>
>> It might be worth re-implementing this for -contrib. The fact that we
>> mention diskchecker.pl in the docs, and it is a pretty obscure Perl
>> script on some guy's personal website doesn't inspire much confidence.
> 
> Well, it was his idea, and quite a good one.  I guess we could
> reimplement this in C if someone wants to do the legwork.

Yeah, too bad Brad didn't post a license for it.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Can we trust fsync?

From
Bruce Momjian
Date:
On Fri, Nov 22, 2013 at 03:27:29PM -0800, Josh Berkus wrote:
> On 11/22/2013 03:23 PM, Bruce Momjian wrote:
> > On Fri, Nov 22, 2013 at 03:06:31PM -0800, Peter Geoghegan wrote:
> >> On Fri, Nov 22, 2013 at 2:57 PM, Bruce Momjian <bruce@momjian.us> wrote:
> >>> The program is diskchecker:
> >>>
> >>>         http://brad.livejournal.com/2116715.html
> >>>
> >>> I got the author to re-host the source code on github a few years ago.
> >>
> >> It might be worth re-implementing this for -contrib. The fact that we
> >> mention diskchecker.pl in the docs, and it is a pretty obscure Perl
> >> script on some guy's personal website doesn't inspire much confidence.
> > 
> > Well, it was his idea, and quite a good one.  I guess we could
> > reimplement this in C if someone wants to do the legwork.
> 
> Yeah, too bad Brad didn't post a license for it.

We can ask him.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +



Re: Can we trust fsync?

From
Michael Paquier
Date:
On Sat, Nov 23, 2013 at 8:06 AM, Peter Geoghegan <pg@heroku.com> wrote:
> On Fri, Nov 22, 2013 at 2:57 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> The program is diskchecker:
>>
>>         http://brad.livejournal.com/2116715.html
>>
>> I got the author to re-host the source code on github a few years ago.
>
> It might be worth re-implementing this for -contrib. The fact that we
> mention diskchecker.pl in the docs, and it is a pretty obscure Perl
> script on some guy's personal website doesn't inspire much confidence.

Yes, having that in contrib would be useful. It would be a plus
when testing disks for Postgres.
-- 
Michael