Thread: 9.6 and fsync=off

9.6 and fsync=off

From
Craig Ringer
Date:
Hi all

After helping clean up the mess from another user who turned fsync=off because they read (bad) tuning advice that it was faster, I'd really like to change the config file comment.

Really.

#fsync = on                             # turns forced synchronization on or off

Now, we can't rename fsync to disable_crash_safety=on or corrupt_my_database=on. But the comment needs changing.

How about:

#fsync = on                             # force disk flushes required for crash safety

or, preferably something like:

"Enable forced disk flushes when they are required for crash safety. Disabling fsync can lead to unrecoverable database corruption in a crash of the host system."

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: 9.6 and fsync=off

From
Abhijit Menon-Sen
Date:
At 2016-04-27 17:58:08 +0800, craig@2ndquadrant.com wrote:
>
> #fsync = on                             # turns forced synchronization on or off

I suggest:                                # provide crash safety by flushing disk writes
                                          # (Disabling this can lead to unrecoverable data
                                          # loss if the system crashes.)

-- Abhijit



Re: 9.6 and fsync=off

From
Magnus Hagander
Date:


On Wed, Apr 27, 2016 at 12:43 PM, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote:
> At 2016-04-27 17:58:08 +0800, craig@2ndquadrant.com wrote:
> >
> > #fsync = on                             # turns forced synchronization on or off
>
> I suggest:                                # provide crash safety by flushing disk writes
>                                           # (Disabling this can lead to unrecoverable data
>                                           # loss if the system crashes.)

+1 for the change. I suggest shortening it to just "disabling this can lead to unrecoverable data corruption" (I think corruption is better than loss, mainly because too many people equate loss with "I may lose my last 10 updates, and I'm fine with that").


Re: 9.6 and fsync=off

From
Petr Jelinek
Date:
On 27/04/16 12:53, Magnus Hagander wrote:
>
>
> On Wed, Apr 27, 2016 at 12:43 PM, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote:
>
>     At 2016-04-27 17:58:08 +0800, craig@2ndquadrant.com wrote:
>     >
>     > #fsync = on                             # turns forced synchronization on or off
>
>     I suggest:                                # provide crash safety by flushing disk writes
>                                               # (Disabling this can lead to unrecoverable data
>                                               # loss if the system crashes.)
>
>
> +1 for the change. I suggest shortening it to just "disabling this can
> lead to unrecoverable data corruption" (I think corruption is better
> than loss, mainly because too many people equate loss with "I may lose
> my last 10 updates, and I'm fine with that").
>
>

+1 (Abhijit's wording with data loss changed to data corruption)

--
 Petr Jelinek                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: 9.6 and fsync=off

From
Abhijit Menon-Sen
Date:
Here's a patch just to help things along.

-- Abhijit

Attachment

Re: 9.6 and fsync=off

From
Tom Lane
Date:
Petr Jelinek <petr@2ndquadrant.com> writes:
> +1 (Abhijit's wording with data loss changed to data corruption)

I'd suggest something like

#fsync = on                             # flush data to disk for crash safety
                                        # (turning this off can cause
                                        # unrecoverable data corruption!)
 
        regards, tom lane



Re: 9.6 and fsync=off

From
Craig Ringer
Date:
On 27 April 2016 at 21:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Petr Jelinek <petr@2ndquadrant.com> writes:
> > +1 (Abhijit's wording with data loss changed to data corruption)
>
> I'd suggest something like
>
> #fsync = on                             # flush data to disk for crash safety
>                                         # (turning this off can cause
>                                         # unrecoverable data corruption!)


Looks good.

The docs on fsync are already good, it's just a matter of making people think twice and actually look at them. 

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: 9.6 and fsync=off

From
Robert Haas
Date:
On Wed, Apr 27, 2016 at 11:04 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
>> I'd suggest something like
>>
>> #fsync = on                             # flush data to disk for crash safety
>>                                         # (turning this off can cause
>>                                         # unrecoverable data corruption!)
>>
>
> Looks good.
>
> The docs on fsync are already good, it's just a matter of making people
> think twice and actually look at them.

Committed that way.  Thanks for suggesting this, Craig.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: 9.6 and fsync=off

From
Greg Stark
Date:
On Wed, Apr 27, 2016 at 10:58 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> Now, we can't rename fsync to disable_crash_safety=on or
> corrupt_my_database=on. But the comment needs changing.


FWIW, we've done similar things in the past. We can provide
backwards-compatibility support for "fsync" but make the setting
appear as "crash_safety" or whatever in pg_settings and in the default
postgresql.conf. The only downside is that tools or scripts that
retrieve all the settings might break or miss that setting.
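
A minimal sketch of that aliasing idea, in Python rather than the server's C, purely for illustration. The name "crash_safety" is the hypothetical one floated above, and set_option / list_settings are made-up helpers, not PostgreSQL functions. It shows both halves of the trade-off: the old spelling is still accepted, but anything that enumerates settings no longer sees "fsync".

```python
# Hypothetical sketch: "crash_safety" is an illustrative name, not a real
# PostgreSQL setting, and these helpers are not PostgreSQL APIs.

# Legacy spellings mapped to the canonical name shown to users.
ALIASES = {"fsync": "crash_safety"}


def set_option(settings, name, value):
    """Store value under the canonical name, accepting a legacy alias."""
    canonical = ALIASES.get(name, name)
    if canonical not in settings:
        raise KeyError(f"unrecognized configuration parameter: {name}")
    settings[canonical] = value
    return canonical


def list_settings(settings):
    """Return the names a tool that enumerates all settings would see."""
    return sorted(settings)
```

With settings = {"crash_safety": "on"}, calling set_option(settings, "fsync", "off") updates the canonical entry, while list_settings(settings) contains only "crash_safety". A script that greps the full listing for "fsync" would come up empty, which is the downside mentioned above.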

-- 
greg



Re: 9.6 and fsync=off

From
Simon Riggs
Date:
On 27 April 2016 at 17:04, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 27 April 2016 at 21:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Petr Jelinek <petr@2ndquadrant.com> writes:
> > > +1 (Abhijit's wording with data loss changed to data corruption)
> >
> > I'd suggest something like
> >
> > #fsync = on                             # flush data to disk for crash safety
> >                                         # (turning this off can cause
> >                                         # unrecoverable data corruption!)
>
> Looks good.
>
> The docs on fsync are already good, it's just a matter of making people think twice and actually look at them.

If fsync=off and you turn it on, does it fsync anything at that point?

Or does it mean only that future fsyncs will occur?

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: 9.6 and fsync=off

From
"David G. Johnston"
Date:
On Thursday, April 28, 2016, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 27 April 2016 at 17:04, Craig Ringer <craig@2ndquadrant.com> wrote:
> > On 27 April 2016 at 21:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > Petr Jelinek <petr@2ndquadrant.com> writes:
> > > > +1 (Abhijit's wording with data loss changed to data corruption)
> > >
> > > I'd suggest something like
> > >
> > > #fsync = on                             # flush data to disk for crash safety
> > >                                         # (turning this off can cause
> > >                                         # unrecoverable data corruption!)
> >
> > Looks good.
> >
> > The docs on fsync are already good, it's just a matter of making people think twice and actually look at them.
>
> If fsync=off and you turn it on, does it fsync anything at that point?
>
> Or does it mean only that future fsyncs will occur?


http://www.postgresql.org/docs/current/static/runtime-config-wal.html

4th paragraph in the fsync section.

David J.

Re: 9.6 and fsync=off

From
Simon Riggs
Date:
On 28 April 2016 at 22:30, David G. Johnston <david.g.johnston@gmail.com> wrote:
> On Thursday, April 28, 2016, Simon Riggs <simon@2ndquadrant.com> wrote:
> > On 27 April 2016 at 17:04, Craig Ringer <craig@2ndquadrant.com> wrote:
> > > On 27 April 2016 at 21:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > > Petr Jelinek <petr@2ndquadrant.com> writes:
> > > > > +1 (Abhijit's wording with data loss changed to data corruption)
> > > >
> > > > I'd suggest something like
> > > >
> > > > #fsync = on                             # flush data to disk for crash safety
> > > >                                         # (turning this off can cause
> > > >                                         # unrecoverable data corruption!)
> > >
> > > Looks good.
> > >
> > > The docs on fsync are already good, it's just a matter of making people think twice and actually look at them.
> >
> > If fsync=off and you turn it on, does it fsync anything at that point?
> >
> > Or does it mean only that future fsyncs will occur?
>
> http://www.postgresql.org/docs/current/static/runtime-config-wal.html
>
> 4th paragraph in the fsync section.

Thanks. I've never touched that parameter!  But I could have read the docs. 

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: 9.6 and fsync=off

From
Andres Freund
Date:
On 2016-04-28 21:32:37 +0200, Simon Riggs wrote:
> On 27 April 2016 at 17:04, Craig Ringer <craig@2ndquadrant.com> wrote:
> 
> > On 27 April 2016 at 21:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> >> Petr Jelinek <petr@2ndquadrant.com> writes:
> >> > +1 (Abhijit's wording with data loss changed to data corruption)
> >>
> >> I'd suggest something like
> >>
> >> #fsync = on                             # flush data to disk for crash safety
> >>                                         # (turning this off can cause
> >>                                         # unrecoverable data corruption!)
> >>
> >>
> > Looks good.
> >
> > The docs on fsync are already good, it's just a matter of making people
> > think twice and actually look at them.
> >
> 
> If fsync=off and you turn it on, does it fsync anything at that point?
> 
> Or does it mean only that future fsyncs will occur?

Abhijit had a patch implementing automatically running fsync whenever
reenabled IIRC. Abhijit?

Andres



Re: 9.6 and fsync=off

From
Abhijit Menon-Sen
Date:
At 2016-04-28 13:44:23 -0700, andres@anarazel.de wrote:
>
> Abhijit had a patch implementing automatically running fsync whenever
> reenabled IIRC. Abhijit?

The patch I had written is attached, and it's not quite the same thing.
Here's how I originally described it in response to a question from
Robert:

    «In 20150115133245.GG5245@awork2.anarazel.de, Andres explained his
    rationale as follows:

        «What I am thinking of is that, currently, if you start the
        server for initial loading with fsync=off, and then restart it,
        you're open to data loss. So when the current config file
        setting is changed from off to on, we should fsync the data
        directory. Even if there was no crash restart.»

    That's what I tried to implement.»
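
For illustration only, the off-to-on behaviour described in that quote could be sketched roughly as follows. This is Python, not the attached patch, and fsync_tree / on_config_reload are hypothetical names. It assumes a POSIX system (directories can be opened and fsynced) and does not follow symlinked directories.

```python
import os


def fsync_tree(root):
    """Flush every file and directory under root; return the file count."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            fd = os.open(os.path.join(dirpath, name), os.O_RDONLY)
            try:
                os.fsync(fd)
            finally:
                os.close(fd)
            count += 1
        # Also flush the directory so that its entries are durable
        # (opening a directory like this is POSIX-only).
        dfd = os.open(dirpath, os.O_RDONLY)
        try:
            os.fsync(dfd)
        finally:
            os.close(dfd)
    return count


def on_config_reload(old_fsync, new_fsync, data_dir):
    """Flush the data directory only on an off -> on transition."""
    if not old_fsync and new_fsync:
        return fsync_tree(data_dir)
    return 0
```

Only the off -> on transition triggers the walk; staying on, staying off, or turning fsync off does nothing extra.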

I remember there was some subsequent discussion about it being better to
issue fsync during a checkpoint when we see that its value has changed,
but if I did any work on it (which I have a vague memory of), I can't
find it now. Sorry.

Do you want a patch along those lines now, or is it too late?

-- Abhijit

Attachment

Re: 9.6 and fsync=off

From
Tom Lane
Date:
Abhijit Menon-Sen <ams@2ndQuadrant.com> writes:
> Do you want a patch along those lines now, or is it too late?

We're certainly not going to consider fooling with this in 9.6.
The situation for manual fsync-twiddling is no worse than it was in
any prior release, and we are long past feature freeze.

If you want to put it on your to-do queue for 9.7, feel free.
        regards, tom lane



Re: 9.6 and fsync=off

From
Robert Haas
Date:
On Fri, Apr 29, 2016 at 9:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Abhijit Menon-Sen <ams@2ndQuadrant.com> writes:
>> Do you want a patch along those lines now, or is it too late?
>
> We're certainly not going to consider fooling with this in 9.6.
> The situation for manual fsync-twiddling is no worse than it was in
> any prior release, and we are long past feature freeze.
>
> If you want to put it on your to-do queue for 9.7, feel free.

Agreed.

I also think that it would be a swell idea to detect whether a system
has ever crashed with fsync=off, and do something about that, like
maybe bleat on every subsequent startup for the lifetime of the
cluster.  I think Andres may have even proposed a patch for this sort
of thing before, although I don't remember for sure and I think he and
I disagreed on the details.  Sketch:

- Keep a copy of the fsync status in pg_control.
- If we ever enter recovery while it's turned off, say:
WARNING: Entering recovery with fsync=off; this cluster may be
irretrievably corrupted.
...and also set a separate flag indicating that we've done at least
one recovery with fsync=off.
- If that flag is set on a subsequent startup, say:
WARNING: Recovery was previously performed with fsync=off; this
cluster may be irretrievably corrupted.
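
The sketch above can be modelled as a small state machine. This is a hedged Python illustration, not server code: ControlFile and its field names are hypothetical stand-ins for new pg_control fields, and it assumes "while it's turned off" refers to the fsync value recorded before the crash rather than the value in the config file at restart.

```python
from dataclasses import dataclass


@dataclass
class ControlFile:
    """Hypothetical stand-in for the proposed pg_control fields."""
    fsync_enabled: bool = True               # copy of the fsync setting
    recovered_with_fsync_off: bool = False   # sticky flag, never cleared


def startup(control, fsync_setting, needs_recovery):
    """Return the warnings one startup would emit; updates control."""
    warnings = []
    if control.recovered_with_fsync_off:
        warnings.append(
            "Recovery was previously performed with fsync=off; "
            "this cluster may be irretrievably corrupted."
        )
    # Assumption: test the value recorded before the crash.
    if needs_recovery and not control.fsync_enabled:
        warnings.append(
            "Entering recovery with fsync=off; "
            "this cluster may be irretrievably corrupted."
        )
        control.recovered_with_fsync_off = True
    control.fsync_enabled = fsync_setting
    return warnings
```

A clean start with fsync=off emits nothing. A later start that needs recovery emits the first warning and sets the flag, and every start after that emits the second warning, even if fsync has since been turned back on.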

While I'm kvetching, it might also be a good idea to have a timestamp
in pg_control indicating the date and time at which pg_resetxlog was
last run (and maybe the cluster creation time, too).  I run across way
too many clusters where the customer can't convincingly vouch for the
proposition that nothing evil has been done, and having some forensic
evidence available would make it easier to figure out where the blame
lies.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: 9.6 and fsync=off

From
Craig Ringer
Date:
On 2 May 2016 at 22:07, Robert Haas <robertmhaas@gmail.com> wrote:
> I also think that it would be a swell idea to detect whether a system
> has ever crashed with fsync=off, and do something about that, like
> maybe bleat on every subsequent startup for the lifetime of the
> cluster.

Yes. Very, very yes.

That would've made my life considerably easier on a few occasions now.

It shouldn't take much more than a new pg_control field and a test during recovery.

Should TODO this, but since that's sometimes where ideas go to die, I'm going to see if I can hack this out soon as well.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: 9.6 and fsync=off

From
Tom Lane
Date:
Craig Ringer <craig@2ndquadrant.com> writes:
> On 2 May 2016 at 22:07, Robert Haas <robertmhaas@gmail.com> wrote:
>> I also think that it would be a swell idea to detect whether a system
>> has ever crashed with fsync=off, and do something about that, like
>> maybe bleat on every subsequent startup for the lifetime of the
>> cluster.

> Yes. Very, very yes.

+1 for tracking this in pg_control (maybe even with a counter, not
just a flag).  I'm less convinced that we need to bleat on every
subsequent startup though --- that seems like just nagging.
Having the info available from pg_controldata seems sufficient for
forensics.

The timestamp ideas aren't bad either.

BTW, how would this work in a standby server?
        regards, tom lane



Re: 9.6 and fsync=off

From
Andres Freund
Date:
Hi,

On 2016-05-02 10:07:50 -0400, Robert Haas wrote:
> I also think that it would be a swell idea to detect whether a system
> has ever crashed with fsync=off, and do something about that, like
> maybe bleat on every subsequent startup for the lifetime of the
> cluster.  I think Andres may have even proposed a patch for this sort
> of thing before, although I don't remember for sure and I think he and
> I disagreed on the details.  Sketch:

Hm,  I can't remember doing that.


> - Keep a copy of the fsync status in pg_control.
> - If we ever enter recovery while it's turned off, say:
> WARNING: Entering recovery with fsync=off; this cluster may be
> irretrievably corrupted.
> ...and also set a separate flag indicating that we've done at least
> one recovery with fsync=off.
> - If that flag is set on a subsequent startup, say:
> WARNING: Recovery was previously performed with fsync=off; this
> cluster may be irretrievably corrupted.

Well, the problem with that is that postgres crashes are actually
harmless with regard to fsync=on/off. It's just OS crashes that are a
problem. So it seems quite likely that the false-positive rate here
would be high enough to make people ignore it.

Andres



Re: 9.6 and fsync=off

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2016-05-02 10:07:50 -0400, Robert Haas wrote:
>> - If that flag is set on a subsequent startup, say:
>> WARNING: Recovery was previously performed with fsync=off; this
>> cluster may be irretrievably corrupted.

> Well, the problem with that is that postgres crashes are actually
> harmless with regard to fsync=on/off. It's just OS crashes that are a
> problem. So it seems quite likely that the false-positive rate here
> would be high enough, to make people ignore it.

That's a pretty good point.  Also, as sketched, I believe this would
start bleating after a crash recovery performed because a backend
died --- which is a case where we know for certain there was no OS
crash.  So this idea needs some more thought.
        regards, tom lane



Re: 9.6 and fsync=off

From
Robert Haas
Date:
On Mon, May 2, 2016 at 12:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@anarazel.de> writes:
>> On 2016-05-02 10:07:50 -0400, Robert Haas wrote:
>>> - If that flag is set on a subsequent startup, say:
>>> WARNING: Recovery was previously performed with fsync=off; this
>>> cluster may be irretrievably corrupted.
>
>> Well, the problem with that is that postgres crashes are actually
>> harmless with regard to fsync=on/off. It's just OS crashes that are a
>> problem. So it seems quite likely that the false-positive rate here
>> would be high enough, to make people ignore it.
>
> That's a pretty good point.  Also, as sketched, I believe this would
> start bleating after a crash recovery performed because a backend
> died --- which is a case where we know for certain there was no OS
> crash.  So this idea needs some more thought.

That's true.  I think that we could arrange to ignore postmaster-initiated
crash-and-restart cycles in deciding whether to set the
flag.  Now, somebody could still do an immediate shutdown, or the
postmaster could go boom, but I don't think those are common enough
scenarios to justify not tracking this.  If you are using fsync=off
and running an immediate shutdown and then setting fsync=on and
restarting the server ... yeah, that could hypothetically be safe.
But I think you are playing with fire.  If you are using fsync=off for
the initial data load, it's not too much to ask that you shut the
cluster down cleanly when you are done.
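
As a rough Python illustration of that refinement (RecoveryCause and should_flag are hypothetical names, and the two-way split of causes is a simplifying assumption):

```python
from enum import Enum


class RecoveryCause(Enum):
    # Postmaster restarted everything after a backend died: no OS crash.
    BACKEND_CRASH_RESTART = "postmaster-initiated crash-and-restart"
    # Immediate shutdown, postmaster death, or a real OS/power failure;
    # these cannot be told apart after the fact.
    UNCLEAN_SHUTDOWN = "unclean shutdown"


def should_flag(fsync_was_on, cause):
    """Decide whether recovery should set the sticky fsync=off flag."""
    if fsync_was_on:
        return False
    return cause is not RecoveryCause.BACKEND_CRASH_RESTART
```

Under these assumptions, an immediate shutdown with fsync=off still sets the flag even though the data may be fine, which is the residual false positive accepted above.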

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company