Thread: Re: [BUGS] BUG #13350: blindly fsyncing data dir considered harmful

Re: [BUGS] BUG #13350: blindly fsyncing data dir considered harmful

From
Stephen Frost
Date:
All,

(sending to -core to request a release)

* andrew@tao11.riddles.org.uk (andrew@tao11.riddles.org.uk) wrote:
> Operating system:   Debian (and probably others)

> The addition of a recursive fsync of the data dir on startup (in the absence
> of a clean shutdown) causes startup to fail if the data dir contains
> symlinks to files which the postgres user can't write to.
>
> This is the standard configuration for many SSL-enabled setups, including
> the standard debian packaging defaults. Accordingly, crash recovery now
> ALWAYS fails on such systems without manual intervention.

Andrew did a great job summarizing the problem, don't know that there's
much to add there.

This was back-patched all the way and released with the latest round of
minor releases, and given that it means crash recovery fails for a large
number of deployed systems, I think we need to fix (or revert) the
recursive fsync change (d8ac77ab178ddb2ae043b8c463cd30c031e793d0 and
related) and do new releases very shortly.
Thanks!
    Stephen

Re: [CORE] [BUGS] BUG #13350: blindly fsyncing data dir considered harmful

From
Magnus Hagander
Date:
<p dir="ltr"><br /> On May 25, 2015 5:12 PM, "Stephen Frost" <<a
href="mailto:sfrost@snowman.net">sfrost@snowman.net</a>>wrote:<br /> ><br /> > All,<br /> ><br /> >
(sendingto -core to request a release)<br /> ><br /> > * <a
href="mailto:andrew@tao11.riddles.org.uk">andrew@tao11.riddles.org.uk</a>(<a
href="mailto:andrew@tao11.riddles.org.uk">andrew@tao11.riddles.org.uk</a>)wrote:<br /> > > Operating system: 
 Debian(and probably others)<br /> ><br /> > > The addition of a recursive fsync of the data dir on startup
(inthe absence<br /> > > of a clean shutdown) causes startup to fail if the data dir contains<br /> > >
symlinksto files which the postgres user can't write to.<br /> > ><br /> > > This is the standard
configurationfor many SSL-enabled setups, including<br /> > > the standard debian packaging defaults.
Accordingly,crash recovery now<br /> > > ALWAYS fails on such systems without manual intervention.<br /> ><br
/>> Andrew did a great job summarizing the problem, don't know that there's<br /> > much to add there.<br />
><br/> > This was back-patched all the way and released with the latest round of<br /> > minor releases, and
giventhat it means crash recovery fails for a large<br /> > number of deployed systems, I think we need to fix (or
revert)the<br /> > recursive fsync change (d8ac77ab178ddb2ae043b8c463cd30c031e793d0 and<br /> > related) and do
newreleases very shortly.<br /> ><p dir="ltr">Agreed, this is a pretty bad regression and we need to at least do
somethingand out out a release asap - either revert or if we can find a better way (see the other thread about this
issuefor some other ideas). <p dir="ltr">It happens to be the default shipment on Debian and Ubuntu but it's definitely
nota platform specific problem I believe, so we should put out a "real" release and not expect packagers to carry a
specificpatch for it. <p dir="ltr">/Magnus  

Re: [CORE] [BUGS] BUG #13350: blindly fsyncing data dir considered harmful

From
Stephen Frost
Date:
* Magnus Hagander (magnus@hagander.net) wrote:
> On May 25, 2015 5:12 PM, "Stephen Frost" <sfrost@snowman.net> wrote:
> > This was back-patched all the way and released with the latest round of
> > minor releases, and given that it means crash recovery fails for a large
> > number of deployed systems, I think we need to fix (or revert) the
> > recursive fsync change (d8ac77ab178ddb2ae043b8c463cd30c031e793d0 and
> > related) and do new releases very shortly.
>
> Agreed, this is a pretty bad regression and we need to at least do
> something and out out a release asap - either revert or if we can find a
> better way (see the other thread about this issue for some other ideas).
>
> It happens to be the default shipment on Debian and Ubuntu but it's
> definitely not a platform specific problem I believe, so we should put out
> a "real" release and not expect packagers to carry a specific patch for it.

Agreed, there are certainly other reasons why a file might exist which
can't be written to by the postgres user, we really can't have crash
recovery fail because of it.

Further, I believe that a lot of the .deb-based distributions use
the same technique of using symlinks (Ubuntu included, but even those
who aren't downstream of Debian), and they would all have to be updated
with such a patch.  Sadly, suggesting to stay on a prior version (eg:
9.4.1) really isn't acceptable either, given the corruption risks
which 9.4.2 addressed.
Thanks!
    Stephen

What exactly is failing?

Is it that fsync is returning -1 ? Should we just ignore errors from
fsync if it happens in this stage? That might be safer than
determining which files should or shouldn't be fsynced.

That would also have an impact on people starting up on a flaky file
system perhaps. I'm imagining either a database mounted on a
filesystem with corruption trying to extract what they can or perhaps
something like an NFS filesystem with dangling .nfs files. On the one
hand limping along as best we can is the general Postgres strategy
when the filesystem starts failing but on the other hand we have had
circumstances in the past when users had a database on a network
filesystem that wasn't really ready for use yet when Postgres tried to
start up and starting up anyways didn't do them any favours.



Re: [CORE] [BUGS] BUG #13350: blindly fsyncing data dir considered harmful

From
Bruce Momjian
Date:
On Mon, May 25, 2015 at 04:37:59PM +0100, Greg Stark wrote:
> What exactly is failing?
> 
> Is it that fsync is returning -1 ? Should we just ignore errors from
> fsync if it happens in this stage? That might be safer than
> determining which files should or shouldn't be fsynced.

Interesting idea.  We could skip fsync -1 failures only for symbolic
links.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: [CORE] [BUGS] BUG #13350: blindly fsyncing data dir considered harmful

From
Andres Freund
Date:
On May 25, 2015 8:52:33 AM PDT, Bruce Momjian <bruce@momjian.us> wrote:
>On Mon, May 25, 2015 at 04:37:59PM +0100, Greg Stark wrote:
>> What exactly is failing?
>> 
>> Is it that fsync is returning -1 ? Should we just ignore errors from
>> fsync if it happens in this stage? That might be safer than
>> determining which files should or shouldn't be fsynced.
>
>Interesting idea.  We could skip fsync -1 failures only for symbolic
>links.

Now there's already a thread on jackets about this.

Andres


--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.



Greg Stark <stark@mit.edu> writes:
> What exactly is failing?
> Is it that fsync is returning -1 ?

According to the original report from Christoph Berg, it was open()
not fsync() that was failing, at least in permissions-based cases.

I'm not sure if we should just uniformly ignore all failures in this
phase.  That would have the merit of clearly not creating any new
startup failure cases compared to the previous code, but as you say
sometimes it might mean ignoring real problems.

Another idea would be to elog(LOG) such failures but press on anyway.

Lastly, there is a difference between ignoring failures in fsync_fname()
and ignoring failures in walkdir().  The latter is presumably a general
purpose subroutine and so I would expect it to throw errors if it gets a
failure in, say, opendir().  Which it does.  But do we want that in this
context?  Unreadable directories underneath $PGDATA are probably not a
good thing, but OTOH this would still mean that there's a new startup
failure condition compared to before.
        regards, tom lane



Re: [CORE] [BUGS] BUG #13350: blindly fsyncing data dir considered harmful

From
Stephen Frost
Date:
* Andres Freund (andres@anarazel.de) wrote:
> On May 25, 2015 8:52:33 AM PDT, Bruce Momjian <bruce@momjian.us> wrote:
> >On Mon, May 25, 2015 at 04:37:59PM +0100, Greg Stark wrote:
> >> What exactly is failing?
> >>
> >> Is it that fsync is returning -1 ? Should we just ignore errors from
> >> fsync if it happens in this stage? That might be safer than
> >> determining which files should or shouldn't be fsynced.
> >
> >Interesting idea.  We could skip fsync -1 failures only for symbolic
> >links.
>
> Now there's already a thread on jackets about this.

Agreed- we should probably keep the discussion about fixing it (or
addressing the original issue in some way that doesn't run into this
issue) on the other thread.  This is simply to point out the user impact
and that we need to fix it quickly.
Thanks!
    Stephen

All,

If it's any consolation, the folks at kernel.org are having a bad week
too: http://lwn.net/Articles/645720/

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [CORE] [BUGS] BUG #13350: blindly fsyncing data dir considered harmful

From
Andres Freund
Date:
On 2015-05-25 14:11:34 -0700, Josh Berkus wrote:
> If it's any consolation, the folks at kernel.org are having a bad week
> too: http://lwn.net/Articles/645720/

This doesn't seem to come even close to their problems ;). A problem
that you can fix by making permissions isn't that bad.

Greetings,

Andres Freund