Thread: Re: [BUGS] BUG #13350: blindly fsyncing data dir considered harmful
All, (sending to -core to request a release) * andrew@tao11.riddles.org.uk (andrew@tao11.riddles.org.uk) wrote: > Operating system: Debian (and probably others) > The addition of a recursive fsync of the data dir on startup (in the absence > of a clean shutdown) causes startup to fail if the data dir contains > symlinks to files which the postgres user can't write to. > > This is the standard configuration for many SSL-enabled setups, including > the standard debian packaging defaults. Accordingly, crash recovery now > ALWAYS fails on such systems without manual intervention. Andrew did a great job summarizing the problem, don't know that there's much to add there. This was back-patched all the way and released with the latest round of minor releases, and given that it means crash recovery fails for a large number of deployed systems, I think we need to fix (or revert) the recursive fsync change (d8ac77ab178ddb2ae043b8c463cd30c031e793d0 and related) and do new releases very shortly. Thanks! Stephen
Re: [CORE] [BUGS] BUG #13350: blindly fsyncing data dir considered harmful
From
Magnus Hagander
Date:
<p dir="ltr"><br /> On May 25, 2015 5:12 PM, "Stephen Frost" <<a href="mailto:sfrost@snowman.net">sfrost@snowman.net</a>>wrote:<br /> ><br /> > All,<br /> ><br /> > (sendingto -core to request a release)<br /> ><br /> > * <a href="mailto:andrew@tao11.riddles.org.uk">andrew@tao11.riddles.org.uk</a>(<a href="mailto:andrew@tao11.riddles.org.uk">andrew@tao11.riddles.org.uk</a>)wrote:<br /> > > Operating system: Debian(and probably others)<br /> ><br /> > > The addition of a recursive fsync of the data dir on startup (inthe absence<br /> > > of a clean shutdown) causes startup to fail if the data dir contains<br /> > > symlinksto files which the postgres user can't write to.<br /> > ><br /> > > This is the standard configurationfor many SSL-enabled setups, including<br /> > > the standard debian packaging defaults. Accordingly,crash recovery now<br /> > > ALWAYS fails on such systems without manual intervention.<br /> ><br />> Andrew did a great job summarizing the problem, don't know that there's<br /> > much to add there.<br /> ><br/> > This was back-patched all the way and released with the latest round of<br /> > minor releases, and giventhat it means crash recovery fails for a large<br /> > number of deployed systems, I think we need to fix (or revert)the<br /> > recursive fsync change (d8ac77ab178ddb2ae043b8c463cd30c031e793d0 and<br /> > related) and do newreleases very shortly.<br /> ><p dir="ltr">Agreed, this is a pretty bad regression and we need to at least do somethingand out out a release asap - either revert or if we can find a better way (see the other thread about this issuefor some other ideas). <p dir="ltr">It happens to be the default shipment on Debian and Ubuntu but it's definitely nota platform specific problem I believe, so we should put out a "real" release and not expect packagers to carry a specificpatch for it. <p dir="ltr">/Magnus
* Magnus Hagander (magnus@hagander.net) wrote: > On May 25, 2015 5:12 PM, "Stephen Frost" <sfrost@snowman.net> wrote: > > This was back-patched all the way and released with the latest round of > > minor releases, and given that it means crash recovery fails for a large > > number of deployed systems, I think we need to fix (or revert) the > > recursive fsync change (d8ac77ab178ddb2ae043b8c463cd30c031e793d0 and > > related) and do new releases very shortly. > > Agreed, this is a pretty bad regression and we need to at least do > something and out out a release asap - either revert or if we can find a > better way (see the other thread about this issue for some other ideas). > > It happens to be the default shipment on Debian and Ubuntu but it's > definitely not a platform specific problem I believe, so we should put out > a "real" release and not expect packagers to carry a specific patch for it. Agreed, there are certainly other reasons why a file might exist which can't be written to by the postgres user, we really can't have crash recovery fail because of it. Further, I believe that a lot of the .deb-based distributions use the same technique of using symlinks (Ubuntu included, but even those who aren't downstream of Debian), and they would all have to be updated with such a patch. Sadly, suggesting to stay on a prior version (eg: 9.4.1) really isn't acceptable either, given the corruption risks which 9.4.2 addressed. Thanks! Stephen
What exactly is failing? Is it that fsync is returning -1 ? Should we just ignore errors from fsync if it happens in this stage? That might be safer than determining which files should or shouldn't be fsynced. That would also have an impact on people starting up on a flaky file system perhaps. I'm imagining either a database mounted on a filesystem with corruption trying to extract what they can or perhaps something like an NFS filesystem with dangling .nfs files. On the one hand limping along as best we can is the general Postgres strategy when the filesystem starts failing but on the other hand we have had circumstances in the past when users had a database on a network filesystem that wasn't really ready for use yet when Postgres tried to start up and starting up anyways didn't do them any favours.
On Mon, May 25, 2015 at 04:37:59PM +0100, Greg Stark wrote: > What exactly is failing? > > Is it that fsync is returning -1 ? Should we just ignore errors from > fsync if it happens in this stage? That might be safer than > determining which files should or shouldn't be fsynced. Interesting idea. We could skip fsync -1 failures only for symbolic links. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On May 25, 2015 8:52:33 AM PDT, Bruce Momjian <bruce@momjian.us> wrote: >On Mon, May 25, 2015 at 04:37:59PM +0100, Greg Stark wrote: >> What exactly is failing? >> >> Is it that fsync is returning -1 ? Should we just ignore errors from >> fsync if it happens in this stage? That might be safer than >> determining which files should or shouldn't be fsynced. > >Interesting idea. We could skip fsync -1 failures only for symbolic >links. Now there's already a thread on jackets about this. Andres --- Please excuse brevity and formatting - I am writing this on my mobile phone.
Greg Stark <stark@mit.edu> writes: > What exactly is failing? > Is it that fsync is returning -1 ? According to the original report from Christoph Berg, it was open() not fsync() that was failing, at least in permissions-based cases. I'm not sure if we should just uniformly ignore all failures in this phase. That would have the merit of clearly not creating any new startup failure cases compared to the previous code, but as you say sometimes it might mean ignoring real problems. Another idea would be to elog(LOG) such failures but press on anyway. Lastly, there is a difference between ignoring failures in fsync_fname() and ignoring failures in walkdir(). The latter is presumably a general purpose subroutine and so I would expect it to throw errors if it gets a failure in, say, opendir(). Which it does. But do we want that in this context? Unreadable directories underneath $PGDATA are probably not a good thing, but OTOH this would still mean that there's a new startup failure condition compared to before. regards, tom lane
* Andres Freund (andres@anarazel.de) wrote: > On May 25, 2015 8:52:33 AM PDT, Bruce Momjian <bruce@momjian.us> wrote: > >On Mon, May 25, 2015 at 04:37:59PM +0100, Greg Stark wrote: > >> What exactly is failing? > >> > >> Is it that fsync is returning -1 ? Should we just ignore errors from > >> fsync if it happens in this stage? That might be safer than > >> determining which files should or shouldn't be fsynced. > > > >Interesting idea. We could skip fsync -1 failures only for symbolic > >links. > > Now there's already a thread on jackets about this. Agreed- we should probably keep the discussion about fixing it (or addressing the original issue in some way that doesn't run into this issue) on the other thread. This is simply to point out the user impact and that we need to fix it quickly. Thanks! Stephen
All, If it's any consolation, the folks at kernel.org are having a bad week too: http://lwn.net/Articles/645720/ -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 2015-05-25 14:11:34 -0700, Josh Berkus wrote: > If it's any consolation, the folks at kernel.org are having a bad week > too: http://lwn.net/Articles/645720/ This doesn't seem to come even close to their problems ;). A problem that you can fix by making permissions isn't that bad. Greetings, Andres Freund