Thread: Re: [COMMITTERS] Re: pgsql: Speed up CREATE DATABASE by deferring the fsyncs until after
Re: [COMMITTERS] Re: pgsql: Speed up CREATE DATABASE by deferring the fsyncs until after
From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes: > On Mon, Feb 22, 2010 at 2:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I also think it should scan the todir not the fromdir, just on >> general principles to avoid any possibility of race conditions. > I had concluded that scanning the original directory was odd but > better because it served to double-check that all the original files > actually made it and also because if there were any unrelated files > present there was no need to fsync them. Well, just for the record: if that was actually intentional then both of you erred seriously by not including a comment that explained that the coding was intentional (and giving the reasoning). Any reader of the code would have assumed that it was a copy-and-paste error, as I did. > But I agree it's odd and not > very general for copydir if we decide to use it elsewhere other than > create database. Yeah, to me it seems more likely to cause problems down the road than to catch anything. If the system is missing directory entries during ReadDir then we have problems far beyond what copydir can deal with. The point of the fsync loop is just to force the copy results down to the platter, not to cross-check that the source directory isn't changing. (And, what's more, I don't believe that the source directory can't change during CREATE DATABASE. Consider delayed cleanup of deleted relations during checkpoints.) regards, tom lane
Re: [COMMITTERS] Re: pgsql: Speed up CREATE DATABASE by deferring the fsyncs until after
From
Tom Lane
Date:
BTW, I notice that after allegedly fixing things, we are now seeing fsync failures during CREATE DATABASE in the installcheck phase of buildfarm runs on (apparently) all the Windows critters, plus a couple of other platforms too. This mystifies me. I could believe that there was something still wrong with copydir.c, but then how come these machines are getting through the earlier "make check" phase? I made a couple of code tweaks just now to try to get more information --- the reported EBADF error numbers seem fairly implausible in themselves, so I wondered if that's *really* what fsync is reporting. I don't have a lot of hope for that though. Any theories about what is happening? regards, tom lane
Re: [COMMITTERS] Re: pgsql: Speed up CREATE DATABASE by deferring the fsyncs until after
From
Andres Freund
Date:
Hi Tom, On Tuesday 23 February 2010 06:37:26 Tom Lane wrote: > I wrote: > > Any theories about what is happening? > Now, this doesn't mean that all is fine and dandy. I believe that a > majority of Unixen will reject attempts to open directories for writing, > so this solution puts us even further away from being able to fsync the > directories. I would bet however that the platforms that reject this > are ones that don't need fsync on directories. Maybe we just have to > have two different code paths depending on platform :-( Cool. You can't open a directory for writing under linux as well though - so that wont be the decisive argument. Do you have a better idea than a configure test? Andres
Re: [COMMITTERS] Re: pgsql: Speed up CREATE DATABASE by deferring the fsyncs until after
From
Greg Stark
Date:
On Tue, Feb 23, 2010 at 5:37 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > So the problem is that fsync_fname is trying to fsync a file it's opened > O_RDONLY. I don't know whether Windows is similarly picky, but we'll > soon find out. > Argh, now I feel silly. I had actually found that in my searches after the first batch of problems. But somehow i didn't connect that to the current problems. Sorry. There are other similarly confused OSes that don't allow fsync on read-only file descriptors: http://svn.haxx.se/dev/archive-2006-02/0488.shtml (I wonder if some of them are doing fsync wrong and only syncing changes written to this file descriptor and not any file descriptor for this file?) The plan I was thinking of was to pass a flag indicating if it's a directory to fsync_fname() and open it RD_ONLY if it's a directory and RDRW if it's a file. Then ignore any error returns (or at least EBADF and EINVAL) iff the flag indicating it was a directory was true. I don't like using configure tests for this because I fear someone could compile Postgres on a system with one set of behaviour and then switch to a different kernel version with a different set of behaviour. In the worst case it could be filesystem dependent whether you can open directories or whether they accept fsyncs. -- greg
Re: [COMMITTERS] Re: pgsql: Speed up CREATE DATABASE by deferring the fsyncs until after
From
Tom Lane
Date:
I wrote: > Any theories about what is happening? Hah --- the AIX failures, at least, are explained at http://publib.boulder.ibm.com/infocenter/aix/v6r1/index.jsp?topic=/com.ibm.aix.basetechref/doc/basetrf1/fsync.htm which says Error Codes The fsync or fsync_range subroutine is unsuccessful if one or more of the following are true: EIO An I/O error occurred while reading from or writing to the file system. EBADF The FileDescriptor parameter is not a valid file descriptor open for writing. EINVAL The file is not a regular file. EINTR The subroutine was interrupted by a signal. So the problem is that fsync_fname is trying to fsync a file it's opened O_RDONLY. I don't know whether Windows is similarly picky, but we'll soon find out. Now, this doesn't mean that all is fine and dandy. I believe that a majority of Unixen will reject attempts to open directories for writing, so this solution puts us even further away from being able to fsync the directories. I would bet however that the platforms that reject this are ones that don't need fsync on directories. Maybe we just have to have two different code paths depending on platform :-( regards, tom lane
Re: [COMMITTERS] Re: pgsql: Speed up CREATE DATABASE by deferring the fsyncs until after
From
Tom Lane
Date:
Greg Stark <stark@mit.edu> writes: > I don't like using configure tests for this because I fear someone > could compile Postgres on a system with one set of behaviour and then > switch to a different kernel version with a different set of > behaviour. In the worst case it could be filesystem dependent whether > you can open directories or whether they accept fsyncs. Yeah, and there's also the problem of cross-compilation. I don't want a configure test either if we can avoid it. > The plan I was thinking of was to pass a flag indicating if it's a > directory to fsync_fname() and open it RD_ONLY if it's a directory and > RDRW if it's a file. Then ignore any error returns (or at least EBADF > and EINVAL) iff the flag indicating it was a directory was true. Works for me, but let's first try just ignoring EBADF, which is the only value we saw in the recent buildfarm failures. If we got past the fd<0 test then EBADF could only indicate a rdonly failure, whereas it's less clear what EINVAL might cover. regards, tom lane
Re: [COMMITTERS] Re: pgsql: Speed up CREATE DATABASE by deferring the fsyncs until after
From
Tom Lane
Date:
I wrote: > BTW, I notice that after allegedly fixing things, we are now seeing > fsync failures during CREATE DATABASE in the installcheck phase of > buildfarm runs on (apparently) all the Windows critters, plus a > couple of other platforms too. This mystifies me. I could believe > that there was something still wrong with copydir.c, but then how > come these machines are getting through the earlier "make check" > phase? BTW, although things seem to be going green with the RDONLY->RDWR change, I'm still mystified why these machines didn't fail at "make check". Is it possible that "make check" runs the postmaster with fsync disabled? I don't see that in the code anywhere... regards, tom lane
Re: [COMMITTERS] Re: pgsql: Speed up CREATE DATABASE by deferring the fsyncs until after
From
Greg Stark
Date:
On Tue, Feb 23, 2010 at 3:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The plan I was thinking of was to pass a flag indicating if it's a >> directory to fsync_fname() and open it RD_ONLY if it's a directory and >> RDRW if it's a file. Then ignore any error returns (or at least EBADF >> and EINVAL) iff the flag indicating it was a directory was true. > > Works for me, but let's first try just ignoring EBADF, which is the only > value we saw in the recent buildfarm failures. If we got past the fd<0 > test then EBADF could only indicate a rdonly failure, whereas it's less > clear what EINVAL might cover. So I'm thinking of something like this. Ignore ESDIR when opening a directory (and return immediately) and ignore EBADF when trying to fsync a directory. -- greg
Attachment
Re: [COMMITTERS] Re: pgsql: Speed up CREATE DATABASE by deferring the fsyncs until after
From
Tom Lane
Date:
Greg Stark <stark@mit.edu> writes: > So I'm thinking of something like this. > Ignore ESDIR when opening a directory (and return immediately) > and ignore EBADF when trying to fsync a directory. Seems reasonable, but get rid of the comment "However we can't do this just yet, it has portability issues"; and you've got a double semicolon in one place. It might also be worth commenting the BasicOpenFile calls along the lines of "Many OSs don't let us open directories RDWR, while some reject fsync on files opened RDONLY, so we need two cases." regards, tom lane