Thread: Re: [COMMITTERS] pgsql: Speed up CREATE DATABASE by deferring the fsyncs until after

stark@postgresql.org (Greg Stark) writes:
> Log Message:
> -----------
> Speed up CREATE DATABASE by deferring the fsyncs until after copying
> all the data and using posix_fadvise to nudge the OS into flushing it
> earlier. This also hopefully makes CREATE DATABASE avoid spamming the
> cache.

The buildfarm indicates that this patch has got some serious issues.
        regards, tom lane
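
For readers following along, the approach in the log message (copy first, hint the kernel, fsync later) might be sketched as below. This is only an illustration under stated assumptions, not the committed code: copy_file_nosync() and fsync_path() are hypothetical names, the 8 kB buffer is arbitrary, and posix_fadvise() is purely advisory, so whether POSIX_FADV_DONTNEED actually starts writeback early depends on the OS.

```c
#define _POSIX_C_SOURCE 200112L   /* for posix_fadvise() */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy src to dst without fsyncing.  After each chunk
 * the kernel is hinted that the written range is no longer needed in cache.
 * Returns 0 on success, -1 on error.
 */
int
copy_file_nosync(const char *src, const char *dst)
{
    char        buf[8192];
    off_t       offset = 0;
    ssize_t     nread;
    int         srcfd;
    int         dstfd;

    assert(src != NULL && dst != NULL);

    srcfd = open(src, O_RDONLY);
    if (srcfd < 0)
        return -1;
    dstfd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (dstfd < 0)
    {
        close(srcfd);
        return -1;
    }

    while ((nread = read(srcfd, buf, sizeof(buf))) > 0)
    {
        if (write(dstfd, buf, (size_t) nread) != nread)
        {
            nread = -1;         /* treat short writes as errors */
            break;
        }
#ifdef POSIX_FADV_DONTNEED
        /* Advisory only; the result is deliberately ignored. */
        (void) posix_fadvise(dstfd, offset, nread, POSIX_FADV_DONTNEED);
#endif
        offset += nread;
    }

    close(srcfd);
    if (close(dstfd) != 0)
        return -1;
    return (nread < 0) ? -1 : 0;
}

/*
 * Hypothetical helper: the deferred step, run once all files are copied.
 * Returns 0 on success, -1 on error.
 */
int
fsync_path(const char *path)
{
    int         fd;
    int         rc;

    assert(path != NULL);

    fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    rc = fsync(fd);
    close(fd);
    return (rc == 0) ? 0 : -1;
}
```

A caller would run copy_file_nosync() over every file first and only then loop over the destinations with fsync_path(), so the kernel has had time to write most of the data out before the blocking calls.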


I wrote:
> The buildfarm indicates that this patch has got some serious issues.

Actually, looking closer, some of the Windows machines started failing
after the *earlier* patch to add directory fsyncs.
        regards, tom lane


On Monday 15 February 2010 08:13:32 Tom Lane wrote:
> I wrote:
> > The buildfarm indicates that this patch has got some serious issues.
> 
> Actually, looking closer, some of the Windows machines started failing
> after the *earlier* patch to add directory fsyncs.
And not only the Windows machines. It seems sensible to add a configure check
for whether directory fsyncing works.
But I, at least, am not capable of writing good m4/configure.in/whatever without
strong supervision...

I will try if nobody else with more knowledge does, and if somebody will look
over it afterwards.

Andres


On Mon, Feb 15, 2010 at 9:36 AM, Andres Freund <andres@anarazel.de> wrote:
> On Monday 15 February 2010 08:13:32 Tom Lane wrote:
>> Actually, looking closer, some of the Windows machines started failing
>> after the *earlier* patch to add directory fsyncs.
> And not only the windows machines. Seems sensible to add a configure check
> whether directory-fsyncing works.

It looks like a thing that can be filesystem-dependent. Maybe a kind
of runtime check?

Greetings
Marcin Mańk


Hi Marcin,

Sounds rather unlikely to me. It's likely handled at an upper layer (the VFS in Linux's case) and only overridden when an
optimized implementation is available.

Which OS do you see implementing that only on some of the filesystems?

A runtime check would mean creating, fsyncing and deleting a directory for every directory you're fsyncing, because they
could be on a different fs...
 

Andres
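
To make the cost of such a runtime check concrete, here is a rough sketch of the probe described above: create a scratch directory, fsync it, delete it. It assumes a POSIX system; probe_dir_fsync() and the "fsync_probe.<pid>" name are hypothetical and are not taken from the PostgreSQL source.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical probe: returns 1 if fsync() on a scratch directory created
 * under "parent" succeeds, 0 if fsync() fails there, and -1 if the probe
 * itself could not be set up (e.g. the parent is missing or read-only).
 */
int
probe_dir_fsync(const char *parent)
{
    char        path[1024];
    int         n;
    int         fd;
    int         ok;

    assert(parent != NULL);

    n = snprintf(path, sizeof(path), "%s/fsync_probe.%ld",
                 parent, (long) getpid());
    if (n < 0 || (size_t) n >= sizeof(path))
        return -1;

    if (mkdir(path, 0700) != 0)
        return -1;

    fd = open(path, O_RDONLY);
    if (fd < 0)
    {
        rmdir(path);
        return -1;
    }

    ok = (fsync(fd) == 0) ? 1 : 0;

    close(fd);
    rmdir(path);
    return ok;
}
```

Because the answer can differ per mount point, the probe would have to be repeated for each directory being synced, which is the overhead objected to above.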



On Mon, Feb 15, 2010 at 10:02 AM, Andres Freund <andres@anarazel.de> wrote:
> Hi Marcin,
>
> Sounds rather unlikely to me. Its likely handled at an upper layer (vfs in linux' case) and only overloaded when an
> optimized implementation is available.
>
> Which os do you see implementing that only on a part of the filesystems?
>
> A runtime check would be creating, fsyncing and deleting a directory for every directory youre fsyncing because they
> could be on a different fs...

We could just not check the result code of the fsync. Or print a
warning the first time and stop trying subsequently.

When do we cut the alpha? If I look at it at about 10-11pm EST is that too late?

-- 
greg
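
The second option mentioned above (warn the first time, then stop trying) could look roughly like this sketch. It assumes a POSIX system; fsync_dir_lenient() and the dir_fsync_disabled flag are hypothetical names, and stderr stands in for whatever logging facility a real server would use.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Set after the first failure so that later calls skip the fsync. */
int         dir_fsync_disabled = 0;

/*
 * Hypothetical helper: fsync a directory, but never fail hard.
 * Returns 0 if the directory was synced, 1 if the sync was skipped
 * or its failure was ignored.
 */
int
fsync_dir_lenient(const char *path)
{
    int         fd;
    int         rc;

    assert(path != NULL);

    if (dir_fsync_disabled)
        return 1;

    fd = open(path, O_RDONLY);
    rc = (fd >= 0) ? fsync(fd) : -1;
    if (rc != 0)
    {
        /* Warn once, then remember not to try again. */
        fprintf(stderr,
                "WARNING: could not fsync directory \"%s\": %s; "
                "not trying again\n", path, strerror(errno));
        dir_fsync_disabled = 1;
    }
    if (fd >= 0)
        close(fd);
    return (rc == 0) ? 0 : 1;
}
```

Note that this treats every failure alike, so a genuine I/O error on the directory would be silenced along with "not supported here".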


On Mon, Feb 15, 2010 at 11:02 AM, Andres Freund <andres@anarazel.de> wrote:
> Hi Marcin,
>
> Sounds rather unlikely to me. Its likely handled at an upper layer (vfs in linux' case) and only overloaded when an
> optimized implementation is available.
> Which os do you see implementing that only on a part of the filesystems?
>

I have a Windows XP dev machine, which runs VirtualBox, which runs
Ubuntu, which mounts a Windows directory through vboxfs.

fsync does error out on directories inside that mount.

BTW: 8.4.2 initdb won't work there either, so this is not a regression.
The error is:
DEBUG:  creating and filling new WAL file
LOG:  could not link file "pg_xlog/xlogtemp.2367" to
"pg_xlog/000000010000000000000000" (initialization of log file 0,
segment 0): Operation not permitted
FATAL:  could not open file "pg_xlog/000000010000000000000000" (log
file 0, segment 0): No such file or directory

But I would not be so sure that e.g. NFS or something like that won't complain.

Ignoring the return code seems the right choice.

Greetings
Marcin Mańk


On Mon, Feb 15, 2010 at 11:34 AM, marcin mank <marcin.mank@gmail.com> wrote:
> LOG:  could not link file "pg_xlog/xlogtemp.2367" to
> "pg_xlog/000000010000000000000000" (initialization of log file 0,
>

This is not related -- it seems your filesystem doesn't support hard
links. I thought we used "junctions" on versions of Windows that
support them which I would have expected would include XP but my
knowledge of Windows is thin and obsolete.

--
greg


On Monday 15 February 2010 12:34:44 marcin mank wrote:
> On Mon, Feb 15, 2010 at 11:02 AM, Andres Freund <andres@anarazel.de> wrote:
> > Hi Marcin,
> > 
> > Sounds rather unlikely to me. Its likely handled at an upper layer (vfs
> > in linux' case) and only overloaded when an optimized implementation is
> > available. Which os do you see implementing that only on a part of the
> > filesystems?
> 
> I have a Windows XP dev machine, which runs virtualbox, which runs
> ubuntu, which mounts a windows directory through vboxfs


> btw: 8.4.2 initdb won`t work there too, So this is not a regression.
> The error is:
> DEBUG:  creating and filling new WAL file
> LOG:  could not link file "pg_xlog/xlogtemp.2367" to
> "pg_xlog/000000010000000000000000" (initialization of log file 0,
> segment 0): Operation not permitted
> FATAL:  could not open file "pg_xlog/000000010000000000000000" (log
> file 0, segment 0): No such file or directory
That does seem to be a different issue. Currently there are no fsyncs on 
directories at all, so likely your setup is hosed anyway ;-)

> But I would not be that sure that eg. NFS or something like that won`t
> complain.
It does not.

> Ignoring the return code seems the right choice.
And the error-hiding one as well. With delayed allocation you could
theoretically error out on fsync with -ENOSPC ...


Andres


On Monday 15 February 2010 12:45:39 Greg Stark wrote:
> On Mon, Feb 15, 2010 at 11:34 AM, marcin mank <marcin.mank@gmail.com> wrote:
> > LOG:  could not link file "pg_xlog/xlogtemp.2367" to
> > "pg_xlog/000000010000000000000000" (initialization of log file 0,
> 
> This is not related -- it seems your filesystem doesn't support hard
> links. I thought we used "junctions" on versions of Windows that
> support them which I would have expected would include XP but my
> knowledge of Windows is thin and obsolete.
If I understood him correctly, Marcin mounts a Windows share on Linux
via some vbox-proprietary pseudo filesystem. That won't get detected and thus
no junctions will be used... (I doubt you can even create them via
vboxfs (or even SMB)).
I would consider that an unsupported setup. Agreed?

Andres


On Mon, Feb 15, 2010 at 11:50 AM, Andres Freund <andres@anarazel.de> wrote:

> If I understood him correctly marcin seems to mount a windows share on linux
> via some vbox-proprietary pseudo filesystem. That wont get detected and thus
> no junctions will be used... (I have doubts you even can create them via
> vboxfs (or even smb)).
> I would consider that a unsupported setup. Agreed?

I'm not sure which versions of Windows we support in general. But on
further thought, I believed we only used hard links for xlog files on
systems where we knew they worked and just did a rename() on systems
without them. So I'm puzzled why we're trying to hard link on this
system. Perhaps we need to make this a run-time check instead of just
making it depend on the system.


-- 
greg
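
A run-time decision of the kind suggested above could be sketched as follows: try link() first and fall back to rename() only when the filesystem refuses the hard link. This is an illustration assuming a POSIX system, not a description of what the xlog code does; install_file() is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: move tmppath into place as path.
 * Prefers link() + unlink(), which never overwrites an existing target.
 * If link() fails for any reason other than EEXIST (for instance EPERM on
 * a filesystem without hard links), it falls back to rename().
 * Returns 0 on success, -1 on failure.
 */
int
install_file(const char *tmppath, const char *path)
{
    assert(tmppath != NULL && path != NULL);

    if (link(tmppath, path) == 0)
    {
        unlink(tmppath);
        return 0;
    }

    if (errno == EEXIST)
        return -1;              /* target already exists; leave it alone */

    /*
     * Caution: unlike link(), rename() silently replaces an existing
     * target, so the no-overwrite guarantee is weaker on this path.
     */
    return (rename(tmppath, path) == 0) ? 0 : -1;
}
```

The trade-off is visible in the fallback branch: rename() works on more filesystems but gives up the protection against clobbering a file that appeared in the meantime.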


On Monday 15 February 2010 12:55:36 Greg Stark wrote:
> On Mon, Feb 15, 2010 at 11:50 AM, Andres Freund <andres@anarazel.de> wrote:
> > If I understood him correctly marcin seems to mount a windows share on
> > linux via some vbox-proprietary pseudo filesystem. That wont get
> > detected and thus no junctions will be used... (I have doubts you even
> > can create them via vboxfs (or even smb)).
> > I would consider that a unsupported setup. Agreed?
> 
> I'm not sure which versions of Windows we support in general. But on
> further thought I thought we only used hard links for xlog files on
> systems where we knew they worked and just did a rename() on systems
> without them. So I'm puzzled why we're trying to hard link on this
> system. Perhaps we need to make this a run-time check instead of just
> making it depend on the system.
Well, I guess Linux is normally a system where hardlinking is considered safe.
And I don't really see a problem with that - for example, we require NTFS on
Windows as well...
In the end it's only some strange filesystem that's causing the issue here...

Andres


Yes, the issue with initdb failing is unrelated (and I have no problem
with the fs being unsupported). But fsync still DOES fail on
directories from the mount.

>> But I would not be that sure that eg. NFS or something like that won`t
>> complain.
> It does not.
>

What if someone mounts an NFS share from a system that does not support
directory fsync (per buildfarm: UnixWare, AIX) on Linux? I agree that
this is asking for trouble, but...

Greetings
Marcin Mańk


On Monday 15 February 2010 14:50:03 marcin mank wrote:
> Yes, the issue with initdb failing is unrelated (and I have no problem
> about the fs being unsupported). But fsync still DOES fail on
> directories from the mount.
> 
> >> But I would not be that sure that eg. NFS or something like that won`t
> >> complain.
> > 
> > It does not.
> 
> What if someone mounts a NFS share from a system that does not support
> directory fsync (per buildfarm: unixware, AIX) on Linux? I agree that
> this is asking for trouble, but...
Then nothing. The fsync via nfs or such is a local operation. There is nothing 
like a "fsync" command transported - i.e. the fsync controls the local cache 
not the remote one...

Andres


From: Magnus Hagander

2010/2/15 Greg Stark <stark@mit.edu>:
> On Mon, Feb 15, 2010 at 11:34 AM, marcin mank <marcin.mank@gmail.com> wrote:
>> LOG:  could not link file "pg_xlog/xlogtemp.2367" to
>> "pg_xlog/000000010000000000000000" (initialization of log file 0,
>>
>
> This is not related -- it seems your filesystem doesn't support hard
> links. I thought we used "junctions" on versions of Windows that
> support them which I would have expected would include XP but my
> knowledge of Windows is thin and obsolete.

Junctions are for symbolic links, and only valid for directories. NTFS
has "real" hardlinks through CreateHardLink(). No idea if that works on
remote filesystems though.

But AFAIK, we don't use that on Windows. But the rest of the thread
has indicated why this shows up anyway :)


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/