Re: initdb error - Mailing list pgsql-general

From Tom Lane
Subject Re: initdb error
Msg-id 5638.1355516066@sss.pgh.pa.us
In response to Re: initdb error  (David Noel <david.i.noel@gmail.com>)
Responses Re: initdb error
List pgsql-general
David Noel <david.i.noel@gmail.com> writes:
> On 12/14/12, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> This corresponds to the execution of XLogFileInit(), and what's
>> evidently happening is that we successfully create and zero-fill
>> the first xlog segment file under a temporary name, but then
>> the attempt to rename it into place with link() fails with EPERM.
>>
>> This is really a WTF kind of failure, I think.  The directory is
>> certainly writable --- it was made under our own UID, and what's
>> more we just managed to create the file there under its temp name.
>> So how can we get an EPERM failure from link()?
>>
>> I think this is a kernel bug.

> Thanks so much for the analysis. Where to from here? The
> freebsd-database@freebsd.org mailing list? The postgresql port
> maintainer? Who should I be in touch with?

You need to talk to some FreeBSD kernel hackers about why link()
might be failing here.  Since you see it on UFS too, we can probably
exonerate the ZFS filesystem-specific code.
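
When taking this to kernel developers, a standalone reproducer that does not involve PostgreSQL at all may be useful. The following is only a sketch of one: it mimics the create-then-link() sequence described above. The function name try_link_in_dir and the scratch file names are made up for illustration, and nothing here is taken from the PostgreSQL sources.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Create a scratch file in "dir", then hard-link it to a second name in the
 * same directory.  Returns 0 on success, or the errno value of the step that
 * failed.  The file names are hypothetical, chosen only for this test.
 */
int
try_link_in_dir(const char *dir)
{
    char tmppath[1024];
    char newpath[1024];
    int  fd;
    int  result = 0;

    snprintf(tmppath, sizeof(tmppath), "%s/linktest.%ld.tmp", dir, (long) getpid());
    snprintf(newpath, sizeof(newpath), "%s/linktest.%ld", dir, (long) getpid());

    /* Leftovers from an earlier run would cause a misleading EEXIST. */
    unlink(tmppath);
    unlink(newpath);

    fd = open(tmppath, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
    {
        result = errno;
        fprintf(stderr, "open(\"%s\") failed: %s\n", tmppath, strerror(result));
        return result;
    }

    /* Write something so the file is not empty, then close it. */
    errno = 0;
    if (write(fd, "x", 1) != 1)
    {
        result = (errno != 0) ? errno : EIO;
        fprintf(stderr, "write(\"%s\") failed: %s\n", tmppath, strerror(result));
        close(fd);
        unlink(tmppath);
        return result;
    }
    close(fd);

    /* This is the step that reportedly fails with EPERM. */
    if (link(tmppath, newpath) < 0)
    {
        result = errno;
        fprintf(stderr, "link(\"%s\", \"%s\") failed: %s\n",
                tmppath, newpath, strerror(result));
    }
    else
        unlink(newpath);

    unlink(tmppath);
    return result;
}
```

Calling try_link_in_dir() from a small main(), as the same user that runs initdb and on a directory with the same ownership and permissions as the failing pg_xlog directory, should show whether link() alone returns EPERM there.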

I did some googling and found that EPERM can be issued if the filesystem
doesn't support hard links (which shouldn't apply to ZFS, I trust).
Also, Linux has a "protected_hardlinks" option that causes certain
attempts at creating hard links to fail --- but our use-case here
doesn't fall foul of any of those restrictions AFAICS, and of course
FreeBSD isn't Linux.  Still, I wonder if you're running into some
misdesigned or misimplemented security restriction.  You might want
to look at your kernel parameters and see if any of them look like
they might have to do with restricting hard-link operations.
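
As one way to check kernel parameters programmatically, the sketch below reads two sysctls through sysctlbyname(). The names security.bsd.hardlink_check_uid and security.bsd.hardlink_check_gid are an assumption about which FreeBSD knobs are relevant, so verify them against `sysctl -a` on the affected machine. The helper names read_int_sysctl and report_hardlink_knobs are hypothetical. On systems other than FreeBSD the lookup is stubbed out and simply fails.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/*
 * Read an integer-valued sysctl by name.  Returns 0 and fills *value on
 * success, -1 with errno set on failure.  Outside FreeBSD this is a stub
 * that always fails with ENOSYS.
 */
int
read_int_sysctl(const char *name, int *value)
{
#ifdef __FreeBSD__
    size_t len = sizeof(*value);

    if (sysctlbyname(name, value, &len, NULL, 0) < 0)
        return -1;
    return 0;
#else
    (void) name;
    (void) value;
    errno = ENOSYS;
    return -1;
#endif
}

/* Print the hard-link-related knobs; the names are assumptions to verify. */
void
report_hardlink_knobs(void)
{
    static const char *const knobs[] = {
        "security.bsd.hardlink_check_uid",
        "security.bsd.hardlink_check_gid",
    };
    size_t i;

    for (i = 0; i < sizeof(knobs) / sizeof(knobs[0]); i++)
    {
        int value = 0;

        if (read_int_sysctl(knobs[i], &value) == 0)
            printf("%s = %d\n", knobs[i], value);
        else
            printf("%s: not readable (%s)\n", knobs[i], strerror(errno));
    }
}
```

If either knob reports a nonzero value, that would be worth mentioning to the kernel developers. On BSD systems a new file normally inherits the group of its containing directory, so a group-based check could conceivably trip even for a file the process has just created.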

Also, since Amitabh failed to duplicate the failure on both earlier
and later FreeBSD kernels, and we've not heard reports of this from
anybody else either, it seems more than possible that it's a plain
old bug in the specific kernel version you're using.

As a short-term workaround, I'd suggest rebuilding with
HAVE_WORKING_LINK disabled.  (Just remove that #define from
src/include/pg_config_manual.h and rebuild.)
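
To illustrate what such a compile-time switch changes, here is a minimal sketch of the general pattern, not the actual PostgreSQL code. The macro USE_LINK_FOR_INSTALL and the function install_file are hypothetical stand-ins. With the macro enabled the file is moved into place with link() followed by unlink(). With it disabled, rename() is used instead.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-in for a HAVE_WORKING_LINK-style switch. */
#ifndef USE_LINK_FOR_INSTALL
#define USE_LINK_FOR_INSTALL 1
#endif

/*
 * Move the file at tmppath to its final name.  Returns 0 on success,
 * -1 with errno set on failure.
 */
int
install_file(const char *tmppath, const char *path)
{
#if USE_LINK_FOR_INSTALL
    /* link() fails with EEXIST if "path" already exists, so nothing is overwritten. */
    if (link(tmppath, path) < 0)
        return -1;
    unlink(tmppath);
    return 0;
#else
    /* rename() silently replaces an existing "path". */
    if (rename(tmppath, path) < 0)
        return -1;
    return 0;
#endif
}
```

Compiling with -DUSE_LINK_FOR_INSTALL=0 selects the rename() branch. The trade-off is that rename() gives no protection against replacing a file that already exists under the target name, which is presumably why the link() path is preferred where it works.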

            regards, tom lane

