Re: fsync-pgdata-on-recovery tries to write to more files than previously - Mailing list pgsql-hackers

From Andres Freund
Subject Re: fsync-pgdata-on-recovery tries to write to more files than previously
Msg-id 20150526204403.GG5310@alap3.anarazel.de
In response to Re: fsync-pgdata-on-recovery tries to write to more files than previously  (Andres Freund <andres@anarazel.de>)
Responses Re: fsync-pgdata-on-recovery tries to write to more files than previously
Re: fsync-pgdata-on-recovery tries to write to more files than previously
List pgsql-hackers
On 2015-05-26 19:07:20 +0200, Andres Freund wrote:
> It is somewhat interesting that similar code has been used in
> pg_upgrade, via initdb -S, for a while now, without, to my knowledge,
> causing any reported problems. I think the relevant difference is that
> that code doesn't follow symlinks.  It's obviously also less exercised and
> people might just have fixed up permissions when encountering trouble.
> 
> Abhijit, do you recall why the code was changed to follow all symlinks
> in contrast to explicitly going through the tablespaces as initdb -S
> does? I'm pretty sure early versions of the patch pretty much had a
> verbatim copy of the initdb logic?  That logic is missing pg_xlog btw,
> which is bad for pg_upgrade.

So, this was discussed in the following thread, starting at:
http://archives.postgresql.org/message-id/20150403163232.GA28444%40eldon.alvh.no-ip.org

"Actually, since surely we must follow symlinks everywhere, why do we
have to do this separately for pg_tblspc?  Shouldn't that link-following
occur automatically when walking PGDATA in the first place?"

I don't think it's true that we must follow symlinks everywhere. I
think, as argued upthread, that it's sufficient to recurse through
PGDATA, follow the symlinks in pg_tblspc, and, if pg_xlog is a symlink,
also go through it separately.  There are no other places where it's
"allowed" to introduce symlinks, and we have rejected bug reports from
people who had problems after doing that.

So what I propose is:
1) Remove the automatic symlink following.
2) Follow pg_tblspc/*, and pg_xlog if it's a symlink; fix the latter in
   initdb -S.
3) Add an elevel argument to walkdir(), return if AllocateDir() fails,
   continue for stat() failures in the readdir() loop.
4) Add an elevel argument to pre_sync_fname and fsync_fname, return after
   errors.
5) Accept EACCES and ETXTBSY (if defined) when open()ing the files. By
   virtue of not following symlinks we should not need to worry about
   EROFS.
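
To make points 1 and 3-5 a bit more concrete, here is a rough standalone
sketch. It is not the PostgreSQL code: sketch_walkdir, sketch_fsync_fname,
report and the SKETCH_* levels are hypothetical stand-ins for walkdir(),
fsync_fname(), ereport() and the elevel values, plain opendir() stands in
for AllocateDir(), and the pg_tblspc/pg_xlog handling from point 2 is left
out entirely.

```c
#define _POSIX_C_SOURCE 200809L   /* for lstat(), fsync(), snprintf() */
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical severity levels standing in for PostgreSQL's elevel values. */
enum { SKETCH_LOG = 0, SKETCH_FATAL = 1 };

/* Report a failure using errno; only SKETCH_FATAL terminates the process. */
static void
report(int elevel, const char *what, const char *path)
{
    fprintf(stderr, "could not %s \"%s\": %s\n", what, path, strerror(errno));
    if (elevel >= SKETCH_FATAL)
        exit(1);
}

/* fsync one file. Returns 0 on success or tolerated failure, -1 otherwise. */
static int
sketch_fsync_fname(const char *path, int elevel)
{
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDWR);
    if (fd < 0)
    {
        if (errno == EACCES)      /* not ours to write: skip silently */
            return 0;
#ifdef ETXTBSY
        if (errno == ETXTBSY)     /* busy executable: skip silently */
            return 0;
#endif
        report(elevel, "open", path);
        return -1;                /* return after errors */
    }
    if (fsync(fd) != 0)
    {
        report(elevel, "fsync", path);
        close(fd);
        return -1;
    }
    close(fd);
    return 0;
}

/*
 * Recurse through path without following symlinks. Returns the number of
 * regular files handled without a reported error.
 */
static int
sketch_walkdir(const char *path, int elevel)
{
    DIR *dir = opendir(path);
    struct dirent *de;
    int handled = 0;

    if (dir == NULL)
    {
        report(elevel, "open directory", path);
        return 0;                 /* give up on this directory only */
    }
    while ((de = readdir(dir)) != NULL)
    {
        char sub[4096];
        struct stat st;

        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        snprintf(sub, sizeof(sub), "%s/%s", path, de->d_name);
        if (lstat(sub, &st) < 0)  /* lstat: a symlink is never followed */
        {
            report(elevel, "stat", sub);
            continue;             /* keep going with the next entry */
        }
        if (S_ISREG(st.st_mode))
        {
            if (sketch_fsync_fname(sub, elevel) == 0)
                handled++;
        }
        else if (S_ISDIR(st.st_mode))
            handled += sketch_walkdir(sub, elevel);
        /* symlinks and other file types are deliberately skipped */
    }
    closedir(dir);
    return handled;
}
```

With SKETCH_LOG the walk is best-effort: a directory that cannot be opened
or an entry that cannot be stat()ed is reported and skipped, which is the
behaviour I'd want during recovery. A caller that wants hard failures would
pass SKETCH_FATAL instead.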
 

I'm inclined to think that 4) is a big enough compat break that a
fsync_fname_ext with the new argument is a good idea.
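
Roughly what I have in mind for the compatibility question, as a
standalone sketch: the new function takes the severity, and the old entry
point stays as a thin wrapper with its existing behaviour. The sketch_
names and the SKETCH_* levels are hypothetical, and the signatures are
only assumed to resemble the real ones.

```c
#define _POSIX_C_SOURCE 200809L   /* for fsync() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical severity levels standing in for PostgreSQL's elevel values. */
enum { SKETCH_LOG = 0, SKETCH_ERROR = 1 };

/*
 * New-style entry point: the caller chooses the severity.
 * Returns 0 on success, -1 after reporting a failure at SKETCH_LOG.
 */
static int
sketch_fsync_fname_ext(const char *fname, bool isdir, int elevel)
{
    int fd;

    assert(fname != NULL);
    /* Directories cannot be opened for writing, so open them read-only. */
    fd = open(fname, isdir ? O_RDONLY : O_RDWR);
    if (fd < 0 || fsync(fd) != 0)
    {
        fprintf(stderr, "could not fsync \"%s\": %s\n",
                fname, strerror(errno));
        if (fd >= 0)
            close(fd);
        if (elevel >= SKETCH_ERROR)
            exit(1);              /* stand-in for raising an error */
        return -1;
    }
    close(fd);
    return 0;
}

/* Old-style entry point: same signature as before, failures stay fatal. */
static void
sketch_fsync_fname(const char *fname, bool isdir)
{
    (void) sketch_fsync_fname_ext(fname, isdir, SKETCH_ERROR);
}
```

Existing callers keep compiling and keep getting a hard error, while the
recovery-time walk can call the _ext variant with a lower level.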

Arguments for/against?


