Re: fsync-pgdata-on-recovery tries to write to more files than previously - Mailing list pgsql-hackers

From Abhijit Menon-Sen
Subject Re: fsync-pgdata-on-recovery tries to write to more files than previously
Date
Msg-id 20150527061639.GA31904@toroid.org
In response to Re: fsync-pgdata-on-recovery tries to write to more files than previously  (Andres Freund <andres@anarazel.de>)
Responses Re: fsync-pgdata-on-recovery tries to write to more files than previously  (Abhijit Menon-Sen <ams@2ndQuadrant.com>)
List pgsql-hackers
At 2015-05-26 22:44:03 +0200, andres@anarazel.de wrote:
>
> So what I propose is:
> 1) Remove the automatic symlink following
> 2) Follow pg_tblspc/*, pg_xlog if it's a symlink, fix the latter in
>    initdb -S
> 3) Add an elevel argument to walkdir(), return if AllocateDir() fails,
>    continue for stat() failures in the readdir() loop.
> 4) Add elevel argument to pre_sync_fname, fsync_fname, return after
>    errors.
> 5) Accept EACCES, ETXTBSY (if defined) when open()ing the files. By
>    virtue of not following symlinks we should not need to worry about
>    EROFS

Here's a WIP patch for discussion.

I've (a) removed the S_ISLNK() branch in walkdir, (b) reintroduced
walktblspc_links to call walkdir on each of the entries within pg_tblspc
(simpler than trying to make walkdir follow links only for pg_xlog and
under pg_tblspc), (c) called walkdir on pg_xlog if it's a symlink (not
done for initdb -S; I'll submit that separately), (d) added elevel
arguments as described, and (e) ignored EACCES and ETXTBSY when opening
files.

This correctly fsync()s stuff according to strace, and doesn't die if
there are unreadable files/links in PGDATA.

What I haven't done is return if AllocateDir() fails. I'm not convinced
that's correct: it won't complain if PGDATA is unreadable (though that
will break other things anyway, so it doesn't matter), but it will still
die if readdir fails rather than opendir.

I'm trying a couple of approaches to that (e.g. using readdir directly
instead of ReadDir), but other suggestions are welcome.

-- Abhijit
