Re: silent data loss with ext4 / all current versions - Mailing list pgsql-hackers

From Andres Freund
Subject Re: silent data loss with ext4 / all current versions
Date
Msg-id 20160315174616.ju5ybldinilz6rxv@alap3.anarazel.de
Whole thread Raw
In response to Re: silent data loss with ext4 / all current versions  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: silent data loss with ext4 / all current versions
List pgsql-hackers
On 2016-03-15 15:39:50 +0100, Michael Paquier wrote:
> I have finally been able to spend some time reviewing what you pushed
> on back-branches, and things are in correct shape I think. One small
> issue that I have is that for EXEC_BACKEND builds, in
> write_nondefault_variables we still use one instance of rename().

Correctly so afaics, because write_nondefault_variables is by definition
non-durable. We write that stuff at every start / SIGHUP. Adding an
fsync there would be an unnecessary slowdown.  I don't think it's good
policy adding fsync for stuff that definitely doesn't need it.


> Yeah, true. We definitely need to do something for that, even for HEAD
> it seems like an overkill to have something in for example src/common
> to allow frontends to have something if the fix is localized
> (pg_rewind may use something else), and it would be nice to finish
> wrapping that for the next minor release, so I propose the attached
> patches. At the same time, I think that adminpack had better be fixed
> as well, so there are actually three patches in this series, things
> that I shaped thinking about a backpatch btw, particularly for 0002.

I'm doubtful about "fixing" adminpack. We don't know how it's used, and
this could *seriously* increase its overhead for something potentially
used at a high rate.


>  /*
> + * Wrapper of rename() similar to the backend version with the same function
> + * name aimed at making the renaming durable on disk. Note that this version
> + * does not fsync the old file before the rename as all the code paths leading
> + * to this function are already doing this operation. The new file is also
> + * normally not present on disk before the renaming so there is no need to
> + * bother about it.
> + */
> +static int
> +durable_rename(const char *oldfile, const char *newfile)
> +{
> +    int        fd;
> +    char    parentpath[MAXPGPATH];
> +
> +    if (rename(oldfile, newfile) != 0)
> +    {
> +        /* durable_rename produced a log entry */
> +        fprintf(stderr, _("%s: could not rename file \"%s\": %s\n"),
> +                progname, current_walfile_name, strerror(errno));
> +        return -1;
> +    }
> +
> +    /*
> +     * To guarantee renaming of the file is persistent, fsync the file with its
> +     * new name, and its containing directory.
> +     */
> +    fd = open(newfile, O_RDWR | PG_BINARY, S_IRUSR | S_IWUSR);

Why is S_IRUSR | S_IWUSR specified here?


Are you working on a fix for pg_rewind? Let's go with initdb -S in a
first iteration, then we can, if somebody is interest enough, work on
making this nicer in master.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: IF (NOT) EXISTS in psql-completion
Next
From: Corey Huinker
Date:
Subject: Re: Soliciting Feedback on Improving Server-Side Programming Documentation