pgsql: Introduce durable_rename() and durable_link_or_rename(). - Mailing list pgsql-committers

From Andres Freund
Subject pgsql: Introduce durable_rename() and durable_link_or_rename().
Date
Msg-id E1adrDy-0001OK-Nc@gemulon.postgresql.org
List pgsql-committers
Introduce durable_rename() and durable_link_or_rename().

Renaming a file using rename(2) is not guaranteed to be durable in the
face of crashes, especially on filesystems like xfs and ext4 when mounted
with data=writeback. To be certain that a rename() atomically replaces
the previous file contents in the face of crashes and across different
filesystems, one has to fsync the old filename, rename the file, fsync
the new filename, and fsync the containing directory.  This sequence is
not generally adhered to currently, which exposes us to data loss risks.
To avoid having to repeat this arduous sequence, introduce
durable_rename(), which wraps all of it.

Also add durable_link_or_rename(). Several places use link() (with a
fallback to rename()) to rename a file, trying to avoid replacing the
target file out of paranoia. Some of those rename sequences need to be
durable as well. There seems little reason to extend several copies of the
same logic, so centralize the link() callers.

This commit does not yet make use of the new functions; they're used in
a followup commit.

Author: Michael Paquier, Andres Freund
Discussion: 56583BDD.9060302@2ndquadrant.com
Backpatch: All supported branches

Branch
------
REL9_3_STABLE

Details
-------
http://git.postgresql.org/pg/commitdiff/e069848a3966bf64b4f4dc24d66a353d50878312

Modified Files
--------------
src/backend/storage/file/fd.c     | 257 +++++++++++++++++++++++++++++++-------
src/backend/storage/file/reinit.c |   2 +-
src/include/storage/fd.h          |   4 +-
3 files changed, 213 insertions(+), 50 deletions(-)

