Re: fsync failure in durable_unlink ignored in xlog.c? - Mailing list pgsql-hackers

From Andres Freund
Subject Re: fsync failure in durable_unlink ignored in xlog.c?
Date
Msg-id 20190523181820.t4x22nqjonttwmwp@alap3.anarazel.de
Whole thread Raw
In response to Re: fsync failure in durable_unlink ignored in xlog.c?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hi,

On 2019-05-23 14:06:57 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2019-05-23 10:46:02 -0700, Mark Dilger wrote:
> >> Is this code safe against fsync failures?  If so, can I get an explanation
> >> that I might put into a code comment patch?
> 
> > What's the danger you're thinking of here? The issue with ignoring fsync
> > failures is that it could be the one signal about data corruption we get
> > for a write()/fsync() that failed - i.e. that durability cannot be
> > guaranteed. But we don't care about the file contents of those files.
> 
> Hmm ... if we don't care, why are we issuing an fsync at all?

Fair point. I think we do care in most of those cases, but we don't need
to trigger a PANIC. We'd be in trouble if e.g. an older tablespace map
file were to "revive" later. Looking at the cases:

- durable_unlink(TABLESPACE_MAP, DEBUG1) - we definitely care about a
  failure to unlink/remove, but *not* about ENOENT, because that's expected.

- /* Force installation: get rid of any pre-existing segment file */
  durable_unlink(path, DEBUG1);

  same.

- RemoveXlogFile():
  rc = durable_unlink(path, LOG);

  It's probably *tolerable* to fail here. Not sure why this is a
  durable_unlink(LOG) - doesn't make a ton of sense to me.

- durable_unlink(BACKUP_LABEL_FILE, ERROR);

  This is a "whaa, bad shit is happening" kind of situation. But
  crashing probably would make it even worse, because we'd restart
  assuming we're restoring from a backup.

ISTM that durable_unlink() for the first two cases really needs a
separate 'missing_ok' type argument. And that the reason for using
DEBUG1 here is solely an outcome of that not existing.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Mark Dilger
Date:
Subject: Re: fsync failure in durable_unlink ignored in xlog.c?
Next
From: Mark Dilger
Date:
Subject: ClosePipeStream failure ignored in pg_import_system_collations