Hi,
On 2019-05-23 14:06:57 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2019-05-23 10:46:02 -0700, Mark Dilger wrote:
> >> Is this code safe against fsync failures? If so, can I get an explanation
> >> that I might put into a code comment patch?
>
> > What's the danger you're thinking of here? The issue with ignoring fsync
> > failures is that it could be the one signal about data corruption we get
> > for a write()/fsync() that failed - i.e. that durability cannot be
> > guaranteed. But we don't care about the file contents of those files.
>
> Hmm ... if we don't care, why are we issuing an fsync at all?
Fair point. I think we do care in most of those cases, but we don't need
to trigger a PANIC. We'd be in trouble if e.g. an older tablespace map
file were to "revive" later. Looking at the cases:
- durable_unlink(TABLESPACE_MAP, DEBUG1) - we definitely care about a
failure to unlink/remove, but *not* about ENOENT, because that's expected.
- /* Force installation: get rid of any pre-existing segment file */
durable_unlink(path, DEBUG1);
same.
- RemoveXlogFile():
rc = durable_unlink(path, LOG);
It's probably *tolerable* to fail here. Not sure why this is a
durable_unlink(LOG) - doesn't make a ton of sense to me.
- durable_unlink(BACKUP_LABEL_FILE, ERROR);
This is a "whaa, bad shit is happening" kind of situation. But
crashing probably would make it even worse, because we'd restart
assuming we're restoring from a backup.
ISTM that durable_unlink() for the first two cases really needs a
separate 'missing_ok' type argument. And that the reason for using
DEBUG1 here is solely an outcome of that not existing.
Greetings,
Andres Freund