Home > mailing lists

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS - Mailing list pgsql-hackers

From	Craig Ringer
Subject	Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date	April 4, 2018 09:00:21
Msg-id	CAMsr+YGiVR_BWSPjdra+0DbHfyfYB7=DxBmg2f59wf_uvM0zxg@mail.gmail.com Whole thread Raw
In response to	Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses	Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
List	pgsql-hackers

Tree view

On 4 April 2018 at 13:29, Thomas Munro <thomas.munro@enterprisedb.com> wrote:

On Wed, Apr 4, 2018 at 2:44 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Wed, Apr 4, 2018 at 2:14 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> Uh, are you sure it fixes our use-case? From the email description it
>> sounded like it only reported fsync errors for every open file
>> descriptor at the time of the failure, but the checkpoint process might
>> open the file _after_ the failure and try to fsync a write that happened
>> _before_ the failure.
>
> I'm not sure of anything. I can see that it's designed to report
> errors since the last fsync() of the *file* (presumably via any fd),
> which sounds like the desired behaviour:
>
> [..]

Scratch that. Whenever you open a file descriptor you can't see any
preceding errors at all, because:

/* Ensure that we skip any errors that predate opening of the file */
f->f_wb_err = filemap_sample_wb_err(f->f_mapping);

https://github.com/torvalds/linux/blob/master/fs/open.c#L752

Our whole design is based on being able to open, close and reopen
files at will from any process, and in particular to fsync() from a
different process that didn't inherit the fd but instead opened it
later. But it looks like that might be able to eat errors that
occurred during asynchronous writeback (when there was nobody to
report them to), before you opened the file?

Holy hell. So even PANICing on fsync() isn't sufficient, because the kernel will deliberately hide writeback errors that predate our fsync() call from us?

I'll see if I can expand my testcase for that. I'm presently dockerizing it to make it easier for others to use, but that turns out to be a major pain when using devmapper etc. Docker in privileged mode doesn't seem to play nice with device-mapper.

Does that mean that the ONLY ways to do reliable I/O are:

- single-process, single-file-descriptor write() then fsync(); on failure, retry all work since last successful fsync()

- direct I/O

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

pgsql-hackers by date:

From: Amit Langote
Date: 04 April 2018, 08:42:58
Subject: Re: [HACKERS] path toward faster partition pruning

From: Amit Langote
Date: 04 April 2018, 09:27:48
Subject: Re: [HACKERS] Runtime Partition Pruning

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS - Mailing list pgsql-hackers

Previous

Next