Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS - Mailing list pgsql-hackers

From Antonis Iliopoulos
Subject Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Msg-id CAN+tDYwLHyXKCMDk_DKV1Lujt5qmNxm1AStqBLwhFNQ6ov25pg@mail.gmail.com
In response to Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS  (Craig Ringer <craig@2ndquadrant.com>)
List pgsql-hackers


On Wed, Apr 4, 2018 at 4:42 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
>
> On 4 April 2018 at 22:25, Bruce Momjian <bruce@momjian.us> wrote:
>>
>> On Wed, Apr  4, 2018 at 10:09:09PM +0800, Craig Ringer wrote:
>> > On 4 April 2018 at 22:00, Craig Ringer <craig@2ndquadrant.com> wrote:
>> >  
>> >
>> >     It's the error reporting issues around closing and reopening files with
>> >     outstanding buffered I/O that's really going to hurt us here. I'll be
>> >     expanding my test case to cover that shortly.
>> >
>> >
>> >
>> > Also, just to be clear, this is not in any way confined to xfs and/or lvm as I
>> > originally thought it might be.
>> >
>> > Nor is ext3/ext4's errors=remount-ro protective. data_err=abort doesn't help
>> > either (so what does it do?).
>>
>> Anthony Iliopoulos reported in this thread that errors=remount-ro is
>> only affected by metadata writes.
>
>
> Yep, I gathered. I was referring to data_err.  

As far as I recall, data_err=abort pertains to the jbd2 handling of
potential writeback errors. jbd2 will internally attempt to drain
the data upon txn commit (and it's even kind enough to restore
the EIO at the address space level, which would otherwise get eaten).

When data_err=abort is set, jbd2 forcibly shuts down the
entire journal, with the error being propagated upwards to ext4.
I am not sure at which point this would be manifested to userspace
and how, but in principle any subsequent fs operations would get
some filesystem error due to the journal being down (I would
assume similar to remounting the fs read-only).
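
To make the userspace side concrete, here is a rough sketch of how a
test program could record which errno it sees from write() or fsync()
after an injected I/O error. The helper names write_and_sync() and
describe_sync_result() are made up for illustration, and the EROFS
case is only my assumption about what an aborted journal might look
like from userspace, not something I have verified.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from PostgreSQL or the kernel):
 * write len bytes to path and fsync() the file. Returns 0 on success,
 * or the negated errno of the first call that failed.
 */
int write_and_sync(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -errno;

    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, retry the write */
            int err = errno;
            close(fd);
            return -err;
        }
        p += n;
        left -= (size_t) n;
    }

    /* A writeback error, if it is reported at all, would show up here. */
    if (fsync(fd) < 0) {
        int err = errno;
        close(fd);
        return -err;
    }

    if (close(fd) < 0)
        return -errno;
    return 0;
}

/* Turn the result of write_and_sync() into a short log message. */
const char *describe_sync_result(int rc)
{
    if (rc == 0)
        return "ok";
    if (rc == -EIO)
        return "I/O error (writeback failed)";
    if (rc == -EROFS)   /* assumption: possible symptom of an aborted journal */
        return "read-only filesystem";
    return strerror(-rc);
}
```

Calling write_and_sync() repeatedly on the affected mount and logging
describe_sync_result() for each attempt would show whether the error
is reported once, repeatedly, or not at all.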

Since you are using data=journal, I would indeed expect to see
something more than what you saw in dmesg.

I can have a look later; I also plan to respond to some of the other
interesting issues that you guys raised in the thread.

Best regards,
Anthony
