Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date
Msg-id CAMsr+YGFqgMWdfM2kOMCRyMofYe8zwsEuHZ3vc+rzSZpPty0Eg@mail.gmail.com
Whole thread Raw
In response to Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
On 9 April 2018 at 06:29, Bruce Momjian <bruce@momjian.us> wrote:
 

I think the big problem is that we don't have any way of stopping
Postgres at the time the kernel reports the errors to the kernel log, so
we are then returning potentially incorrect results and committing
transactions that might be wrong or lost.

Right.

Specifically, we need a way to ask the kernel at checkpoint time "was everything written to [this set of files] flushed successfully since the last time I asked, no matter who did the writing and no matter how the writes were flushed?"

If the result is "no" we PANIC and redo. If the hardware/volume is screwed, the user can fail over to a standby, do PITR, etc.

But we don't have any way to ask that reliably at present.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

pgsql-hackers by date:

Previous
From: Craig Ringer
Date:
Subject: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Next
From: Andres Freund
Date:
Subject: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS