Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS - Mailing list pgsql-hackers

From	Craig Ringer
Subject	Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date	April 2, 2018 18:03:42
Msg-id	CAMsr+YHtosoQKzHh-nAmyG75cAPTzTtwyk871d+1O-sNQRdeyg@mail.gmail.com
In response to	Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses	Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
List	pgsql-hackers

On 2 April 2018 at 02:24, Thomas Munro <thomas.munro@enterprisedb.com> wrote:

> Maybe my drive-by assessment of those kernel routines is wrong and
> someone will correct me, but I'm starting to think you might be better
> to assume the worst on all systems. Perhaps a GUC that defaults to
> panicking, so that users on those rare OSes could turn that off? Even
> then I'm not sure if the failure mode will be that great anyway or if
> it's worth having two behaviours. Thoughts?

I see little benefit to not just PANICing unconditionally on EIO, really. It shouldn't happen, and if it does, we want to be pretty conservative and adopt a data-protective approach.

I'm rather more worried about doing it on ENOSPC, which looks like it might be necessary from what I recall finding in my test case and kernel code reading. I really don't want to respond to a possibly-transient ENOSPC by PANICing the whole server unnecessarily.

BTW, the support team at 2ndQ is presently working on two separate issues where ENOSPC resulted in DB corruption, though neither of them involves logs of lost page writes. I'm planning on taking some time tomorrow to write a torture tester for Pg's ENOSPC handling and to verify ENOSPC handling in the test case I linked to in my original StackOverflow post.

If this is just an EIO issue then I see no point doing anything other than PANICing unconditionally.

If it's a concern for ENOSPC too, we should try harder to fail more nicely whenever we possibly can.
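
To make the distinction concrete, here is a rough sketch in C of the policy split I'd like to be possible: EIO always maps to a PANIC-style verdict, while ENOSPC maps to a softer failure. This is not PostgreSQL code. The names `fsync_verdict`, `classify_fsync_errno` and `checked_fsync` are made up for illustration, and treating every other errno as PANIC is purely an assumption on the conservative side. Whether ENOSPC can safely stay in the soft category is exactly the open question above.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical verdicts; not PostgreSQL's elog levels. */
typedef enum
{
    FSYNC_OK,          /* fsync() succeeded */
    FSYNC_FAIL_SOFT,   /* report an error, keep the server running */
    FSYNC_FAIL_PANIC   /* stop and recover from WAL */
} fsync_verdict;

/* Map an errno value from a failed fsync() to a verdict. */
static fsync_verdict
classify_fsync_errno(int err)
{
    switch (err)
    {
        case 0:
            return FSYNC_OK;
        case ENOSPC:
            /* Possibly transient; ideally we fail nicely here. */
            return FSYNC_FAIL_SOFT;
        case EIO:
            /* Should not happen; be data-protective. */
            return FSYNC_FAIL_PANIC;
        default:
            /* Assumption: anything unexpected is treated as the worst case. */
            return FSYNC_FAIL_PANIC;
    }
}

/* Call fsync() on fd and classify the outcome. */
static fsync_verdict
checked_fsync(int fd)
{
    int           err;
    fsync_verdict verdict;

    if (fsync(fd) == 0)
        return FSYNC_OK;

    err = errno;                 /* save before any other call clobbers it */
    verdict = classify_fsync_errno(err);
    fprintf(stderr, "fsync failed: %s (%s)\n", strerror(err),
            verdict == FSYNC_FAIL_PANIC ? "would PANIC" : "would ERROR");
    return verdict;
}
```

If the kernel behaviour turns out to be the same for ENOSPC as for EIO, the ENOSPC case in a scheme like this would have to move to the PANIC branch too.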

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
