Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS - Mailing list pgsql-hackers

From	Andres Freund
Subject	Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date	April 23, 2018 23:14:48
Msg-id	20180423201448.nxe6jc5tu63kzum7@alap3.anarazel.de
In response to	PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS (Craig Ringer <craig@2ndquadrant.com>)
Responses	Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
List	pgsql-hackers

Hi,

On 2018-03-28 10:23:46 +0800, Craig Ringer wrote:
> TL;DR: Pg should PANIC on fsync() EIO return. Retrying fsync() is not OK at
> least on Linux. When fsync() returns success it means "all writes since the
> last fsync have hit disk" but we assume it means "all writes since the last
> SUCCESSFUL fsync have hit disk".

> But then we retried the checkpoint, which retried the fsync(). The retry
> succeeded, because the prior fsync() *cleared the AS_EIO bad page flag*.

Random other thing we should look at: some filesystems (NFS yes; XFS and
ext4 no) flush writes at close(2). We check close()'s return code, but
only log it... So close() counts as an fsync on such filesystems.

I'm at LSF/MM to discuss the future behaviour of Linux here, but that's
how it is right now.

Greetings,

Andres Freund
