On Mon, Apr 23, 2018 at 01:14:48PM -0700, Andres Freund wrote:
> Hi,
>
> On 2018-03-28 10:23:46 +0800, Craig Ringer wrote:
> > TL;DR: Pg should PANIC on fsync() EIO return. Retrying fsync() is not OK at
> > least on Linux. When fsync() returns success it means "all writes since the
> > last fsync have hit disk" but we assume it means "all writes since the last
> > SUCCESSFUL fsync have hit disk".
>
> > But then we retried the checkpoint, which retried the fsync(). The retry
> > succeeded, because the prior fsync() *cleared the AS_EIO bad page flag*.
>
> Random other thing we should look at: Some filesystems (nfs yes, xfs
> ext4 no) flush writes at close(2). We check close() return code, but
> just log it... So close() counts as an fsync for such filesystems.

Well, that's interesting. You might remember that NFS does not reserve
space for writes the way local file systems such as ext4/xfs do. For
that reason, we might be able to capture the out-of-space error on
close and exit sooner for NFS.
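
To make that concrete, here is a rough sketch of what acting on the
close() result, rather than only logging it, might look like. The name
close_checked() is hypothetical and is not an existing PostgreSQL
function; the sketch assumes the caller decides how to escalate
(for example, treating ENOSPC or EIO on a data file as fatal).

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not PostgreSQL code): close a descriptor and hand
 * any failure back to the caller instead of only logging it.
 *
 * Returns 0 on success, or -1 with errno set to the close() error.
 * The descriptor must not be closed again after a failure: its state is
 * unspecified by POSIX, and on Linux it has already been released.
 */
static int
close_checked(int fd, const char *path)
{
    if (close(fd) != 0)
    {
        int save_errno = errno;   /* fprintf() may overwrite errno */

        fprintf(stderr, "could not close file \"%s\": %s\n",
                path, strerror(save_errno));
        errno = save_errno;
        return -1;
    }
    return 0;
}
```

A caller would check for -1 and then inspect errno; on a filesystem
that flushes at close, that is the point where a deferred write error
such as ENOSPC could first become visible.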
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +