Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS - Mailing list pgsql-hackers

From Anthony Iliopoulos
Subject Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date
Msg-id 20180409123126.GB4233@ai-wks
In response to Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS  (Geoff Winkless <pgsqladmin@geoff.dj>)
Responses Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
List pgsql-hackers
On Mon, Apr 09, 2018 at 01:03:28PM +0100, Geoff Winkless wrote:
> On 9 April 2018 at 11:50, Anthony Iliopoulos <ailiop@altatus.com> wrote:
> 
> > What you seem to be asking for is the capability of dropping
> > buffers over the (kernel) fence and indemnifying the application
> > from any further responsibility, i.e. a hard assurance
> > that either the kernel will persist the pages or it will
> > keep them around till the application recovers them
> > asynchronously, the filesystem is unmounted, or the system
> > is rebooted.
> >
> 
> That seems like a perfectly reasonable position to take, frankly.

Indeed, as long as you are willing to ignore the consequences of
this design decision: mainly, how you would recover memory when no
application is interested in clearing the error. At that point,
other applications with different priorities would find this position
rather unreasonable, since they would have no way out of it.
Good luck convincing any OS kernel upstream to go with this design.
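To make the memory-recovery objection concrete, here is a deliberately simplified toy model (hypothetical, not how any real kernel's page cache works): a fixed number of page slots shared by all applications, where dirty pages whose writeback failed are either pinned until their owner recovers them or dropped immediately. The class and method names are invented for illustration.

```python
class ToyPageCache:
    """Toy model of two policies for dirty pages whose writeback failed.

    pin_failed_pages=True: keep them until the owning application recovers them.
    pin_failed_pages=False: drop them (memory reclaimed, data lost).
    Purely illustrative; real page caches are far more complex.
    """

    def __init__(self, capacity: int, pin_failed_pages: bool) -> None:
        self.capacity = capacity
        self.pin_failed_pages = pin_failed_pages
        self.pinned: dict[str, str] = {}  # page id -> owning application

    def writeback_failed(self, page_id: str, owner: str) -> None:
        if self.pin_failed_pages:
            # Hold the page until its owner explicitly recovers it.
            self.pinned[page_id] = owner
        # Otherwise the page is simply dropped: its slot is free again.

    def recover(self, owner: str) -> None:
        # Only the owning application can release its pinned pages.
        self.pinned = {p: o for p, o in self.pinned.items() if o != owner}

    def free_slots(self) -> int:
        # Slots left for every other application on the system.
        return max(0, self.capacity - len(self.pinned))
```

If the owner never calls `recover()` (it crashed, or simply does not care), the pinned policy leaves `free_slots()` at zero indefinitely and every other application is starved, which is exactly the "no way out" situation described above; the drop policy always reclaims the memory but loses the data.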

> The whole _point_ of an Operating System should be that you can do exactly
> that. As a developer I should be able to call write() and fsync() and know
> that if both calls have succeeded then the result is on disk, no matter
> what another application has done in the meantime. If that's a "difficult"
> problem then that's the OS's problem, not mine. If the OS doesn't do that,
> it's _not_doing_its_job_.

No OS kernel that I know of provides any guarantee of atomicity for a
write()+fsync() sequence, unless one is using O_SYNC. Nor does it
provide you with isolation, as this is delegated to userspace,
where processes that share a file should coordinate accordingly.
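A minimal POSIX-only sketch of the two durability styles mentioned above, plus userspace coordination via an advisory flock(). The function names are hypothetical. It assumes the kernel behavior this thread is about: after a failed fsync() the dirty pages may already have been discarded, so the code treats the error as fatal instead of retrying fsync() until it "succeeds".

```python
import fcntl
import os


class FsyncFailed(RuntimeError):
    """fsync() reported an error; a later retry would prove nothing."""


def _write_all(fd: int, data: bytes) -> None:
    # os.write() may write fewer bytes than requested, so loop.
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def append_durably_osync(path: str, data: bytes) -> None:
    # With O_SYNC, each write() returns only once the data has been
    # transferred to the underlying storage (or fails with an error).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND | os.O_SYNC, 0o644)
    try:
        # Advisory lock: only protects against processes that also use flock().
        fcntl.flock(fd, fcntl.LOCK_EX)
        _write_all(fd, data)
    finally:
        os.close(fd)  # closing the descriptor also releases the lock


def append_durably_fsync(path: str, data: bytes) -> None:
    # Buffered write() followed by an explicit fsync().
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)
        _write_all(fd, data)
        try:
            os.fsync(fd)
        except OSError as exc:
            # Fail-stop: do not retry fsync() and assume the data is safe.
            raise FsyncFailed(f"fsync({path!r}) failed") from exc
    finally:
        os.close(fd)
    # Note: for a newly created file, the parent directory must also be
    # fsync()ed for the directory entry itself to be durable; omitted here.
```

Neither variant makes a multi-step update atomic; O_SYNC only moves the flush into each write(). Isolation comes entirely from the cooperating processes agreeing to take the lock.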

It's not that this is a difficult problem; rather, kernels provide a
lowest-common-denominator set of interfaces and designs that can
accommodate a wide range of application scenarios whose requirements
the kernel cannot possibly anticipate. There have been plenty of
experimental works providing a transactional (ACID) filesystem
interface to applications. At the opposite end, quite a few
commercial databases completely bypass the kernel storage stack.
But I would assume it is reasonable to figure out something between
those two extremes that can work in a "portable" fashion.

Best regards,
Anthony

