On Thu, Mar 29, 2018 at 6:58 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 28 March 2018 at 11:53, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> Craig Ringer <craig@2ndquadrant.com> writes:
>> > TL;DR: Pg should PANIC on fsync() EIO return.
>>
>> Surely you jest.
>
> No. I'm quite serious. Worse, we quite possibly have to do it for ENOSPC as
> well to avoid similar lost-page-write issues.
I found your discussion with kernel hacker Jeff Layton at https://lwn.net/Articles/718734/ in which he said: "The stackoverflow writeup seems to want a scheme where pages stay dirty after a writeback failure so that we can try to fsync them again. Note that that has never been the case in Linux after hard writeback failures, AFAIK, so programs should definitely not assume that behavior."
The article itself says the same thing in a couple of different ways, i.e. that writeback failure leaves you with pages that are neither written to disk successfully nor marked dirty.
If I'm reading various articles correctly, the situation was even worse before his errseq_t work landed. That fixed cases where writeback failures went completely unreported because certain filesystems shared PG_error between writeback and read errors, but it doesn't address the clean-pages problem.
Yeah, I see why you want to PANIC.
In more ways than one ;)
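For concreteness, here is a minimal sketch of the "PANIC on fsync() failure" policy in C, assuming plain POSIX I/O. fsync_or_panic() is a hypothetical helper, not PostgreSQL's actual code, and abort() merely stands in for a PANIC. The point is that a failed fsync() is never retried, because a retry could report success for pages the kernel has already stopped treating as dirty.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: flush fd to storage and treat any fsync() failure
 * (EIO, and possibly ENOSPC too) as fatal. We deliberately do not retry:
 * on Linux a failed writeback may leave the affected pages clean but
 * unwritten, so a second fsync() could return 0 even though the data
 * never reached disk.
 */
static void fsync_or_panic(int fd, const char *path)
{
    if (fsync(fd) != 0) {
        int saved_errno = errno;   /* fprintf() may clobber errno */

        /* abort() stands in for a PostgreSQL PANIC: crash now and let
         * crash recovery replay WAL instead of trusting the page cache. */
        fprintf(stderr, "PANIC: could not fsync file \"%s\": %s\n",
                path, strerror(saved_errno));
        abort();
    }
}
```

The idea is that after the crash, WAL replay rewrites the affected blocks, so nothing depends on the kernel still holding the pages whose writeback failed.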
> I'm not seeking to defend what the kernel seems to be doing. Rather, saying
> that we might see similar behaviour on other platforms, crazy or not. I
> haven't looked past linux yet, though.
Without strong evidence, I see no reason to think that any other operating system would behave that way. The Linux behaviour is openly acknowledged to be "a mess" and "a surprise" in the Filesystem Summit article. I am not really qualified to comment, but from a cursory glance at FreeBSD's vfs_bio.c I think it's doing what you'd hope for: see the code near the comment "Failed write, redirty."
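To make the difference concrete, here is a toy model in C of one cached page under the two policies discussed above: marking the page clean after a failed writeback (the Linux behaviour Jeff Layton describes) versus redirtying it (what FreeBSD's vfs_bio.c appears to do). It is a hypothetical simplification, not kernel code; struct toy_page, model_fsync() and the policy names are invented for illustration, and real error reporting (errseq_t and friends) is ignored.

```c
#include <stdbool.h>

/* Hypothetical toy model of a single page in the page cache. */
struct toy_page {
    bool dirty;    /* page holds changes not yet written back */
    bool on_disk;  /* the latest contents have reached storage */
};

enum writeback_policy {
    CLEAN_ON_FAILURE,   /* failed page is marked clean (Linux, per Layton) */
    REDIRTY_ON_FAILURE  /* failed page stays dirty ("Failed write, redirty.") */
};

/*
 * Simulated fsync() of a file consisting of one page. device_ok says
 * whether the write to storage succeeds this time. Returns 0 on success
 * and -1 (think EIO) when writeback fails.
 */
static int model_fsync(struct toy_page *p, enum writeback_policy pol,
                       bool device_ok)
{
    if (!p->dirty)
        return 0;               /* nothing left to write: reports success */
    if (device_ok) {
        p->on_disk = true;
        p->dirty = false;
        return 0;
    }
    /* Writeback failed: the policy decides whether the page gets retried. */
    p->dirty = (pol == REDIRTY_ON_FAILURE);
    return -1;
}
```

Under CLEAN_ON_FAILURE, a failed model_fsync() followed by a second call returns 0 while on_disk is still false, which is why retrying fsync() after EIO can't be trusted; under REDIRTY_ON_FAILURE the retry actually writes the page.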
OK, that's reassuring, but it doesn't help us on the platform the great majority of users deploy on :(