Excerpts from Florian Pflug's message of Sat May 19 03:48:51 -0400 2012:
>
> On May 18, 2012, at 23:18, Alvaro Herrera wrote:
> > Excerpts from Florian Pflug's message of Thu May 17 09:08:26 -0400 2012:
> > Seems to me that we could make zero_damaged_pages an enum. The default
> > value of "on" would only catch truncated-away pages; another value would
> > also capture kernel-level error conditions.
>
> Yeah, an enum would be nicer than an additional GUC. I kinda keep forgetting
> that we have those. Though to bikeshed, the GUC should probably be just called
> 'zero_pages' and take the values 'never', 'missing', 'unreadable' ;-)
Sounds reasonable to me.
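
To make the three-valued setting concrete, here is a minimal sketch of how such an enum could be parsed. It is written in Python for brevity rather than in the server's C, and everything in it is hypothetical: the `ZeroPages` class, the `parse_zero_pages` helper, and the mapping of the old boolean spellings ("off" to 'never', "on" to 'missing') are assumptions made for illustration, not existing code.

```python
from enum import Enum


class ZeroPages(Enum):
    """Hypothetical values for the proposed 'zero_pages' setting."""
    NEVER = "never"            # never substitute a zeroed page
    MISSING = "missing"        # zero pages that were truncated away
    UNREADABLE = "unreadable"  # also zero pages the kernel fails to read


# Assumed backward-compatible spellings of the old boolean setting.
_LEGACY = {"off": ZeroPages.NEVER, "on": ZeroPages.MISSING}


def parse_zero_pages(value: str) -> ZeroPages:
    """Map a setting string to a ZeroPages member, case-insensitively."""
    key = value.strip().lower()
    if key in _LEGACY:
        return _LEGACY[key]
    return ZeroPages(key)  # raises ValueError for unknown values
```

Keeping the old spellings as aliases would let existing configurations continue to work, at the cost of two names for the same behaviour.
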
> > The thing is, once you start getting kernel-level errors you're pretty
> > much screwed and there's no way to just recover whatever data is
> > recoverable.
>
> I thought your initial gripe was precisely that you got a kernel-level error,
> yet the filesystem was still in pretty good shape?
Uhm. I'm not really sure what the actual problem is, but I think it is
precisely a corrupted filesystem.
> Which actually seemed quite likely to me - the cause could be, for example,
> simply a single bad block. Or a filesystem-level checksum error if you're using
> a filesystem with built-in integrity checks.
I guess ERANGE is the sort of thing that's not quite expected here -- I
mean you might get EIO if there's an I/O problem such as a checksum
error, but ERANGE suggests to me that the kernel might be leaking some
internal error that's not supposed to be thrown to the user.

In any case, I don't think we can distinguish kernel-level problems such
as this one from filesystem-level problems. I mean, they all come from
the kernel, as far as we're concerned.
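
A rough sketch of what that means for the read path, in Python rather than the server's C. The `read_block` function and its `zero_policy` argument are hypothetical names used only for illustration; the block size is the usual 8 kB default. The point is that a short read (page missing) can be told apart from a failed read, but any failed read is handled the same way regardless of whether errno is EIO, ERANGE, or something else.

```python
import os

BLCKSZ = 8192  # default PostgreSQL block size


def read_block(fd: int, blkno: int, zero_policy: str = "never") -> bytes:
    """Read one block, substituting zeros as the (hypothetical) policy allows.

    zero_policy is one of 'never', 'missing', 'unreadable'.
    """
    try:
        data = os.pread(fd, BLCKSZ, blkno * BLCKSZ)
    except OSError:
        # EIO, ERANGE, ...: all of these come from the kernel, so no
        # attempt is made to tell them apart by errno.
        if zero_policy == "unreadable":
            return bytes(BLCKSZ)
        raise
    if len(data) < BLCKSZ:
        # Short read: the page lies at or beyond end of file.
        if zero_policy in ("missing", "unreadable"):
            return bytes(BLCKSZ)
        raise EOFError(f"block {blkno}: read only {len(data)} of {BLCKSZ} bytes")
    return data
```

With 'missing', only the short-read case is papered over; 'unreadable' additionally swallows whatever error the kernel reports.
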
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support