Re: Accidental removal of a file causing various problems - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Accidental removal of a file causing various problems
Date
Msg-id 27997.1535139877@sss.pgh.pa.us
In response to Re: Accidental removal of a file causing various problems  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> On 2018-Aug-25, Pavan Deolasee wrote:
>> Now of course, the file is really missing. But the user was quite surprised
>> that they couldn't connect to any database, even though the mishap happened to
>> a user table in one of their reporting databases.

> Hmm, that sounds like there's a bunch of dirty pages waiting to be
> written to that nonexistent file, and the error prevents the starting
> backend from acquiring a free page on which to read something from disk
> for another relation.

Perhaps so --- but wouldn't this require that every buffer in shared
buffers now belong to the corrupted file?  Or have we broken the
allocation algorithm such that the same buffer keeps getting handed
out to every request?
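
The two possibilities above can be told apart in a toy model. The sketch below is not PostgreSQL's buffer manager; ToyBufferPool, advance_on_failure and the relation names are all hypothetical, and the victim selection is a bare round-robin hand with no usage counts or pins. It only shows that a single unwritable dirty buffer blocks every request if the allocator retries the same victim, but costs one failure per sweep if the hand moves on.

```python
class WriteError(Exception):
    """Raised when a dirty page cannot be written back."""


class ToyBufferPool:
    """Hypothetical round-robin buffer pool; NOT PostgreSQL's implementation."""

    def __init__(self, pages, missing, advance_on_failure):
        # pages: one (relation, dirty) tuple per buffer slot
        self.pages = list(pages)
        # missing: relations whose underlying file no longer exists
        self.missing = set(missing)
        # assumption under test: does the hand move past a failed victim?
        self.advance_on_failure = advance_on_failure
        self.hand = 0

    def _flush(self, slot):
        rel, dirty = self.pages[slot]
        if dirty and rel in self.missing:
            raise WriteError(f"could not open file for relation {rel!r}")
        self.pages[slot] = (rel, False)

    def allocate(self, new_rel):
        """Evict the buffer under the hand and reuse it for new_rel."""
        slot = self.hand
        try:
            self._flush(slot)
        except WriteError:
            if self.advance_on_failure:
                self.hand = (slot + 1) % len(self.pages)
            raise
        self.pages[slot] = (new_rel, False)
        self.hand = (slot + 1) % len(self.pages)
        return slot


def count_failures(pool, requests):
    """Issue `requests` allocations and count how many hit a write error."""
    failures = 0
    for _ in range(requests):
        try:
            pool.allocate("catalog")
        except WriteError:
            failures += 1
    return failures


# One dirty buffer belongs to the removed file; the other three are clean.
layout = [("reporting_table", True)] + [("other", False)] * 3

moving = ToyBufferPool(layout, {"reporting_table"}, advance_on_failure=True)
stuck = ToyBufferPool(layout, {"reporting_table"}, advance_on_failure=False)

moving_failures = count_failures(moving, 4)
stuck_failures = count_failures(stuck, 4)
print(moving_failures, stuck_failures)
```

In this model, failing on every connection attempt needs either a pool in which every candidate victim is a dirty page of the missing file, or an allocator that keeps returning to the same victim after the error.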

I'm starting to wonder if this type of scenario needs to be considered
alongside the truncation corruption issues we're discussing nearby.
What do you do given a persistent failure to write a dirty block?
It's hard to see how you get to an answer that doesn't result in
(a) corrupted data or (b) a stuck database, neither of which is
pleasant.  But I think right now our behavior will lead to (b),
which is what this is reporting --- until you do "pg_ctl stop -m immediate",
and then likely you've got (a).

            regards, tom lane

