Re: Accidental removal of a file causing various problems - Mailing list pgsql-hackers

From Pavan Deolasee
Subject Re: Accidental removal of a file causing various problems
Date
Msg-id CABOikdPC=LCZ650F5ka8Bzx3NHaguwv6ZVQe6DByvGV0th83iw@mail.gmail.com
In response to Re: Accidental removal of a file causing various problems  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers


On Sat, Aug 25, 2018 at 1:15 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Actually, I think the main point is given that we've somehow got into
> a situation like that, how do we get out again?

Alvaro and I discussed this off-list a bit and came up with a couple of ideas.

1. Reserve some buffers in shared buffers for system-critical functionality. As this case shows, the failure to write blocks filled the whole of shared buffers with unwritable blocks, making the database completely inaccessible, even for remedial actions. So the idea is to set aside, say, the first 100 buffers (or some such number) for system catalogs and to allocate buffers for user tables from the remaining pool. This will at least ensure that one bad user table cannot bring down the entire cluster. Of course, this may not help if the system catalogs themselves are unwritable, but that's probably a major issue anyway.
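To make idea 1 concrete, here is a rough sketch of what partitioned victim selection could look like. This is a toy model, not PostgreSQL code: every name in it (TOY_NBUFFERS, TOY_RESERVED_BUFS, toy_pick_victim) is hypothetical, the numbers are placeholders, and it makes the simplifying assumption that catalog pages are confined to the reserved range, which the proposal above does not require.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sizes for illustration; not PostgreSQL settings. */
#define TOY_NBUFFERS       1024   /* total buffers in the toy pool */
#define TOY_RESERVED_BUFS  100    /* buffers set aside for system catalogs */

typedef struct ToyBufferRange
{
    int first;   /* first buffer id in the range, inclusive */
    int last;    /* last buffer id in the range, inclusive */
} ToyBufferRange;

/* Return the range of buffer ids a relation may occupy. */
ToyBufferRange
toy_buffer_range(bool is_catalog)
{
    ToyBufferRange r;

    if (is_catalog)
    {
        r.first = 0;
        r.last = TOY_RESERVED_BUFS - 1;
    }
    else
    {
        r.first = TOY_RESERVED_BUFS;
        r.last = TOY_NBUFFERS - 1;
    }
    return r;
}

/* One sweep position per partition: [0] catalogs, [1] user tables. */
int toy_sweep_hand[2] = {0, TOY_RESERVED_BUFS};

/*
 * Pick the next victim buffer, sweeping only within the caller's
 * partition, so user-table buffers never displace catalog buffers.
 */
int
toy_pick_victim(bool is_catalog)
{
    ToyBufferRange r = toy_buffer_range(is_catalog);
    int *hand = &toy_sweep_hand[is_catalog ? 0 : 1];
    int victim = *hand;

    assert(victim >= r.first && victim <= r.last);

    /* Advance the hand, wrapping at the end of the partition. */
    *hand = (victim == r.last) ? r.first : victim + 1;
    return victim;
}
```

With this split, a user table whose blocks cannot be written can at worst exhaust buffers 100 through 1023; the sweep for catalog pages never leaves buffers 0 through 99.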

2. Provide either an automatic or a manual way to evict unwritable buffers to a spillover file or set of files. The buffer pool can then be rescued from the critical situation, and the DBA can manually inspect the spillover files and take corrective action, if needed and if feasible. My idea was to create a shadow relfilenode and write the buffers to their logical locations within it. Alvaro, though, thinks that writing one block per file (relfilenode/fork/block) is a better idea, since that gives the DBA an easy way to take action. Irrespective of whether we pick one file per block or one per relfilenode, a more interesting question is: should this be automatic, or should it require administrative action?
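The two layouts in idea 2 can be compared with a small sketch. Nothing here exists in PostgreSQL: the "pg_spill" directory, the function names and the exact path format are all made up for illustration. The fork names and the 8192-byte block size are just the usual defaults.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define TOY_BLCKSZ 8192   /* default PostgreSQL block size, assumed here */

typedef enum ToyFork
{
    TOY_FORK_MAIN,
    TOY_FORK_FSM,
    TOY_FORK_VM,
    TOY_FORK_INIT
} ToyFork;

const char *const toy_fork_names[] = {"main", "fsm", "vm", "init"};

/*
 * One file per block: build "pg_spill/<relfilenode>/<fork>/<block>".
 * Returns 0 on success, -1 if the path does not fit in buf.
 */
int
toy_spill_path(char *buf, size_t buflen,
               unsigned relfilenode, ToyFork fork, unsigned blkno)
{
    int n;

    assert(buf != NULL && buflen > 0);

    n = snprintf(buf, buflen, "pg_spill/%u/%s/%u",
                 relfilenode, toy_fork_names[fork], blkno);
    return (n >= 0 && (size_t) n < buflen) ? 0 : -1;
}

/*
 * Shadow relfilenode: the block keeps its logical position, so its
 * byte offset within the shadow file is blkno * TOY_BLCKSZ.
 */
long long
toy_shadow_offset(unsigned blkno)
{
    return (long long) blkno * TOY_BLCKSZ;
}
```

For block 42 of the main fork of relfilenode 16384, the per-block layout yields the path pg_spill/16384/main/42, which a DBA can list, copy or delete with ordinary tools. The shadow layout instead places the same block at byte offset 344064 of a single sparse file.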

Does either of the ideas sound interesting enough for further work? 

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
