Re: RFC: Add 'taint' field to pg_control. - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: RFC: Add 'taint' field to pg_control.
Date
Msg-id CAMsr+YFNwKDDe_bRwMnU77EtCgGukAf2CBzD2GqvbO0H84Bbew@mail.gmail.com
In response to Re: RFC: Add 'taint' field to pg_control.  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: RFC: Add 'taint' field to pg_control.  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers
On 1 March 2018 at 06:28, Justin Pryzby <pryzby@telsasoft.com> wrote:
> On Wed, Feb 28, 2018 at 02:18:12PM -0800, Andres Freund wrote:
> > On 2018-02-28 17:14:18 -0500, Peter Eisentraut wrote:
> > > I can see why you'd want that, but as a DBA, I don't necessarily want
> > > all of that recorded, especially in a quasi-permanent way.
> >
> > Huh?  You're arguing that we should make it easier for DBAs to hide
> > potential causes of corruption? I fail to see when that's necessary /
> > desirable? That a cluster ran with fsync=off isn't an embarrassing thing,
> > nor does it contain private data...
>
> The more fine-grained these are, the more useful they can be.
>
> Running with fsync=off is common advice while loading, so reporting that
> "fsync=off at some point" is much less useful than reporting "having to run
> recovery from an fsync=off database".
>
> I propose there could be two bits:
>
>  1. fsync was off at some point in history when the DB needed recovery;
>  2. fsync was off during the current instance; I assume the bit would be set
>     whenever fsync=off was set, but could be cleared when the server was
>     cleanly shut down.  If the bit is set during recovery, the first bit would
>     be permanently set.  Not sure, but if this bit is set and then fsync
>     re-enabled, maybe this could be cleared after any checkpoint and not just
>     shutdown?
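
The two-bit proposal quoted above can be read as a small state machine. The following is only a sketch of that reading: the class and method names are hypothetical, it is not an actual pg_control layout, and the open question about clearing the second bit after a checkpoint is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class TaintFlags:
    """Hypothetical model of the two proposed bits (not a real pg_control struct)."""

    recovered_with_fsync_off: bool = False  # bit 1: permanent once set
    fsync_off_this_run: bool = False        # bit 2: can be cleared again

    def set_fsync(self, enabled: bool) -> None:
        # Bit 2 is set whenever fsync is switched off.
        if not enabled:
            self.fsync_off_this_run = True

    def clean_shutdown(self) -> None:
        # A clean shutdown clears bit 2.
        self.fsync_off_this_run = False

    def start_recovery(self) -> None:
        # Needing recovery while bit 2 is set permanently sets bit 1.
        if self.fsync_off_this_run:
            self.recovered_with_fsync_off = True
```

In this model a bulk load with fsync=off followed by a clean shutdown leaves no permanent mark; only recovery that starts while the second bit is still set does.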


I think that's a very useful way to eliminate false positives and make this diagnostic field more useful. But we'd need to guarantee that when you've switched from fsync=off to fsync=on, we've actually fsync'd every extent of every table and index, whether or not we think they're currently dirty and whether or not the smgr currently has them open. We'd have to do something like what initdb does to flush everything. Without that, we could have WAL-vs-heap ordering issues and so on *even after a Pg restart* if the system still has lots of dirty writeback to do.
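
To make "flush everything" concrete, here is a rough sketch of walking a directory tree and fsyncing every file and directory in it, regardless of whether anything is believed to be dirty. It is an illustration only, not PostgreSQL's or initdb's code: the function name `fsync_tree` is made up, it assumes a POSIX system where directories can be opened and fsync'd, and it does not follow symlinked directories.

```python
import os


def fsync_tree(root: str) -> int:
    """Fsync every file and directory under root; return how many were synced."""
    synced = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        # Flush each file's contents, whether or not we think it is dirty.
        for name in filenames:
            fd = os.open(os.path.join(dirpath, name), os.O_RDONLY)
            try:
                os.fsync(fd)
                synced += 1
            finally:
                os.close(fd)
        # Flush the directory itself so its entries are durable too.
        dfd = os.open(dirpath, os.O_RDONLY)
        try:
            os.fsync(dfd)
            synced += 1
        finally:
            os.close(dfd)
    return synced
```

The point of the sketch is that the walk is unconditional: it does not consult any notion of which buffers or files are dirty, which is the guarantee being asked for when fsync is switched back on.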

A quick look at CheckPointGuts and CheckPointBuffers suggests we don't presently do that, but I admit I haven't read in depth.

We force a fsync of the current WAL segment when switching fsync on, but don't do anything else AFAICS.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
