Re: RFC: Add 'taint' field to pg_control. - Mailing list pgsql-hackers

From	Craig Ringer
Subject	Re: RFC: Add 'taint' field to pg_control.
Date	March 1, 2018 07:03:30
Msg-id	CAMsr+YGJqHDP=HkLxAukhVz0R56MTfEj1++t8M-AWb+xFTwZqA@mail.gmail.com
In response to	RFC: Add 'taint' field to pg_control. (Andres Freund <andres@anarazel.de>)
Responses	Re: RFC: Add 'taint' field to pg_control. (Robert Haas <robertmhaas@gmail.com>)
List	pgsql-hackers

On 1 March 2018 at 05:43, Andres Freund <andres@anarazel.de> wrote:

> Hi,
>
> a significant number of times during investigations of bugs I wondered
> whether running the cluster with various settings, or various tools
> could've caused the issue at hand. Therefore I'd like to propose adding
> a 'tainted' field to pg_control, that contains some of the "history" of
> the cluster. Individual bits inside that field that I can think of right
> now are:
> - pg_resetxlog was used non-passively on cluster
> - ran with fsync=off
> - ran with full_page_writes=off
> - pg_upgrade was used
>
> What do others think?

A huge +1 from me for the idea. I can't even count the number of black-box "WTF did you DO?!?" servers I've looked at, where bizarre behaviour turned out to be down to the user doing something very silly and not saying anything about it.

It's only some flags, so putting it in pg_control is arguably somewhat wasteful but so minor as to be of no real concern. And that's probably the best way to make sure it follows the cluster around no matter what backup/restore/copy mechanisms are used and how "clever" they try to be.

What I'd _really_ love would be to blow the scope of this up a bit and turn it into a key-events cluster journal, recording key param switches, recoveries (and LSN ranges), pg_upgrade runs, etc. But then we'd run into people with weird workloads who managed to make it some massive file, we'd have to make sure we had a way to stop it getting left out of copies/backups, and it'd generally be irritating. So let's not do that. Proper support for class-based logging and multiple outputs would be a good solution for this at some future point.

What you propose is simple enough to be quick to implement, adds no admin overhead, and will be plenty useful enough.

I'd like to add "postmaster.pid was absent when the cluster started" to this list, please. Sure, it's not conclusive, and there are legit reasons why that might be the case, but so often it's somebody kill -9'ing the postmaster, then removing the postmaster.pid and starting up again without killing the workers....
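
To make the shape of the proposal concrete, here is a minimal sketch of what such a bit field might look like in C. Everything in it is an assumption for illustration: the flag names, the SketchControlData struct and the helper functions are hypothetical and do not exist in PostgreSQL, and the real control file structure is not shown. The idea is only that each event sets one bit and nothing ever clears it, so the history follows the cluster around.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag names for illustration; none exist in PostgreSQL. */
#define TAINT_RESETXLOG_USED        (UINT32_C(1) << 0)  /* pg_resetxlog used non-passively */
#define TAINT_FSYNC_OFF             (UINT32_C(1) << 1)  /* ran with fsync=off */
#define TAINT_FULL_PAGE_WRITES_OFF  (UINT32_C(1) << 2)  /* ran with full_page_writes=off */
#define TAINT_PG_UPGRADE_USED       (UINT32_C(1) << 3)  /* pg_upgrade was used */
#define TAINT_PIDFILE_ABSENT        (UINT32_C(1) << 4)  /* postmaster.pid absent at start */

/* Stand-in for the relevant part of pg_control; not the real structure. */
typedef struct SketchControlData
{
    uint32_t    taint;          /* bitwise OR of TAINT_* flags seen so far */
} SketchControlData;

/* Record an event. Flags are sticky: they are only ever set, never cleared. */
static inline void
taint_set(SketchControlData *ctl, uint32_t flag)
{
    ctl->taint |= flag;
}

/* Report whether a given event has ever been recorded for this cluster. */
static inline bool
taint_is_set(const SketchControlData *ctl, uint32_t flag)
{
    return (ctl->taint & flag) != 0;
}
```

With something along these lines, startup code could call taint_set(&ctl, TAINT_FSYNC_OFF) when it notices fsync=off, and a diagnostic tool could later test individual bits with taint_is_set().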

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
