Thread: RFC: Add 'taint' field to pg_control.

RFC: Add 'taint' field to pg_control.

From

Andres Freund

Date:

01 March 2018, 00:43:11

Hi,

a significant number of times during investigations of bugs I wondered
whether running the cluster with various settings, or various tools
could've caused the issue at hand.  Therefore I'd like to propose adding
a 'tainted' field to pg_control, that contains some of the "history" of
the cluster. Individual bits inside that field that I can think of right
now are:
- pg_resetxlog was used non-passively on cluster
- ran with fsync=off
- ran with full_page_writes=off
- pg_upgrade was used

What do others think?

Perhaps we should rename 'tainted' to something less "blame-y" sounding,
but I can't quite think of something more descriptive than
'cluster_history'.

Greetings,

Andres Freund

Re: RFC: Add 'taint' field to pg_control.

From

Tomas Vondra

Date:

01 March 2018, 01:13:44

On 02/28/2018 10:43 PM, Andres Freund wrote:
> Hi,
> 
> a significant number of times during investigations of bugs I wondered
> whether running the cluster with various settings, or various tools
> could've caused the issue at hand.  Therefore I'd like to propose adding
> a 'tainted' field to pg_control, that contains some of the "history" of
> the cluster. Individual bits inside that field that I can think of right
> now are:
> - pg_resetxlog was used non-passively on cluster
> - ran with fsync=off
> - ran with full_page_writes=off
> - pg_upgrade was used
> 
> What do others think?
> 

Isn't the uncertainty usually that you know one of these things happened
on the cluster, but you don't know if that's the actual root cause?
That's my experience, at least.

That being said, I'm not strongly opposed to adding such field.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: RFC: Add 'taint' field to pg_control.

From

Peter Eisentraut

Date:

01 March 2018, 01:14:18

On 2/28/18 16:43, Andres Freund wrote:
> a significant number of times during investigations of bugs I wondered
> whether running the cluster with various settings, or various tools
> could've caused the issue at hand.  Therefore I'd like to propose adding
> a 'tainted' field to pg_control, that contains some of the "history" of
> the cluster. Individual bits inside that field that I can think of right
> now are:
> - pg_resetxlog was used non-passively on cluster
> - ran with fsync=off
> - ran with full_page_writes=off
> - pg_upgrade was used

I can see why you'd want that, but as a DBA, I don't necessarily want
all of that recorded, especially in a quasi-permanent way.

Maybe this could be stored in a separate location, like a small
text/log-like file in the data directory.  Then DBAs can save or keep or
remove that information as they wish.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: RFC: Add 'taint' field to pg_control.

From

Andres Freund

Date:

01 March 2018, 01:16:23

On 2018-02-28 23:13:44 +0100, Tomas Vondra wrote:
> 
> On 02/28/2018 10:43 PM, Andres Freund wrote:
> > Hi,
> > 
> > a significant number of times during investigations of bugs I wondered
> > whether running the cluster with various settings, or various tools
> > could've caused the issue at hand.  Therefore I'd like to propose adding
> > a 'tainted' field to pg_control, that contains some of the "history" of
> > the cluster. Individual bits inside that field that I can think of right
> > now are:
> > - pg_resetxlog was used non-passively on cluster
> > - ran with fsync=off
> > - ran with full_page_writes=off
> > - pg_upgrade was used
> > 
> > What do others think?

> Isn't the uncertainty usually that you know one of these things happened
> on the cluster, but you don't know if that's the actual root cause?
> That's my experience, at least.

My experience is that you get a bugreport and you have no clue what
happened with the cluster. Often the "owner" doesn't have either. Then
you go through various theories and end up blaming it on one of these,
even though it's quite possibly never happened.

Of course this does *not* solve the issue when these actually
occurred. It's just a tool to narrow down possible causes / exclude
others.

Greetings,

Andres Freund

Re: RFC: Add 'taint' field to pg_control.

From

Justin Pryzby

Date:

01 March 2018, 01:16:53

On Wed, Feb 28, 2018 at 01:43:11PM -0800, Andres Freund wrote:
> a significant number of times during investigations of bugs I wondered
> whether running the cluster with various settings, or various tools
> could've caused the issue at hand.  Therefore I'd like to propose adding
> a 'tainted' field to pg_control, that contains some of the "history" of
> the cluster. Individual bits inside that field that I can think of right
> now are:
> - pg_resetxlog was used non-passively on cluster
> - ran with fsync=off
> - ran with full_page_writes=off
> - pg_upgrade was used

What about:

 - pg_control versions used on this cluster (hopefully a full list..obviously
   not going back before PG11);
 - did recovery (you could use "needed recovery" instead, but then there's the
   question of how reliable that field would be);
   + or: timestamp of most recent recovery (attempt?)
 - index corruption detected (on system table?);  Note that "CREATE IF NOT
   EXIST" doesn't avoid unique key errors on system tables, so it's not useful
   to log every ERROR on system tables.
 - page %u is uninitialized --- fixing
 - here's one I just dug up: ERROR: missing chunk number 0 for toast value 10270298 in pg_toast_2619 
 - XID wraparound?
 - autovacuum disabled?
 - checksum failue (in system relation?)
 - local_preload_libraries?
 - started in single user mode or with system indices disabled?
 - hit assertion or PANIC ??
 - UPDATE/DELETE/INSERT on system table ? (or maintenance command like
   vacuum/analyze/cluster/reindex?)

Justin

Re: RFC: Add 'taint' field to pg_control.

From

Andres Freund

Date:

01 March 2018, 01:18:12

Hi,

On 2018-02-28 17:14:18 -0500, Peter Eisentraut wrote:
> I can see why you'd want that, but as a DBA, I don't necessarily want
> all of that recorded, especially in a quasi-permanent way.

Huh?  You're arguing that we should make it easier for DBAs to hide
potential causes of corruption? I fail to see when that's necessary /
desirable? That a cluster ran with fsync=off isn't an embarassing thing,
nor does it contain private data...

Greetings,

Andres Freund

Re: RFC: Add 'taint' field to pg_control.

From

Andres Freund

Date:

01 March 2018, 01:23:19

Hi,

On 2018-02-28 16:16:53 -0600, Justin Pryzby wrote:

Unfortunately your list seems to raise the bar to a place I don't see us
going soon :(


>  - pg_control versions used on this cluster (hopefully a full list..obviously
>    not going back before PG11);

That needs arbitrary much space, that's not feasible for
pg_control. Needs to be <= 512 bytes


>  - did recovery (you could use "needed recovery" instead, but then there's the
>    question of how reliable that field would be);
>    + or: timestamp of most recent recovery (attempt?)

What'd that be useful for?


>  - index corruption detected (on system table?);  Note that "CREATE IF NOT
>    EXIST" doesn't avoid unique key errors on system tables, so it's not useful
>    to log every ERROR on system tables.

I don't think we have a easy way to diagnose this specifically enough


>  - page %u is uninitialized --- fixing

Doesn't really indicate a bug / problem.


>  - here's one I just dug up: ERROR: missing chunk number 0 for toast value 10270298 in pg_toast_2619 

Hm.

>  - XID wraparound?

why?


>  - autovacuum disabled?

why?


>  - checksum failue (in system relation?)

Hm.


>  - local_preload_libraries?

Hm?


>  - started in single user mode or with system indices disabled?

why?


>  - hit assertion or PANIC ??

We certainly don't write to persistent data in those case.


>  - UPDATE/DELETE/INSERT on system table ? (or maintenance command like
>    vacuum/analyze/cluster/reindex?)

Hm, the former I could see some use for, but it might end up being
pretty ugly locking and layering wise.

Greetings,

Andres Freund

Re: RFC: Add 'taint' field to pg_control.

From

Justin Pryzby

Date:

01 March 2018, 01:28:24

On Wed, Feb 28, 2018 at 02:18:12PM -0800, Andres Freund wrote:
> On 2018-02-28 17:14:18 -0500, Peter Eisentraut wrote:
> > I can see why you'd want that, but as a DBA, I don't necessarily want
> > all of that recorded, especially in a quasi-permanent way.
> 
> Huh?  You're arguing that we should make it easier for DBAs to hide
> potential causes of corruption? I fail to see when that's necessary /
> desirable? That a cluster ran with fsync=off isn't an embarassing thing,
> nor does it contain private data...

The more fine grained these are the more useful they can be:

Running with fsync=off is common advice while loading, so reporting that
"fsync=off at some point" is much less useful than reporting "having to run
recovery from an fsync=off database".

I propose there could be two bits:

 1. fsync was off at some point in history when the DB needed recovery;
 2. fsync was off during the current instance; I assume the bit would be set
    whenever fsync=off was set, but could be cleared when the server was
    cleanly shut down.  If the bit is set during recovery, the first bit would
    be permanently set.  Not sure but if this bit is set and then fsync
    re-enabled, maybe this could be cleared after any checkpoint and not just
    shutdown ?

Justin

Re: RFC: Add 'taint' field to pg_control.

From

Justin Pryzby

Date:

01 March 2018, 04:00:54

On Wed, Feb 28, 2018 at 02:23:19PM -0800, Andres Freund wrote:
> Hi,
> 
> On 2018-02-28 16:16:53 -0600, Justin Pryzby wrote:

> >  - did recovery (you could use "needed recovery" instead, but then there's the
> >    question of how reliable that field would be);
> >    + or: timestamp of most recent recovery (attempt?)
> What'd that be useful for?

Theoretically nothing but conceivably useful if there's an issue with recovery.
I recall various historic things weren't but should have been WAL logged.

> >  - local_preload_libraries?
> Hm?

Not sure; but in any case I meant *_preload_libraries.

> >  - started in single user mode or with system indices disabled?
> why?

Some of these I suggested just as a datapoint (or other brainstorms I couldn't
immediately reject).  A cluster where someone has UPDATED pg_* (even
pg_statistic) or otherwise hacked on I would tend to think about differently
than a "pristine" cluster that's never seen anything more interesting than a
join.

Justin

Re: RFC: Add 'taint' field to pg_control.

From

Craig Ringer

Date:

01 March 2018, 04:03:30

On 1 March 2018 at 05:43, Andres Freund <andres@anarazel.de> wrote:

Hi,

a significant number of times during investigations of bugs I wondered
whether running the cluster with various settings, or various tools
could've caused the issue at hand. Therefore I'd like to propose adding
a 'tainted' field to pg_control, that contains some of the "history" of
the cluster. Individual bits inside that field that I can think of right
now are:
- pg_resetxlog was used non-passively on cluster
- ran with fsync=off
- ran with full_page_writes=off
- pg_upgrade was used

What do others think?

A huge +1 from me for the idea. I can't even count the number of black box "WTF did you DO?!?" servers I've looked at, where bizarre behaviour has turned out to be down to the user doing something very silly and not saying anything about it.

It's only some flags, so putting it in pg_control is arguably somewhat wasteful but so minor as to be of no real concern. And that's probably the best way to make sure it follows the cluster around no matter what backup/restore/copy mechanisms are used and how "clever" they try to be.

What I'd _really_ love would be to blow the scope of this up a bit and turn it into a key-events cluster journal, recording key param switches, recoveries (and lsn ranges), pg_upgrade's, etc. But then we'd run into people with weird workloads who managed to make it some massive file, we'd have to make sure we had a way to stop it getting left out of copies/backups, and it'd generally be irritating. So lets not do that. Proper support for class-based logging and multiple outputs would be a good solution for this at some future point.

What you propose is simple enough to be quick to implement, adds no admin overhead, and will be plenty useful enough.

I'd like to add "postmaster.pid was absent when the cluster started" to this list, please. Sure, it's not conclusive, and there are legit reasons why that might be the case, but so often it's somebody kill -9'ing the postmaster, then removing the postmaster.pid and starting up again without killing the workers....

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: RFC: Add 'taint' field to pg_control.

From

Craig Ringer

Date:

01 March 2018, 04:12:18

On 1 March 2018 at 06:28, Justin Pryzby <pryzby@telsasoft.com> wrote:

On Wed, Feb 28, 2018 at 02:18:12PM -0800, Andres Freund wrote:
> On 2018-02-28 17:14:18 -0500, Peter Eisentraut wrote:
> > I can see why you'd want that, but as a DBA, I don't necessarily want
> > all of that recorded, especially in a quasi-permanent way.
>
> Huh? You're arguing that we should make it easier for DBAs to hide
> potential causes of corruption? I fail to see when that's necessary /
> desirable? That a cluster ran with fsync=off isn't an embarassing thing,
> nor does it contain private data...

The more fine grained these are the more useful they can be:

Running with fsync=off is common advice while loading, so reporting that
"fsync=off at some point" is much less useful than reporting "having to run
recovery from an fsync=off database".

I propose there could be two bits:

1. fsync was off at some point in history when the DB needed recovery;
2. fsync was off during the current instance; I assume the bit would be set
whenever fsync=off was set, but could be cleared when the server was
cleanly shut down. If the bit is set during recovery, the first bit would
be permanently set. Not sure but if this bit is set and then fsync
re-enabled, maybe this could be cleared after any checkpoint and not just
shutdown ?

I think that's a very useful way to eliminate false positives and make this diagnostic field more useful. But we'd need to guarantee that when you've switched from fsync=off to fsync=on, we've actually fsync'd every extent of every table and index, whether or not we think they're currently dirty and whether or not the smgr currently has them open. Do something like initdb does to flush everything. Without that, we could have WAL-vs-heap ordering issues and so on *even after a Pg restart* if the system has lots of dirty writeback to do.

A quick look at CheckPointGuts and CheckPointBuffers suggests we don't presently do that but I admit I haven't read in depth.

We force a fsync of the current WAL segment when switching fsync on, but don't do anything else AFAICS.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: RFC: Add 'taint' field to pg_control.

From

Craig Ringer

Date:

01 March 2018, 04:14:16

On 1 March 2018 at 09:00, Justin Pryzby <pryzby@telsasoft.com> wrote:

> > - started in single user mode or with system indices disabled?
> why?

Some of these I suggested just as a datapoint (or other brainstorms I couldn't
immediately reject). A cluster where someone has UPDATED pg_* (even
pg_statistic) or otherwise hacked on I would tend to think about differently
than a "pristine" cluster that's never seen anything more interesting than a
join.

How about "has run in a state where system catalog modifications are permitted". Because I've had one seriously frustrating case where that would've been extremely pertinent information.

That said, I'm not convinced it's worth the effort and I think this is going off into the weeds a little. Lets focus on Andres's core suggestion (and maybe refine the fsync part to get rid of false positives due to bulk loading) and build on from there in subsequent work.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: RFC: Add 'taint' field to pg_control.

From

Justin Pryzby

Date:

02 March 2018, 07:13:00

On Thu, Mar 01, 2018 at 09:12:18AM +0800, Craig Ringer wrote:
> On 1 March 2018 at 06:28, Justin Pryzby <pryzby@telsasoft.com> wrote:
> > The more fine grained these are the more useful they can be:
> >
> > Running with fsync=off is common advice while loading, so reporting that
> > "fsync=off at some point" is much less useful than reporting "having to run
> > recovery from an fsync=off database".
> >
> > I propose there could be two bits:
> >
> >  1. fsync was off at some point in history when the DB needed recovery;
> >  2. fsync was off during the current instance; I assume the bit would be set
> >     whenever fsync=off was set, but could be cleared when the server was
> >     cleanly shut down.  If the bit is set during recovery, the first bit would
> >     be permanently set.  Not sure but if this bit is set and then fsync
> >     re-enabled, maybe this could be cleared after any checkpoint and not just
> >     shutdown ?
> >
> I think that's a very useful way to eliminate false positives and make this
> diagnostic field more useful. But we'd need to guarantee that when you've
> switched from fsync=off to fsync=on, we've actually fsync'd every extent of
> every table and index, whether or not we think they're currently dirty and
> whether or not the smgr currently has them open. Do something like initdb
> does to flush everything. Without that, we could have WAL-vs-heap ordering
> issues and so on *even after a Pg restart* if the system has lots of dirty
> writeback to do.


I think a similar, "2 bit" scheme *could* be applied to full_page_writes, too.
And the 2nd "soft" bit for FPW could also be cleared after checkpoint.  I'd
think that for FPW, previous checkpoints which had fsync=off wouldn't need
special handling - so long as FPW=on, the "soft" bit can be cleared after
successful checkpoint (as a micro-optimization, don't clear the "soft" bit if
the "hard" bit is set).

BTW this scheme has the amusing consequence that setting fsync=off should
involve first writing+fsyncing these bits to pgcontrol.

Justin

Re: RFC: Add 'taint' field to pg_control.

From

Robert Haas

Date:

07 March 2018, 23:58:38

On Wed, Feb 28, 2018 at 8:03 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> A huge +1 from me for the idea. I can't even count the number of black box
> "WTF did you DO?!?" servers I've looked at, where bizarre behaviour has
> turned out to be down to the user doing something very silly and not saying
> anything about it.

+1 from me, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: RFC: Add 'taint' field to pg_control.

From

Craig Ringer

Date:

08 March 2018, 04:51:29

On 8 March 2018 at 04:58, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Feb 28, 2018 at 8:03 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> A huge +1 from me for the idea. I can't even count the number of black box
> "WTF did you DO?!?" servers I've looked at, where bizarre behaviour has
> turned out to be down to the user doing something very silly and not saying
> anything about it.

+1 from me, too.

My favourite remains an organisation that kept "fixing" an issue by kill -9'ing the postmaster and removing postmaster.pid to make it start up again. Without killing all the leftover backends. Of course, the system kept getting more unstable and broken, so they did it more and more often. They were working on scripting it when they gave up and asked for help.

The data recovery effort on that one was truly exciting. I remember looking at bash history and having to take a short break to figure out how on earth to communicate what was going on.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: RFC: Add 'taint' field to pg_control.

From

Andres Freund

Date:

08 March 2018, 05:18:28


On March 7, 2018 5:51:29 PM PST, Craig Ringer <craig@2ndquadrant.com> wrote:
>My favourite remains an organisation that kept "fixing" an issue by
>kill
>-9'ing the postmaster and removing postmaster.pid to make it start up
>again. Without killing all the leftover backends. Of course, the system
>kept getting more unstable and broken, so they did it more and more
>often.
>They were working on scripting it when they gave up and asked for help.

Maybe I'm missing something, but that ought to not work. The shmem segment that we keep around would be a conflict, no?

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: RFC: Add 'taint' field to pg_control.

From

Craig Ringer

Date:

08 March 2018, 07:06:03

On 8 March 2018 at 10:18, Andres Freund <andres@anarazel.de> wrote:

On March 7, 2018 5:51:29 PM PST, Craig Ringer <craig@2ndquadrant.com> wrote:
>My favourite remains an organisation that kept "fixing" an issue by
>kill
>-9'ing the postmaster and removing postmaster.pid to make it start up
>again. Without killing all the leftover backends. Of course, the system
>kept getting more unstable and broken, so they did it more and more
>often.
>They were working on scripting it when they gave up and asked for help.

Maybe I'm missing something, but that ought to not work. The shmem segment that we keep around would be a conflict, no?

As I understand it, because we allow multiple Pg instances on a system, we identify the small sysv shmem segment we use by the postmaster's pid. If you remove the DirLockFile (postmaster.pid) you remove the interlock against starting a new postmaster. It'll think it's a new independent instance on the same host, make a new shmem segment and go merrily on its way mangling data horribly.

See CreateLockFile(). Also 7e2a18a9161 . In particular src/backend/utils/init/miscinit.c +938,

if (isDDLock)

{

....

if (PGSharedMemoryIsInUse(id1, id2))

ereport(FATAL,

(errcode(ERRCODE_LOCK_FILE_EXISTS),

errmsg("pre-existing shared memory block "

"(key %lu, ID %lu) is still in use",

id1, id2),

errhint("If you're sure there are no old "

"server processes still running, remove "

"the shared memory block "

"or just delete the file \"%s\".",

filename)));

....

}

I still think that error is a bit optimistic, and should really say "make very sure there are no 'postgres' processes associated with this data directory, then ...'

It'd be nice if the OS offered us some support here. Something like opening a lockfile in exclusive lock mode, then inheriting the FD and lock on all children, with each child inheriting the lock. So the exclusive lock wouldn't get released until all FDs associated with it are released. But AFAIK nothing like that is present, let alone portable.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: RFC: Add 'taint' field to pg_control.

From

Tom Lane

Date:

08 March 2018, 07:34:37

Craig Ringer <craig@2ndquadrant.com> writes:
> As I understand it, because we allow multiple Pg instances on a system, we
> identify the small sysv shmem segment we use by the postmaster's pid. If
> you remove the DirLockFile (postmaster.pid) you remove the interlock
> against starting a new postmaster. It'll think it's a new independent
> instance on the same host, make a new shmem segment and go merrily on its
> way mangling data horribly.

Yeah.  If we realized that the old shmem segment was associated with this
data directory, we could check for processes still attached to it ... but
the lock file is exactly where that association is kept.

> It'd be nice if the OS offered us some support here. Something like opening
> a lockfile in exclusive lock mode, then inheriting the FD and lock on all
> children, with each child inheriting the lock. So the exclusive lock
> wouldn't get released until all FDs associated with it are released. But
> AFAIK nothing like that is present, let alone portable.

I've wondered about that too.  An inheritable lock on the data directory
itself would be ideal, but that doesn't exist anywhere AFAIK.  We could
imagine taking a BSD-style-flock(2) lock on some lock file, on systems that
have that; but not all do, so it couldn't be our only protection.  Another
problem is that since it'd have to be an ordinary file, there'd still be a
risk of cowboy DBAs removing the lock file.  Maybe we could use pg_control
as the lock file?

            regards, tom lane

Re: RFC: Add 'taint' field to pg_control.

From

Craig Ringer

Date:

08 March 2018, 10:02:12

On 8 March 2018 at 12:34, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Craig Ringer <craig@2ndquadrant.com> writes:
> As I understand it, because we allow multiple Pg instances on a system, we
> identify the small sysv shmem segment we use by the postmaster's pid. If
> you remove the DirLockFile (postmaster.pid) you remove the interlock
> against starting a new postmaster. It'll think it's a new independent
> instance on the same host, make a new shmem segment and go merrily on its
> way mangling data horribly.

Yeah. If we realized that the old shmem segment was associated with this
data directory, we could check for processes still attached to it ... but
the lock file is exactly where that association is kept.

> It'd be nice if the OS offered us some support here. Something like opening
> a lockfile in exclusive lock mode, then inheriting the FD and lock on all
> children, with each child inheriting the lock. So the exclusive lock
> wouldn't get released until all FDs associated with it are released. But
> AFAIK nothing like that is present, let alone portable.

I've wondered about that too. An inheritable lock on the data directory
itself would be ideal, but that doesn't exist anywhere AFAIK. We could
imagine taking a BSD-style-flock(2) lock on some lock file, on systems that
have that; but not all do, so it couldn't be our only protection. Another
problem is that since it'd have to be an ordinary file, there'd still be a
risk of cowboy DBAs removing the lock file. Maybe we could use pg_control
as the lock file?

"The error mentioned that there was another active database using this pg_control file, so I deleted it, and now my database won't start."

Like our discussion about pg_resetxlog, there's a limit to how much operator error we can guard against. More importantly we'd have to be _very_ sure it was correct or our attempts to protect the user would risk creating a major outage.

That said, flock(2) with LOCK_NB seems to have just the semantics we want:

A single file may not simultaneously have both shared and exclusive locks.

Locks created by flock() are associated with an open file description (see open(2)). This means that duplicate file descriptors (created by, for example, fork(2) or dup(2)) refer to the same lock,

and this lock may be modified or released using any of these file descriptors. Furthermore, the lock is released either by an explicit LOCK_UN operation on any of these duplicate file descriptors,

or when all such file descriptors have been closed.

Worth looking into, but really important not to make a mistake and make things worse.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: RFC: Add 'taint' field to pg_control.

From

Andres Freund

Date:

08 March 2018, 23:26:53

On 2018-03-07 23:34:37 -0500, Tom Lane wrote:
> Craig Ringer <craig@2ndquadrant.com> writes:
> > As I understand it, because we allow multiple Pg instances on a system, we
> > identify the small sysv shmem segment we use by the postmaster's pid. If
> > you remove the DirLockFile (postmaster.pid) you remove the interlock
> > against starting a new postmaster. It'll think it's a new independent
> > instance on the same host, make a new shmem segment and go merrily on its
> > way mangling data horribly.
> 
> Yeah.  If we realized that the old shmem segment was associated with this
> data directory, we could check for processes still attached to it ... but
> the lock file is exactly where that association is kept.

I'd somehow remembered that we just took the path as the identifier, but
that's wrong...

Greetings,

Andres Freund

Re: RFC: Add 'taint' field to pg_control.

From

Justin Pryzby

Date:

23 March 2020, 18:21:43

On Wed, Feb 28, 2018 at 04:16:53PM -0600, Justin Pryzby wrote:
> On Wed, Feb 28, 2018 at 01:43:11PM -0800, Andres Freund wrote:
> > a significant number of times during investigations of bugs I wondered
> > whether running the cluster with various settings, or various tools
> > could've caused the issue at hand.  Therefore I'd like to propose adding
> > a 'tainted' field to pg_control, that contains some of the "history" of
> > the cluster. Individual bits inside that field that I can think of right
> > now are:
> > - pg_resetxlog was used non-passively on cluster
> > - ran with fsync=off
> > - ran with full_page_writes=off
> > - pg_upgrade was used

+ cluster was downgraded (between minor releases)

Dregging up a years old thread since I thought that was worth adding..

-- 
Justin