Re: Idea for improving buildfarm robustness - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: Idea for improving buildfarm robustness
Date	September 29, 2015 22:47:52
Msg-id	1555.1443556067@sss.pgh.pa.us
In response to	Re: Idea for improving buildfarm robustness (Josh Berkus <josh@agliodbs.com>)
Responses	Re: Idea for improving buildfarm robustness (Alvaro Herrera <alvherre@2ndquadrant.com>); Re: Idea for improving buildfarm robustness (Joe Conway <mail@joeconway.com>)
List	pgsql-hackers

Josh Berkus <josh@agliodbs.com> writes:
> On 09/29/2015 11:48 AM, Tom Lane wrote:
>> But today I thought of another way: suppose that we teach the postmaster
>> to commit hara-kiri if the $PGDATA directory goes away.  Since the
>> buildfarm script definitely does remove all the temporary data directories
>> it creates, this ought to get the job done.

> This would also be useful for production.  I can't count the number of
> times I've accidentally blown away a replica's PGDATA without shutting
> the postmaster down first, and then had to do a bunch of kill -9.

> In general, having the postmaster survive deletion of PGDATA is
> suboptimal.  In rare cases of having it survive installation of a new
> PGDATA (via PITR restore, for example), I've even seen the zombie
> postmaster corrupt the data files.

Side comment on that: if you'd actually removed $PGDATA, I can't see how
that would happen.  The postmaster and children would have open CWD
handles to the now-disconnected-from-anything-else directory inode,
which would not enable them to reach files created under the new directory
inode.  (They don't ever use absolute paths, only relative, or at least
that's the way it's supposed to work.)

However ... if you'd simply deleted everything *under* $PGDATA but not
that directory itself, then this type of failure mode is 100% plausible.
And that's not an unreasonable thing to do, especially if you've set
things up so that $PGDATA's parent is not a writable directory.
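
The difference between the two cases above can be reproduced outside PostgreSQL. The following is a minimal sketch, not postmaster code: it assumes a Linux-like system where a process's current directory can be removed out from under it, and the names `pgdata`, `old`, `marker`, and `stale_cwd_sees_new_file` are made up for the illustration.

```python
import os
import shutil
import tempfile


def stale_cwd_sees_new_file(remove_directory_itself):
    """Return True if a process that kept its CWD across the cleanup can see,
    via a relative path, a file created afterwards under the same path name."""
    parent = tempfile.mkdtemp()
    datadir = os.path.join(parent, "pgdata")  # hypothetical stand-in for $PGDATA
    os.mkdir(datadir)
    with open(os.path.join(datadir, "old"), "w") as f:
        f.write("old contents")

    saved_cwd = os.getcwd()
    os.chdir(datadir)  # like the postmaster, run with CWD inside the data directory
    try:
        if remove_directory_itself:
            # Case 1: the directory is removed and recreated, so the path
            # name now refers to a different directory inode than our CWD.
            shutil.rmtree(datadir)
            os.mkdir(datadir)
        else:
            # Case 2: only the contents are deleted; the directory inode
            # that our CWD refers to stays in place.
            for name in os.listdir(datadir):
                os.remove(os.path.join(datadir, name))

        # "Install" new data under the same absolute path.
        with open(os.path.join(datadir, "marker"), "w") as f:
            f.write("new contents")

        # Relative lookup, resolved against the CWD we have held all along.
        return os.path.exists("marker")
    finally:
        os.chdir(saved_cwd)
        shutil.rmtree(parent, ignore_errors=True)
```

With `remove_directory_itself=True` the old process cannot reach the new file; with `False` it can, which is the case where a leftover postmaster could touch freshly installed data files.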

Testing accessibility of "global/pg_control" would be enough to catch this
case, but only if we do it before you create a new one.  So that seems
like an argument for making the test relatively often.  The once-a-minute
option is sounding better and better.
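
A periodic accessibility test of this kind might look roughly like the sketch below. It is only an illustration in Python, not the postmaster's implementation; the function names, the 60-second default, and the choice to treat only "no such file" and "not a directory" errors as proof that the data directory is gone are all assumptions of the sketch.

```python
import os
import time

# Relative path, resolved against the process's CWD (the data directory).
CONTROL_FILE = os.path.join("global", "pg_control")


def control_file_accessible(path=CONTROL_FILE):
    """Return False only when the file is definitely gone.

    Other errors (for example permission problems) are allowed to
    propagate, so that they are not mistaken for a removed data directory.
    """
    try:
        os.stat(path)
        return True
    except (FileNotFoundError, NotADirectoryError):
        return False


def watch_until_missing(interval_seconds=60.0, max_checks=None, sleep=time.sleep):
    """Poll the control file; return True as soon as it is missing.

    Returns False if max_checks polls all succeeded. With max_checks=None
    the loop runs until the file disappears.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        if not control_file_accessible():
            return True  # the caller would initiate shutdown here
        checks += 1
        sleep(interval_seconds)
    return False
```

The shorter the interval, the smaller the window in which someone can delete the old contents and install a new pg_control before the check has had a chance to fail.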

We could possibly add additional checks, like trying to verify that
pg_control has the same inode number it used to.  But I'm afraid that
would add portability issues and false-positive hazards that would
outweigh the value.
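
For illustration, an inode comparison could be sketched as follows. This is hypothetical Python rather than anything proposed for the postmaster; `file_identity` and `is_same_file` are invented names, and the sketch assumes a POSIX filesystem where the (`st_dev`, `st_ino`) pair identifies a file.

```python
import os


def file_identity(path):
    """Return the (device, inode) pair that identifies a file on POSIX."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def is_same_file(remembered_identity, path):
    """Return True if path still refers to the file recorded earlier.

    A missing file counts as "not the same file".
    """
    try:
        return file_identity(path) == remembered_identity
    except (FileNotFoundError, NotADirectoryError):
        return False
```

The hazards mentioned above show up even in a sketch this small: a filesystem may hand a freed inode number to the next file created, so a deleted and recreated pg_control can compare as unchanged, and not every platform or filesystem reports stable inode numbers.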

        regards, tom lane
