Re: PG in container w/ pid namespace is init, process exits cause restart - Mailing list pgsql-hackers

From Tom Lane
Subject Re: PG in container w/ pid namespace is init, process exits cause restart
Msg-id 3915734.1620152799@sss.pgh.pa.us
In response to Re: PG in container w/ pid namespace is init, process exits cause restart  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: PG in container w/ pid namespace is init, process exits cause restart
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, May 3, 2021 at 3:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think that'd be a net reduction in reliability, not an improvement.
>> In most scenarios it'd do little except mask bugs.  And who's to say
>> that ignoring unexpected child deaths is okay, anyway?  We could hardly
>> be sure that the dead process hadn't been connected to shared memory.

> This argument doesn't make any sense to me. In almost all cases,
> postgres is not init, and if a backend forks a child which stomps on
> shared memory and exits, the postmaster will not know that there is a
> problem and will not restart. In practice this is not a problem,
> because the core code is careful not to touch shared memory in
> children that it forks, and extensions written by reasonably smart
> people aren't going to do that either, because it's not very hard to
> figure out that it can't possibly work. So, in the rare case where
> postgres IS init, and it finds out that a descendent process which is
> not a direct child has exited, it should do the same thing that we do
> in all the other cases where a descendent process that is not a direct
> child has exited, viz. nothing. And if that's the wrong idea - I don't
> think it is - then we should fix it in all cases, not just the one
> where postgres is init.
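
The situation Robert describes arises because orphaned processes are reparented to the init of their PID namespace, so a postmaster running as PID 1 gets exit statuses back from waitpid() for grandchildren it never forked. On Linux the same reparenting can be reproduced outside a container by marking a process as a child subreaper with prctl(PR_SET_CHILD_SUBREAPER). A minimal sketch, assuming Linux 3.4 or later; count_reparented_exits() is a hypothetical name, not anything in the PostgreSQL tree:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: become a child subreaper, fork a child that forks a
 * grandchild and exits at once. The orphaned grandchild is reparented to
 * us, so waitpid(-1) hands back a PID this process never forked.
 * Returns the number of such unrecognized PIDs reaped, or -1 on error.
 */
int count_reparented_exits(void)
{
    if (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) != 0)
        return -1;

    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        /* Direct child: spawn a grandchild, then exit without waiting for it. */
        if (fork() == 0)
            _exit(0);           /* grandchild */
        _exit(0);
    }

    int unrecognized = 0;
    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, 0);
        if (pid < 0) {
            if (errno == EINTR)
                continue;
            break;              /* ECHILD: nothing left to reap */
        }
        if (pid != child)
            unrecognized++;     /* not a PID this process forked */
    }
    return unrecognized;
}
```

A single call should return 1: the reparented grandchild is the only PID the caller did not create itself.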

You are arguing from assumptions not in evidence, specifically that
if we reap a PID that isn't one we recognize, this must be what
happened.  I think it's *at least* as likely that the case implies
some bug in the postmaster's child-process bookkeeping, in which
case doing nothing is not a good answer.  (The fact that that's
what we do today doesn't make it right.)  I don't wish to
lobotomize our ability to detect such problems in order to support
incompetently-configured containers.
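
The position here is that an unrecognized PID from waitpid() is at least as likely to mean a bookkeeping bug as a stray grandchild, so it should be surfaced rather than dropped. A minimal sketch of a reaper built on that assumption; the table, MAX_CHILDREN, spawn_child() and reap_children() are all hypothetical and far simpler than the postmaster's own child tracking:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 16

/* Hypothetical bookkeeping: PIDs this process forked and has not yet reaped. */
static pid_t known_children[MAX_CHILDREN];
static int n_known = 0;

/* Remove pid from the table; return 1 if it was there, 0 otherwise. */
static int forget_child(pid_t pid)
{
    for (int i = 0; i < n_known; i++) {
        if (known_children[i] == pid) {
            known_children[i] = known_children[--n_known];
            return 1;
        }
    }
    return 0;
}

/* Fork a child that exits immediately; record its PID only if track is nonzero. */
pid_t spawn_child(int track)
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(0);
    if (pid > 0 && track && n_known < MAX_CHILDREN)
        known_children[n_known++] = pid;
    return pid;
}

/*
 * Reap exited children. Pass WNOHANG from a SIGCHLD-driven loop, or 0 to
 * block until no children remain. Returns how many reaped PIDs were not in
 * the table; each one is reported instead of being silently ignored.
 */
int reap_children(int options)
{
    int unrecognized = 0;
    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, options);
        if (pid == 0)
            break;              /* WNOHANG: children remain, none has exited yet */
        if (pid < 0) {
            if (errno == EINTR)
                continue;
            break;              /* ECHILD: no children left */
        }
        if (!forget_child(pid)) {
            fprintf(stderr, "reaped unrecognized PID %ld\n", (long) pid);
            unrecognized++;
        }
    }
    return unrecognized;
}
```

Spawning two tracked children and one untracked child, then calling reap_children(0), should report exactly one unrecognized PID and leave the table empty.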

Independently of that, as was pointed out upthread, being init requires
more than just ignoring unrecognized results from waitpid.  We shouldn't
take on that responsibility when there are perfectly good solutions out
there already.
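
Besides tolerating unknown PIDs, a process running as PID 1 must reap every orphan reparented to it and pass termination signals on to its workload; other processes in the namespace can only deliver to init those signals for which it has installed a handler. Existing tools such as tini (which Docker bundles behind `docker run --init`) and dumb-init already do this. Purely as an illustration of the job, here is a hypothetical sketch; run_as_init() is not the API of either tool, and real shims handle more cases (process groups, stopped children, signal masking):

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* PID of the main child; 0 until fork() succeeds in the parent. */
static volatile sig_atomic_t main_child = 0;

/* Forward termination-style signals to the main child (kill() is async-signal-safe). */
static void forward_signal(int sig)
{
    if (main_child > 0)
        kill((pid_t) main_child, sig);
}

/*
 * Hypothetical minimal init: exec argv[0] with argv in a child, reap every
 * process that exits (including reparented orphans), and return the main
 * child's exit code, using 128 + signal number if it was killed by a signal.
 */
int run_as_init(char *const argv[])
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = forward_signal;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGTERM, &sa, NULL);
    sigaction(SIGINT, &sa, NULL);
    sigaction(SIGHUP, &sa, NULL);

    pid_t pid = fork();
    if (pid < 0)
        return 1;
    if (pid == 0) {
        execvp(argv[0], argv);  /* handlers reset to default across exec */
        _exit(127);             /* exec failed */
    }
    main_child = pid;

    for (;;) {
        int status;
        pid_t reaped = waitpid(-1, &status, 0);
        if (reaped < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a forwarded signal */
            return 1;           /* ECHILD before the main child: unexpected */
        }
        if (reaped != pid)
            continue;           /* an orphaned descendant: reaping it is enough */
        if (WIFEXITED(status))
            return WEXITSTATUS(status);
        if (WIFSIGNALED(status))
            return 128 + WTERMSIG(status);
    }
}
```

A container entrypoint built this way would call run_as_init() with the server's argument vector and pass the result to exit(), leaving the postmaster as an ordinary child that never sees other processes' exits.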

            regards, tom lane
