Re: PG in container w/ pid namespace is init, process exits cause restart - Mailing list pgsql-hackers

From Andres Freund
Subject Re: PG in container w/ pid namespace is init, process exits cause restart
Msg-id 20210503201234.atro24bdnzcybqiw@alap3.anarazel.de
In response to Re: PG in container w/ pid namespace is init, process exits cause restart  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: PG in container w/ pid namespace is init, process exits cause restart
List pgsql-hackers
Hi,

On 2021-05-03 15:37:24 -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > On 2021-May-03, Andres Freund wrote:
> >> The issue turns out to be that postgres was in a container, with pid
> >> namespaces enabled. Because postgres was run directly in the container,
> >> without a parent process inside, it thus becomes pid 1. Which mostly
> >> works without a problem. Until, as the case here with the archive
> >> command, a sub-sub process exits while it still has a child. Then that
> >> child gets re-parented to postmaster (as init).
>
> > Hah .. interesting.  I think we should definitely make this work, since
> > containerized stuff is going to become more and more prevalent.
>
> How would we make it "work"?  The postmaster can't possibly be expected
> to know the right thing to do with unexpected children.

Not saying that we should, but we could check if we're pid 1 / init, and
just warn about children we don't know anything about. Which we could
detect by iterating over BackendList/BackgroundWorkerList before
crash-restarting in CleanupBackend().  Then we'd not lose reliability
in the "normal" case, while not reducing reliability in the container
case.
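
A minimal sketch of that decision, as a standalone illustration rather than
actual postmaster code. KnownChildren is a hypothetical stand-in for
BackendList/BackgroundWorkerList, and the enum and function names are
invented for this example. The caller would pass getpid() as self_pid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for BackendList / BackgroundWorkerList:
 * a flat array of the PIDs that we forked ourselves. */
typedef struct KnownChildren
{
    const pid_t *pids;
    size_t       count;
} KnownChildren;

/* Possible reactions to a reaped child (names are assumptions). */
typedef enum ChildExitAction
{
    CHILD_EXIT_KNOWN,            /* our own child: existing handling applies */
    CHILD_EXIT_REPARENTED_WARN,  /* unknown child and we are pid 1: just log it */
    CHILD_EXIT_UNKNOWN_RESTART   /* unknown child, not pid 1: keep crash-restart */
} ChildExitAction;

/* Linear search, mirroring a walk over the backend lists. */
static bool
is_known_child(const KnownChildren *known, pid_t pid)
{
    assert(known != NULL);
    for (size_t i = 0; i < known->count; i++)
    {
        if (known->pids[i] == pid)
            return true;
    }
    return false;
}

/* Decide what to do about a child whose exit was just reaped. */
static ChildExitAction
classify_child_exit(const KnownChildren *known, pid_t dead_pid, pid_t self_pid)
{
    if (is_known_child(known, dead_pid))
        return CHILD_EXIT_KNOWN;

    /* Only init (pid 1 in its pid namespace) is assumed here to
     * inherit orphans, so only then is an unknown child treated
     * as harmless reparenting. */
    if (self_pid == 1)
        return CHILD_EXIT_REPARENTED_WARN;

    return CHILD_EXIT_UNKNOWN_RESTART;
}
```

With this shape, behaviour is unchanged unless we are pid 1 and the dead
pid is one we never forked.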

I'm not quite sure I buy the reliability argument, tbh: The additional
process exits we see as pid 1 are after all process exits that we'd not
see if we weren't pid 1. And if we're not pid 1 then there really should
never be any "unexpected children" - we know what processes postmaster
itself forked after all. So where would unexpected children come from,
except reparenting?


> And who's to say that ignoring unexpected child deaths is okay,
> anyway?  We could hardly be sure that the dead process hadn't been
> connected to shared memory.

I don't think checking the exit status of unexpected children to see
whether we should crash-restart out of that concern is meaningful: We
don't know that the child didn't do anything bad with shared memory when
they exited with exit(1), instead of exit(2).


Random thought: I wonder if we ought to set madvise(MADV_DONTFORK) on
shared memory in postmaster children, where available. Then we could be
fairly certain that there aren't processes we don't know about that are
attached to shared memory (unless some nasty library loaded via
shared_preload_libraries forks early during backend startup - but that's
hard to get excited about).
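
To illustrate the effect, here is a small Linux-only sketch, not code from
the PostgreSQL tree. It maps an anonymous shared region, marks it with
MADV_DONTFORK, forks, and has the child probe the address range with
msync(), which fails with ENOMEM on unmapped memory. The function names
exclude_from_fork() and child_lacks_mapping() are made up for this example,
and the sketch sets the flag in the forking process itself purely to keep
the demo short.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS / MADV_DONTFORK on glibc */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Ask the kernel not to copy this mapping into future fork() children.
 * Returns 0 on success, -1 on failure or where MADV_DONTFORK is missing. */
static int
exclude_from_fork(void *addr, size_t len)
{
#ifdef MADV_DONTFORK
    return madvise(addr, len, MADV_DONTFORK);
#else
    (void) addr;
    (void) len;
    errno = ENOSYS;
    return -1;
#endif
}

/* Returns 1 if a forked child has no mapping for the region,
 * 0 if the child still sees it, -1 on setup errors. */
static int
child_lacks_mapping(size_t len)
{
    assert(len > 0);

    void *shm = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shm == MAP_FAILED)
        return -1;

    if (exclude_from_fork(shm, len) != 0)
    {
        munmap(shm, len);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0)
    {
        munmap(shm, len);
        return -1;
    }
    if (pid == 0)
    {
        /* Child: msync() reports ENOMEM when the range is not mapped. */
        int rc = msync(shm, len, MS_ASYNC);
        _exit((rc == -1 && errno == ENOMEM) ? 0 : 1);
    }

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    munmap(shm, len);
    if (waited != pid)
        return -1;

    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 1 : 0;
}
```

Where MADV_DONTFORK is unavailable the helper reports failure, matching
the "where available" caveat.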


> > I guess we can do that in older releases, but do we really need it?  As
> > I understand, the only thing we need to do is verify that the dying PID
> > is a backend PID, and not cause a crash cycle if it isn't.
>
> I think that'd be a net reduction in reliability, not an improvement.
> In most scenarios it'd do little except mask bugs.

Do you feel the same about having different logging between the "known"
and "unknown" child processes?


Personally I don't think it's of utmost importance to support running as
pid 1. But we should at least print useful log messages about what
processes exited...


Greetings,

Andres Freund
