Re: Weird failure with latches in curculio on v15 - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Weird failure with latches in curculio on v15
Date
Msg-id CA+hUKGJPzLinvuPQSirk7gwmjjnH9=C1PG_aZNvzLeXufaHxSw@mail.gmail.com
In response to Re: Weird failure with latches in curculio on v15  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Weird failure with latches in curculio on v15
List pgsql-hackers
On Fri, Feb 3, 2023 at 8:24 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Ugh, I think I might understand what's happening:
>
> > The signal arrives just after the fork() (within system()). Because we
> > have all our processes configure themselves as process group leaders,
> > and we signal the entire process group (c.f. signal_child()), both the
> > child process and the parent will process the signal. So we'll end up
> > doing a proc_exit() in both. As both are trying to remove themselves
> > from the same PGPROC etc entry, that doesn't end well.
>
> Ugh ...

Yuck, but yeah that makes sense.

> > I don't see how we can solve that properly as long as we use system().
>
> ... but I don't see how that's system()'s fault?  Doing the fork()
> ourselves wouldn't change anything about that.

What if we block signals, fork, then in the child, install the default
SIGTERM handler, then unblock, and then exec the shell?  If SIGTERM is
delivered either before or after exec (but before whatever is loaded
installs a new handler) then the child is terminated, but without
running the handler.  Isn't that what we want here?
