Thread: Add guc to enable send SIGSTOP to peers when backend exits abnormally

Add guc to enable send SIGSTOP to peers when backend exits abnormally

From

"蔡梦娟(玊于)"

Date:

03 September 2021, 07:00:00

Hi all

I want to share a patch with you, in which I add a guc parameter 'enable_send_stop' to enable set the value of SendStop in postmaster.c more conveniently. SendStop enable postmaster to send SIGSTOP rather than SIGQUIT to its children when some backend dumps core, and this variable is originally set with -T parameter when start postgres, which is inconvenient to control in some scenarios.

Thanks & Best Regards

Attachment

0001-Add-guc-to-set-SendStop.patch

Re: Add guc to enable send SIGSTOP to peers when backend exits abnormally

From

Tom Lane

Date:

03 September 2021, 14:39:33

"=?UTF-8?B?6JSh5qKm5aifKOeOiuS6jik=?=" <mengjuan.cmj@alibaba-inc.com> writes:
> I want to share a patch with you, in which I add a guc parameter 'enable_send_stop' to enable set the value of
SendStopin postmaster.c more conveniently.  SendStop enable postmaster to send SIGSTOP rather than SIGQUIT to its
childrenwhen some backend dumps core, and this variable is originally set with -T parameter when start postgres, which
isinconvenient to control in some scenarios. 

TBH, I'd sooner rip out SendStop, and simplify the related postmaster
logic.  I've never used it in twenty-some years of Postgres hacking,
and I doubt anyone else has used it much either.  It's not worth the
overhead of a GUC.  (The argument that you need it in situations
where you can't control the postmaster's command line seems pretty
thin, too.  I'm much more worried about somebody turning it on by
accident and then complaining that the cluster freezes upon crash.)

            regards, tom lane

Re: Add guc to enable send SIGSTOP to peers when backend exits abnormally

From

Alvaro Herrera

Date:

03 September 2021, 21:16:06

On 2021-Sep-03, Tom Lane wrote:

> "=?UTF-8?B?6JSh5qKm5aifKOeOiuS6jik=?=" <mengjuan.cmj@alibaba-inc.com> writes:
> > I want to share a patch with you, in which I add a guc parameter 'enable_send_stop' to enable set the value of
SendStopin postmaster.c more conveniently.  SendStop enable postmaster to send SIGSTOP rather than SIGQUIT to its
childrenwhen some backend dumps core, and this variable is originally set with -T parameter when start postgres, which
isinconvenient to control in some scenarios.

> 
> TBH, I'd sooner rip out SendStop, and simplify the related postmaster
> logic.

I wrote a patch to do that in 2012, after this exchange:
https://postgr.es/m/1333124720-sup-6193@alvh.no-ip.org
I obviously doesn't apply at all anymore, but the thing that prevented
me from sending it was I couldn't find what the mentioned feature was
that would cause all backends to dump core at the time of a crash.
So it seemed to me that we would be ripping out a feature I had used,
with no replacement.

(It applies cleanly on top of 36b7e3da17bc.)

-- 
Álvaro Herrera           39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"Cuando no hay humildad las personas se degradan" (A. Christie)

Attachment

0001-Remove-T-postmaster-option.patch

Re: Add guc to enable send SIGSTOP to peers when backend exits abnormally

From

Tom Lane

Date:

03 September 2021, 21:44:43

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On 2021-Sep-03, Tom Lane wrote:
>> TBH, I'd sooner rip out SendStop, and simplify the related postmaster
>> logic.

> I wrote a patch to do that in 2012, after this exchange:
> https://postgr.es/m/1333124720-sup-6193@alvh.no-ip.org
> I obviously doesn't apply at all anymore, but the thing that prevented
> me from sending it was I couldn't find what the mentioned feature was
> that would cause all backends to dump core at the time of a crash.

Oh, I think you misunderstood what I wrote.  I was thinking of the
ancient habit of most kernels to dump cores to a file just named
"core"; so that even if you went around and manually SIGABRT'd
each stopped process, the cores would all overwrite each other,
leaving you with little to show for the exercise.  Nowadays you're
more likely to get "core.NNN" for each PID, so that it could in
principle be useful to force all the backends to dump core for later
analysis.  But I know of no mechanism that would do that for you.

However, thinking about this afresh, it seems like that Berkeley-era
comment about "the wily post_hacker" was never very apropos.  If what
you wanted was a few GB of core files for later analysis, it'd make
more sense to have the postmaster send SIGABRT or the like.  That
saves a bunch of tedious manual steps, plus the cluster isn't left
in a funny state that requires yet more manual cleanup steps.

So I'm thinking that the *real* use-case for this is for developers
to attach with gdb and do on-the-fly investigation of the state of
other backends, rather than forcing core-dumps.  However, it's still
a pretty half-baked feature because there's no easy way to clean up
afterwards.

The other elephant in the room is that by the time the postmaster
has reacted to the initial backend crash, it's dubious whether the
state of other processes is still able to tell you much.  (IME,
at least, the postmaster doesn't hear about it until the kernel
has finished writing out the dying process's core image, which
takes approximately forever compared to modern CPU speeds.)

> So it seemed to me that we would be ripping out a feature I had used,
> with no replacement.

If we had a really useful feature here I'd be all over it.
But it looks more like somebody's ten-minute hack, so the
fact that it's undocumented and obscure-to-invoke seems
appropriate to me.

(BTW, I think we had exactly this discussion way back when
Peter cleaned up the postmaster/postgres command line switches.
Just about all the other old switches have equivalent GUCs,
and IIRC it is not an oversight that SendStop was left out.)

            regards, tom lane