Thread: Add guc to enable send SIGSTOP to peers when backend exits abnormally
Hi all
I want to share a patch with you, in which I add a guc parameter 'enable_send_stop' to enable set the value of SendStop in postmaster.c more conveniently. SendStop enable postmaster to send SIGSTOP rather than SIGQUIT to its children when some backend dumps core, and this variable is originally set with -T parameter when start postgres, which is inconvenient to control in some scenarios.
Thanks & Best Regards
Attachment
"=?UTF-8?B?6JSh5qKm5aifKOeOiuS6jik=?=" <mengjuan.cmj@alibaba-inc.com> writes: > I want to share a patch with you, in which I add a guc parameter 'enable_send_stop' to enable set the value of SendStopin postmaster.c more conveniently. SendStop enable postmaster to send SIGSTOP rather than SIGQUIT to its childrenwhen some backend dumps core, and this variable is originally set with -T parameter when start postgres, which isinconvenient to control in some scenarios. TBH, I'd sooner rip out SendStop, and simplify the related postmaster logic. I've never used it in twenty-some years of Postgres hacking, and I doubt anyone else has used it much either. It's not worth the overhead of a GUC. (The argument that you need it in situations where you can't control the postmaster's command line seems pretty thin, too. I'm much more worried about somebody turning it on by accident and then complaining that the cluster freezes upon crash.) regards, tom lane
On 2021-Sep-03, Tom Lane wrote: > "=?UTF-8?B?6JSh5qKm5aifKOeOiuS6jik=?=" <mengjuan.cmj@alibaba-inc.com> writes: > > I want to share a patch with you, in which I add a guc parameter 'enable_send_stop' to enable set the value of SendStopin postmaster.c more conveniently. SendStop enable postmaster to send SIGSTOP rather than SIGQUIT to its childrenwhen some backend dumps core, and this variable is originally set with -T parameter when start postgres, which isinconvenient to control in some scenarios. > > TBH, I'd sooner rip out SendStop, and simplify the related postmaster > logic. I wrote a patch to do that in 2012, after this exchange: https://postgr.es/m/1333124720-sup-6193@alvh.no-ip.org I obviously doesn't apply at all anymore, but the thing that prevented me from sending it was I couldn't find what the mentioned feature was that would cause all backends to dump core at the time of a crash. So it seemed to me that we would be ripping out a feature I had used, with no replacement. (It applies cleanly on top of 36b7e3da17bc.) -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/ "Cuando no hay humildad las personas se degradan" (A. Christie)
Attachment
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > On 2021-Sep-03, Tom Lane wrote: >> TBH, I'd sooner rip out SendStop, and simplify the related postmaster >> logic. > I wrote a patch to do that in 2012, after this exchange: > https://postgr.es/m/1333124720-sup-6193@alvh.no-ip.org > I obviously doesn't apply at all anymore, but the thing that prevented > me from sending it was I couldn't find what the mentioned feature was > that would cause all backends to dump core at the time of a crash. Oh, I think you misunderstood what I wrote. I was thinking of the ancient habit of most kernels to dump cores to a file just named "core"; so that even if you went around and manually SIGABRT'd each stopped process, the cores would all overwrite each other, leaving you with little to show for the exercise. Nowadays you're more likely to get "core.NNN" for each PID, so that it could in principle be useful to force all the backends to dump core for later analysis. But I know of no mechanism that would do that for you. However, thinking about this afresh, it seems like that Berkeley-era comment about "the wily post_hacker" was never very apropos. If what you wanted was a few GB of core files for later analysis, it'd make more sense to have the postmaster send SIGABRT or the like. That saves a bunch of tedious manual steps, plus the cluster isn't left in a funny state that requires yet more manual cleanup steps. So I'm thinking that the *real* use-case for this is for developers to attach with gdb and do on-the-fly investigation of the state of other backends, rather than forcing core-dumps. However, it's still a pretty half-baked feature because there's no easy way to clean up afterwards. The other elephant in the room is that by the time the postmaster has reacted to the initial backend crash, it's dubious whether the state of other processes is still able to tell you much. (IME, at least, the postmaster doesn't hear about it until the kernel has finished writing out the dying process's core image, which takes approximately forever compared to modern CPU speeds.) > So it seemed to me that we would be ripping out a feature I had used, > with no replacement. If we had a really useful feature here I'd be all over it. But it looks more like somebody's ten-minute hack, so the fact that it's undocumented and obscure-to-invoke seems appropriate to me. (BTW, I think we had exactly this discussion way back when Peter cleaned up the postmaster/postgres command line switches. Just about all the other old switches have equivalent GUCs, and IIRC it is not an oversight that SendStop was left out.) regards, tom lane