Thread: Ctrl+C from sh can shut down daemonized PostgreSQL cluster
We have had a case where a production cluster was accidentally shut down by a customer who used Ctrl+C in the same sh session in which they had (long before) run pg_ctl start. We have only seen this in sh on Solaris. Other shells on Solaris don't behave this way, nor does sh on tested versions of Linux. Nevertheless, the problem is seen on the default shell for a supported OS.

Analysis suggests that this is because the postmaster retains the process group ID of the original parent (in this case pg_ctl). If pg_ctl is run through the setpgrp command a subsequent Ctrl+C in the sh session does not shut down the PostgreSQL cluster.

On my development Linux machine:

$ ps axfopid,ppid,pgid,command
  PID  PPID  PGID COMMAND
[ ... ]
 8416     1  8412 /home/kgrittn/pg/master/Debug/bin/postgres -D Debug/data
 8418  8416  8418  \_ postgres: checkpointer process
 8419  8416  8419  \_ postgres: writer process
 8420  8416  8420  \_ postgres: wal writer process
 8421  8416  8421  \_ postgres: autovacuum launcher process
 8422  8416  8422  \_ postgres: stats collector process
 8427  8416  8427  \_ postgres: kgrittn test [local] idle

All of the PPID values seem correct, and while the PGID values for backends might initially seem surprising, the commit notes and C comments here explain why each backend sets up its own process group:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=3ad0728

What is surprising is that the postmaster doesn't set up its own process group when it is running as a daemon. We probably don't want to change that when postgres is run directly from a command line for development or diagnostic purposes, but Noah suggested perhaps we should add a --daemonize option which pg_ctl should use when launching the postmaster, which would cause it to create its own session group.
Although this is arguably a bug, it seems like it is very rarely hit and has several workarounds, and any fix would either change things in a way which might break existing user scripts or would require a new command-line option; so back-patching a fix to stable branches doesn't seem appropriate. I would argue for including a fix in 9.4 on the basis of it being a bug fix and there being time to mention it in the release change notes; but I understand the counter-arguments and realize this is a judgment call. Thoughts? If the new option seems reasonable, I can draft a patch to implement that. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
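The process-group inheritance described in the message above can be observed with a small standalone C sketch. This is an illustration only, not PostgreSQL code: the function name probe_child_pgid is hypothetical, and the sketch assumes a POSIX system. It forks a child, optionally has the child call setsid(), and reports the process group the child ends up in.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of PostgreSQL): fork a child and report
 * the process group it ends up in.  If new_session is nonzero, the child
 * calls setsid() before reporting.  Returns 0 on success, -1 on failure.
 */
int
probe_child_pgid(int new_session, pid_t *child_pid, pid_t *child_pgid)
{
    int     fds[2];
    int     status;
    pid_t   pid;
    pid_t   reported = -1;
    ssize_t n;

    if (pipe(fds) < 0)
        return -1;

    pid = fork();
    if (pid < 0)
    {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0)
    {
        /* Child: optionally start a new session, then report our PGID. */
        pid_t pgid;

        close(fds[0]);
        if (new_session && setsid() < 0)
            _exit(1);
        pgid = getpgrp();
        if (write(fds[1], &pgid, sizeof(pgid)) != (ssize_t) sizeof(pgid))
            _exit(1);
        _exit(0);
    }

    /* Parent: read the PGID the child reported, then reap the child. */
    close(fds[1]);
    n = read(fds[0], &reported, sizeof(reported));
    close(fds[0]);
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (n != (ssize_t) sizeof(reported))
        return -1;

    *child_pid = pid;
    *child_pgid = reported;
    return 0;
}
```

Called with new_session set to 0, the reported PGID should match the caller's own getpgrp(), which is the situation the postmaster is in after pg_ctl start. Called with new_session set to 1, the child becomes the leader of a new session and process group, so the reported PGID should equal the child's PID.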
Kevin Grittner <kgrittn@ymail.com> writes:

> What is surprising is that the postmaster doesn't set up its own
> process group when it is running as a daemon. We probably don't
> want to change that when postgres is run directly from a command
> line for development or diagnostic purposes, but Noah suggested
> perhaps we should add a --daemonize option which pg_ctl should use
> when launching the postmaster, which would cause it to create its
> own session group.

We intentionally removed the daemonization support that used to be there; see commit f7ea6beaf4ca02b8e6dc576255e35a5b86035cb9. One of the things it did was exactly this. I'm a bit disinclined to put that back.

If this is, as it sounds to be, a Solaris shell bug, doesn't it affect other daemons too?

regards, tom lane
On 14 Feb 2014 23:07, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

>
> If this is, as it sounds to be, a Solaris shell bug, doesn't it
> affect other daemons too?

This is something I never exactly followed, but I think if the shell doesn't support job control it's expected behaviour, not a bug. Only shells that support job control create new process groups for every backgrounded command.

I would have expected if I run postgres myself that it be attached to the terminal and die when I C-c it, but if it's started by pg_ctl I would have thought it was running independently of my terminal and shell.
On 02/15/2014 02:25 AM, Greg Stark wrote:

> On 14 Feb 2014 23:07, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> >
> > If this is, as it sounds to be, a Solaris shell bug, doesn't it
> > affect other daemons too?
>
> This is something I never exactly followed, but I think if the shell
> doesn't support job control it's expected behaviour, not a bug. Only
> shells that support job control create new process groups for every
> backgrounded command.
>
> I would have expected if I run postgres myself that it be attached to
> the terminal and die when I C-c it, but if it's started by pg_ctl I
> would have thought it was running independently of my terminal and
> shell.

In this case maybe it is pg_ctl which should do the daemonizing?

Cheers
-- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
On 14/02 14.57, Kevin Grittner wrote: > We have had a case where a production cluster was accidentally shut > down by a customer who used Ctrl+C in the same sh session in which > they had (long before) run pg_ctl start. We have only seen this in > sh on Solaris. Other shells on Solaris don't behave this way, nor > does sh on tested versions of Linux. Nevertheless, the problem is > seen on the default shell for a supported OS. What Solaris version, and what version of sh? sh on Solaris isn't necessarily the "real" bourne shell. In Solaris 11 it's actually ksh93. I've seen a sort-of opposite problem which does not appear in stock Solaris 10 or 11 but in OpenSolaris, at least the version I used to have on my desktop. And this was not PostgreSQL but MySQL.... There's a script mysqld_safe which will automatically restart the mysqld server if it dies. But in OpenSolaris with ksh version '93t', if I killed mysqld, the shell that started it also died. I never could figure out why. Solaris 11 with ksh '93u' does not have this problem. Nor does Solaris 10 with "real" sh. Is this customer by any chance running OpenSolaris? - Bjorn
On Mon, Feb 17, 2014 at 10:38:29AM +0100, Bjorn Munch wrote: > On 14/02 14.57, Kevin Grittner wrote: > > We have had a case where a production cluster was accidentally shut > > down by a customer who used Ctrl+C in the same sh session in which > > they had (long before) run pg_ctl start. We have only seen this in > > sh on Solaris. Other shells on Solaris don't behave this way, nor > > does sh on tested versions of Linux. Nevertheless, the problem is > > seen on the default shell for a supported OS. > > What Solaris version, and what version of sh? sh on Solaris isn't > necessarily the "real" bourne shell. In Solaris 11 it's actually > ksh93. This was Solaris 9. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On Mon, Feb 17, 2014 at 10:38:29AM +0100, Bjorn Munch wrote: > On 14/02 14.57, Kevin Grittner wrote: > > We have had a case where a production cluster was accidentally shut > > down by a customer who used Ctrl+C in the same sh session in which > > they had (long before) run pg_ctl start. We have only seen this in > > sh on Solaris. Other shells on Solaris don't behave this way, nor > > does sh on tested versions of Linux. Nevertheless, the problem is > > seen on the default shell for a supported OS. > > What Solaris version, and what version of sh? sh on Solaris isn't > necessarily the "real" bourne shell. In Solaris 11 it's actually > ksh93. > > I've seen a sort-of opposite problem which does not appear in stock > Solaris 10 or 11 but in OpenSolaris, at least the version I used to > have on my desktop. > > And this was not PostgreSQL but MySQL.... There's a script mysqld_safe > which will automatically restart the mysqld server if it dies. But in > OpenSolaris with ksh version '93t', if I killed mysqld, the shell that > started it also died. I never could figure out why. Solaris 11 with > ksh '93u' does not have this problem. Nor does Solaris 10 with "real" sh. > > Is this customer by any chance running OpenSolaris? FYI, this email post has a header line that causes all replies to go _only_ to the group email address: Mail-Followup-To: pgsql-hackers@postgresql.org I assume it is something related to the Oracle mail server or something configured by the email author. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Bruce Momjian <bruce@momjian.us> writes: > On Mon, Feb 17, 2014 at 10:38:29AM +0100, Bjorn Munch wrote: >> What Solaris version, and what version of sh? sh on Solaris isn't >> necessarily the "real" bourne shell. In Solaris 11 it's actually >> ksh93. > This was Solaris 9. Isn't that out of support by Oracle? regards, tom lane
On Mon, Feb 17, 2014 at 12:25:33PM -0500, Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > On Mon, Feb 17, 2014 at 10:38:29AM +0100, Bjorn Munch wrote: > >> What Solaris version, and what version of sh? sh on Solaris isn't > >> necessarily the "real" bourne shell. In Solaris 11 it's actually > >> ksh93. > > > This was Solaris 9. > > Isn't that out of support by Oracle? It certainly might be --- I have no idea. What surprised me is that we are relying solely on system() to block signals to pg_ctl-spawned servers. The question is whether that is sufficient and whether we should be doing more. I don't think we have to make adjustments just for Solaris 9. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Bruce Momjian wrote:

> FYI, this email post has a header line that causes all replies to go
> _only_ to the group email address:
>
> Mail-Followup-To: pgsql-hackers@postgresql.org
>
> I assume it is something related to the Oracle mail server or something
> configured by the email author.

Most likely, Bjorn has followup_to set to true:
http://www.mutt.org/doc/manual/manual-6.html#followup_to

I very much doubt that the mail server is injecting such a header.

Amusingly, Mutt also has an option to control whether to honor this header:
http://www.mutt.org/doc/manual/manual-6.html#honor_followup_to

-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 17/02 12.25, Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > On Mon, Feb 17, 2014 at 10:38:29AM +0100, Bjorn Munch wrote: > >> What Solaris version, and what version of sh? sh on Solaris isn't > >> necessarily the "real" bourne shell. In Solaris 11 it's actually > >> ksh93. > > > This was Solaris 9. > > Isn't that out of support by Oracle? Not completely, final EOL is October 31 this year. - Bjorn
On 17/02 14.54, Alvaro Herrera wrote: > Bruce Momjian wrote: > > > FYI, this email post has a header line that causes all replies to go > > _only_ to the group email address: > > > > Mail-Followup-To: pgsql-hackers@postgresql.org > > > > I assume it is something related to the Oracle mail server or something > > configured by the email author. > > Most likely, Bjorn has followup_to set to true: > http://www.mutt.org/doc/manual/manual-6.html#followup_to > > I very much doubt that the mail server is injecting such a header. That would be it yes. :-) I hit 'L' to reply to the mailing list only and that would by default also set this, I suppose. Nobody's complained before. :-) - Bjorn (also a mutt user)
Bruce Momjian <bruce@momjian.us> writes:

> It certainly might be --- I have no idea. What surprised me is that we
> are relying solely on system() to block signals to pg_ctl-spawned
> servers. The question is whether that is sufficient and whether we
> should be doing more. I don't think we have to make adjustments just
> for Solaris 9.

We aren't relying on system(); it does no such thing, according to the POSIX spec. If it did, pg_ctl would be unable to print any errors to the terminal, because dissociating from the foreground process group generally also disables your ability to print on the terminal.

I poked around in the POSIX spec a bit, and if I'm reading it correctly, the only thing that typically results in the postmaster becoming dissociated from the terminal is use of "&" to launch it. In a shell with job control, that should result in the process being put into a "background" process group that won't receive terminal signals nor be permitted to do I/O to it. This is distinct from dissociating altogether because you can use "fg" to return the process to foreground; if we did a setsid() we'd lose that option, if I'm reading the standards correctly. So I'm loath to see the postmaster doing setsid() for itself.

POSIX also mandates that interactive shells have job control enabled by default.

However ... the "&" isn't issued in the user's interactive shell. It's seen by the shell launched by pg_ctl's system() call. So it appears to be standards-conforming if that shell does nothing to move the launched postmaster into the background.

The POSIX spec describes a shell switch -m that forces subprocesses to be launched in their own process groups. So maybe what we ought to do is teach pg_ctl to do something like

    system("set -m; postgres ...");

Dunno if this is really portable, though it ought to be.

Alternatively, we could do what the comments in pg_ctl have long thought desirable, namely get rid of use of system() in favor of fork()/exec(). With that, pg_ctl could do a setsid() inside the child process.

Or we could wait to see if anybody reports this sort of behavior in a shell that won't be out of support before 9.4 gets out the door.

regards, tom lane
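The fork()/exec() alternative with setsid() in the child, as suggested in the message above, might look roughly like the following C sketch. This is not pg_ctl's actual code: the function name start_in_new_session is hypothetical, and the close-on-exec pipe used to report exec failures back to the parent is an added assumption, not something proposed in the thread.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical launcher (not pg_ctl's real implementation): start argv[0]
 * in a new session so terminal signals such as Ctrl+C in the invoking
 * shell are not delivered to it.  Returns the child's PID on success, or
 * -1 with errno set if fork, setsid, or exec failed.
 */
pid_t
start_in_new_session(char *const argv[])
{
    int     fds[2];
    int     child_errno = 0;
    pid_t   pid;
    ssize_t n;

    if (pipe(fds) < 0)
        return -1;

    /* Write end closes on a successful exec, giving the parent EOF. */
    if (fcntl(fds[1], F_SETFD, FD_CLOEXEC) < 0)
    {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    pid = fork();
    if (pid < 0)
    {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0)
    {
        ssize_t ignored;

        /* Child: leave the caller's session and process group, then exec. */
        close(fds[0]);
        if (setsid() >= 0)
            execvp(argv[0], argv);

        /* Only reached if setsid() or execvp() failed: report why. */
        child_errno = errno;
        ignored = write(fds[1], &child_errno, sizeof(child_errno));
        (void) ignored;
        _exit(127);
    }

    /* Parent: EOF means the exec succeeded; data means it failed. */
    close(fds[1]);
    do
    {
        n = read(fds[0], &child_errno, sizeof(child_errno));
    } while (n < 0 && errno == EINTR);
    close(fds[0]);

    if (n != 0)
    {
        int saved = (n > 0) ? child_errno : errno;

        waitpid(pid, NULL, 0);  /* reap the failed child */
        errno = saved;
        return -1;
    }
    return pid;
}
```

Because setsid() runs before the exec, by the time the parent sees end-of-file on the pipe the child is already the leader of its own session and process group. Unlike the "set -m" approach, nothing here depends on how the user's shell treats "&", though a process started this way can no longer be brought back with "fg".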
Tom Lane <tgl@sss.pgh.pa.us> wrote: > Or we could wait to see if anybody reports this sort of behavior > in a shell that won't be out of support before 9.4 gets out the > door. We have a field report of this happening in the sh shell in Solaris 10. Our staff has confirmed this. In Solaris 10 they can start multiple clusters from a single shell, and if they later use Ctrl+C in that shell all of those instances are shut down. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Kevin Grittner <kgrittn@ymail.com> writes:

> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Or we could wait to see if anybody reports this sort of behavior
>> in a shell that won't be out of support before 9.4 gets out the
>> door.

> We have a field report of this happening in the sh shell in Solaris
> 10. Our staff has confirmed this. In Solaris 10 they can start
> multiple clusters from a single shell, and if they later use Ctrl+C
> in that shell all of those instances are shut down.

Do you want to try the "set -m" hack suggested upthread? I see no point in pursuing the portability questions unless we've verified that it fixes the problem for someone.

regards, tom lane
On Mon, Feb 17, 2014 at 8:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Bruce Momjian <bruce@momjian.us> writes: >> It certainly might be --- I have no idea. What surprised me is that we >> are relying solely on system() to block signals to pg_ctl-spawned >> servers. The question is whether that is sufficient and whether we >> should be doing more. I don't think we have to make adjustments just >> for Solaris 9. > > We aren't relying on system(); it does no such thing, according to the > POSIX spec. If it did, pg_ctl would be unable to print any errors to the > terminal, because dissociating from the foreground process group generally > also disables your ability to print on the terminal. > > I poked around in the POSIX spec a bit, and if I'm reading it correctly, > the only thing that typically results in the postmaster becoming > dissociated from the terminal is use of "&" to launch it. In a shell > with job control, that should result in the process being put into a > "background" process group that won't receive terminal signals nor be > permitted to do I/O to it. This is distinct from dissociating altogether > because you can use "fg" to return the process to foreground; if we did a > setsid() we'd lose that option, if I'm reading the standards correctly. > So I'm loath to see the postmaster doing setsid() for itself. > > POSIX also mandates that interactive shells have job control enabled by > default. > > However ... the "&" isn't issued in the user's interactive shell. It's > seen by the shell launched by pg_ctl's system() call. So it appears to > be standards-conforming if that shell does nothing to move the launched > postmaster into the background. > > The POSIX spec describes a shell switch -m that forces subprocesses > to be launched in their own process groups. So maybe what we ought > to do is teach pg_ctl to do something like > > system("set -m; postgres ..."); > > Dunno if this is really portable, though it ought to be. 
> > Alternatively, we could do what the comments in pg_ctl have long thought > desirable, namely get rid of use of system() in favor of fork()/exec(). > With that, pg_ctl could do a setsid() inside the child process. I like this last approach. It seems to me that the problem with system() is that it's relying on the user's shell to have sane behavior. And while that obviously will work fine in a whole lot of cases, I don't see why we should rely on it. I don't think "your shell must support -m with POSIX-like semantics" is really a requirement we want to impose on PostgreSQL users. The shell can't make any system calls that we don't have access to ourselves, and setsid() seems like the right one to use. Maybe it's begging for trouble to change anything here at all, but I think if we are going to make a change it ought to be in the direction of making us less dependent on the vagaries of the user's shell, not more. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Feb 17, 2014 at 8:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Alternatively, we could do what the comments in pg_ctl have long thought >> desirable, namely get rid of use of system() in favor of fork()/exec(). >> With that, pg_ctl could do a setsid() inside the child process. > I like this last approach. Me too, but it looked like a less-than-trivial bit of work to me (else I might just have gone and done it). Are you volunteering? regards, tom lane
On Wed, Apr 16, 2014 at 12:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Mon, Feb 17, 2014 at 8:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Alternatively, we could do what the comments in pg_ctl have long thought >>> desirable, namely get rid of use of system() in favor of fork()/exec(). >>> With that, pg_ctl could do a setsid() inside the child process. > >> I like this last approach. > > Me too, but it looked like a less-than-trivial bit of work to me > (else I might just have gone and done it). Are you volunteering? I don't have time right at the moment, but maybe at some point, if nobody else gets to it first. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company