Thread: Ctrl+C from sh can shut down daemonized PostgreSQL cluster

Ctrl+C from sh can shut down daemonized PostgreSQL cluster

From
Kevin Grittner
Date:
We have had a case where a production cluster was accidentally shut
down by a customer who used Ctrl+C in the same sh session in which
they had (long before) run pg_ctl start.  We have only seen this in
sh on Solaris.  Other shells on Solaris don't behave this way, nor
does sh on tested versions of Linux.  Nevertheless, the problem is
seen on the default shell for a supported OS.

Analysis suggests that this is because the postmaster retains the
process group ID of the original parent (in this case pg_ctl).  If
pg_ctl is run through the setpgrp command a subsequent Ctrl+C in
the sh session does not shut down the PostgreSQL cluster.
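
A minimal sketch of the inheritance described above, written in Python for brevity (its os module wraps the same POSIX calls). A forked child stays in its parent's process group unless it asks for a new one, and the terminal delivers the SIGINT from Ctrl+C to every process in the foreground group. The helper name child_pgid is hypothetical, and os.setpgid(0, 0) only stands in for the effect of the setpgrp command, whose exact semantics vary by platform.

```python
import os


def child_pgid(detach):
    """Fork a child and return the process group ID that the child reports.

    With detach=True the child first calls os.setpgid(0, 0), which moves it
    into a new process group whose ID equals the child's own PID.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: report our process group to the parent, then exit.
        try:
            os.close(read_fd)
            if detach:
                os.setpgid(0, 0)
            os.write(write_fd, str(os.getpgrp()).encode())
        finally:
            os._exit(0)
    # Parent: read the child's answer and reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        reported = int(pipe.read())
    os.waitpid(pid, 0)
    return reported


if __name__ == "__main__":
    print("parent pgid:        ", os.getpgrp())
    print("inherited pgid:     ", child_pgid(detach=False))
    print("after setpgid(0, 0):", child_pgid(detach=True))
```

In the first case the child shares the parent's group, so a signal sent to that group reaches both; in the second it does not.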

On my development Linux machine:

$ ps axfopid,ppid,pgid,command
  PID  PPID  PGID COMMAND
[ ... ]
 8416     1  8412 /home/kgrittn/pg/master/Debug/bin/postgres -D Debug/data
 8418  8416  8418  \_ postgres: checkpointer process
 8419  8416  8419  \_ postgres: writer process
 8420  8416  8420  \_ postgres: wal writer process
 8421  8416  8421  \_ postgres: autovacuum launcher process
 8422  8416  8422  \_ postgres: stats collector process
 8427  8416  8427  \_ postgres: kgrittn test [local] idle

All of the PPID values seem correct, and while the PGID values for
backends might initially seem surprising, the commit notes and C
comments here explain why each backend sets up its own process
group:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=3ad0728

What is surprising is that the postmaster doesn't set up its own
process group when it is running as a daemon.  We probably don't
want to change that when postgres is run directly from a command
line for development or diagnostic purposes, but Noah suggested
perhaps we should add a --daemonize option which pg_ctl should use
when launching the postmaster, which would cause it to create its
own session group.

Although this is arguably a bug, it seems like it is very rarely
hit and has several workarounds, and any fix would either change
things in a way which might break existing user scripts or would
require a new command-line option; so back-patching a fix to stable
branches doesn't seem appropriate.  I would argue for including a
fix in 9.4 on the basis of it being a bug fix and there being time
to mention it in the release change notes; but I understand the
counter-arguments and realize this is a judgment call.

Thoughts?

If the new option seems reasonable, I can draft a patch to
implement that.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Ctrl+C from sh can shut down daemonized PostgreSQL cluster

From
Tom Lane
Date:
Kevin Grittner <kgrittn@ymail.com> writes:
> What is surprising is that the postmaster doesn't set up its own
> process group when it is running as a daemon.  We probably don't
> want to change that when postgres is run directly from a command
> line for development or diagnostic purposes, but Noah suggested
> perhaps we should add a --daemonize option which pg_ctl should use
> when launching the postmaster, which would cause it to create its
> own session group.

We intentionally removed the daemonization support that used to
be there; see commit f7ea6beaf4ca02b8e6dc576255e35a5b86035cb9.
One of the things it did was exactly this.  I'm a bit disinclined
to put that back.

If this is, as it sounds to be, a Solaris shell bug, doesn't it
affect other daemons too?
        regards, tom lane



Re: Ctrl+C from sh can shut down daemonized PostgreSQL cluster

From
Greg Stark
Date:
On 14 Feb 2014 23:07, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>
> If this is, as it sounds to be, a Solaris shell bug, doesn't it
> affect other daemons too?

This is something I never exactly followed, but I think if the shell
doesn't support job control it's expected behaviour, not a bug. Only
shells that support job control create new process groups for every
backgrounded command.

I would have expected if I run postgres myself that it be attached to
the terminal and die when I C-c it, but if it's started by pg_ctl I
would have thought it was running independently of my terminal and
shell.

Re: Ctrl+C from sh can shut down daemonized PostgreSQL cluster

From
Hannu Krosing
Date:
On 02/15/2014 02:25 AM, Greg Stark wrote:
> On 14 Feb 2014 23:07, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> >
> > If this is, as it sounds to be, a Solaris shell bug, doesn't it
> > affect other daemons too?
>
> This is something I never exactly followed, but I think if the shell
> doesn't support job control it's expected behaviour, not a bug. Only
> shells that support job control create new process groups for every
> backgrounded command.
>
> I would have expected if I run postgres myself that it be attached to
> the terminal and die when I C-c it, but if it's started by pg_ctl I
> would have thought it was running independently of my terminal and
> shell.

In this case maybe it is pg_ctl which should do the daemonizing?

Cheers
-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ

Re: Ctrl+C from sh can shut down daemonized PostgreSQL cluster

From
Bjorn Munch
Date:
On 14/02 14.57, Kevin Grittner wrote:
> We have had a case where a production cluster was accidentally shut
> down by a customer who used Ctrl+C in the same sh session in which
> they had (long before) run pg_ctl start.  We have only seen this in
> sh on Solaris.  Other shells on Solaris don't behave this way, nor
> does sh on tested versions of Linux.  Nevertheless, the problem is
> seen on the default shell for a supported OS.

What Solaris version, and what version of sh?  sh on Solaris isn't
necessarily the "real" bourne shell. In Solaris 11 it's actually
ksh93.

I've seen a sort-of opposite problem which does not appear in stock
Solaris 10 or 11 but in OpenSolaris, at least the version I used to
have on my desktop.

And this was not PostgreSQL but MySQL.... There's a script mysqld_safe
which will automatically restart the mysqld server if it dies. But in
OpenSolaris with ksh version '93t', if I killed mysqld, the shell that
started it also died. I never could figure out why. Solaris 11 with
ksh '93u' does not have this problem. Nor does Solaris 10 with "real" sh.

Is this customer by any chance running OpenSolaris?

- Bjorn



Re: Ctrl+C from sh can shut down daemonized PostgreSQL cluster

From
Bruce Momjian
Date:
On Mon, Feb 17, 2014 at 10:38:29AM +0100, Bjorn Munch wrote:
> On 14/02 14.57, Kevin Grittner wrote:
> > We have had a case where a production cluster was accidentally shut
> > down by a customer who used Ctrl+C in the same sh session in which
> > they had (long before) run pg_ctl start.  We have only seen this in
> > sh on Solaris.  Other shells on Solaris don't behave this way, nor
> > does sh on tested versions of Linux.  Nevertheless, the problem is
> > seen on the default shell for a supported OS.
> 
> What Solaris version, and what version of sh?  sh on Solaris isn't
> necessarily the "real" bourne shell. In Solaris 11 it's actually
> ksh93.

This was Solaris 9.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +



Re: Ctrl+C from sh can shut down daemonized PostgreSQL cluster

From
Bruce Momjian
Date:
On Mon, Feb 17, 2014 at 10:38:29AM +0100, Bjorn Munch wrote:
> On 14/02 14.57, Kevin Grittner wrote:
> > We have had a case where a production cluster was accidentally shut
> > down by a customer who used Ctrl+C in the same sh session in which
> > they had (long before) run pg_ctl start.  We have only seen this in
> > sh on Solaris.  Other shells on Solaris don't behave this way, nor
> > does sh on tested versions of Linux.  Nevertheless, the problem is
> > seen on the default shell for a supported OS.
> 
> What Solaris version, and what version of sh?  sh on Solaris isn't
> necessarily the "real" bourne shell. In Solaris 11 it's actually
> ksh93.
> 
> I've seen a sort-of opposite problem which does not appear in stock
> Solaris 10 or 11 but in OpenSolaris, at least the version I used to
> have on my desktop.
> 
> And this was not PostgreSQL but MySQL.... There's a script mysqld_safe
> which will automatically restart the mysqld server if it dies. But in
> OpenSolaris with ksh version '93t', if I killed mysqld, the shell that
> started it also died. I never could figure out why. Solaris 11 with
> ksh '93u' does not have this problem. Nor does Solaris 10 with "real" sh.
> 
> Is this customer by any chance running OpenSolaris?

FYI, this email post has a header line that causes all replies to go
_only_ to the group email address:

    Mail-Followup-To: pgsql-hackers@postgresql.org

I assume it is something related to the Oracle mail server or something
configured by the email author.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +



Re: Ctrl+C from sh can shut down daemonized PostgreSQL cluster

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> On Mon, Feb 17, 2014 at 10:38:29AM +0100, Bjorn Munch wrote:
>> What Solaris version, and what version of sh?  sh on Solaris isn't
>> necessarily the "real" bourne shell. In Solaris 11 it's actually
>> ksh93.

> This was Solaris 9.

Isn't that out of support by Oracle?
        regards, tom lane



Re: Ctrl+C from sh can shut down daemonized PostgreSQL cluster

From
Bruce Momjian
Date:
On Mon, Feb 17, 2014 at 12:25:33PM -0500, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > On Mon, Feb 17, 2014 at 10:38:29AM +0100, Bjorn Munch wrote:
> >> What Solaris version, and what version of sh?  sh on Solaris isn't
> >> necessarily the "real" bourne shell. In Solaris 11 it's actually
> >> ksh93.
> 
> > This was Solaris 9.
> 
> Isn't that out of support by Oracle?

It certainly might be --- I have no idea.  What surprised me is that we
are relying solely on system() to block signals to pg_ctl-spawned
servers.  The question is whether that is sufficient and whether we
should be doing more.  I don't think we have to make adjustments just
for Solaris 9.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +



Re: Ctrl+C from sh can shut down daemonized PostgreSQL cluster

From
Alvaro Herrera
Date:
Bruce Momjian wrote:

> FYI, this email post has a header line that causes all replies to go
> _only_ to the group email address:
> 
>     Mail-Followup-To: pgsql-hackers@postgresql.org
> 
> I assume it is something related to the Oracle mail server or something
> configured by the email author.

Most likely, Bjorn has followup_to set to true:
    http://www.mutt.org/doc/manual/manual-6.html#followup_to

I very much doubt that the mail server is injecting such a header.

Amusingly, Mutt also has an option to control whether to honor this
header:
    http://www.mutt.org/doc/manual/manual-6.html#honor_followup_to

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Ctrl+C from sh can shut down daemonized PostgreSQL cluster

From
Bjorn Munch
Date:
On 17/02 12.25, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > On Mon, Feb 17, 2014 at 10:38:29AM +0100, Bjorn Munch wrote:
> >> What Solaris version, and what version of sh?  sh on Solaris isn't
> >> necessarily the "real" bourne shell. In Solaris 11 it's actually
> >> ksh93.
> 
> > This was Solaris 9.
> 
> Isn't that out of support by Oracle?

Not completely, final EOL is October 31 this year.

- Bjorn



Re: Ctrl+C from sh can shut down daemonized PostgreSQL cluster

From
Bjorn Munch
Date:
On 17/02 14.54, Alvaro Herrera wrote:
> Bruce Momjian wrote:
> 
> > FYI, this email post has a header line that causes all replies to go
> > _only_ to the group email address:
> > 
> >     Mail-Followup-To: pgsql-hackers@postgresql.org
> > 
> > I assume it is something related to the Oracle mail server or something
> > configured by the email author.
> 
> Most likely, Bjorn has followup_to set to true:
>     http://www.mutt.org/doc/manual/manual-6.html#followup_to
> 
> I very much doubt that the mail server is injecting such a header.

That would be it yes. :-) I hit 'L' to reply to the mailing list only
and that would by default also set this, I suppose. Nobody's
complained before. :-)

- Bjorn (also a mutt user)




Re: Ctrl+C from sh can shut down daemonized PostgreSQL cluster

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> It certainly might be --- I have no idea.  What surprised me is that we
> are relying solely on system() to block signals to pg_ctl-spawned
> servers.  The question is whether that is sufficient and whether we
> should be doing more.  I don't think we have to make adjustments just
> for Solaris 9.

We aren't relying on system(); it does no such thing, according to the
POSIX spec.  If it did, pg_ctl would be unable to print any errors to the
terminal, because dissociating from the foreground process group generally
also disables your ability to print on the terminal.

I poked around in the POSIX spec a bit, and if I'm reading it correctly,
the only thing that typically results in the postmaster becoming
dissociated from the terminal is use of "&" to launch it.  In a shell
with job control, that should result in the process being put into a
"background" process group that won't receive terminal signals nor be
permitted to do I/O to it.  This is distinct from dissociating altogether
because you can use "fg" to return the process to foreground; if we did a
setsid() we'd lose that option, if I'm reading the standards correctly.
So I'm loath to see the postmaster doing setsid() for itself.

POSIX also mandates that interactive shells have job control enabled by
default.

However ... the "&" isn't issued in the user's interactive shell.  It's
seen by the shell launched by pg_ctl's system() call.  So it appears to
be standards-conforming if that shell does nothing to move the launched
postmaster into the background.

The POSIX spec describes a shell switch -m that forces subprocesses
to be launched in their own process groups.  So maybe what we ought
to do is teach pg_ctl to do something like
  system("set -m; postgres ...");

Dunno if this is really portable, though it ought to be.

Alternatively, we could do what the comments in pg_ctl have long thought
desirable, namely get rid of use of system() in favor of fork()/exec().
With that, pg_ctl could do a setsid() inside the child process.

Or we could wait to see if anybody reports this sort of behavior in
a shell that won't be out of support before 9.4 gets out the door.
        regards, tom lane
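
A sketch of the fork()/exec() alternative mentioned above, assuming a POSIX system and using Python's os wrappers rather than pg_ctl's C code. The name spawn_detached is hypothetical and nothing here is taken from pg_ctl. The child calls setsid() before exec, so it ends up in a new session and a new process group with no controlling terminal. The close-on-exec pipe exists only so the parent can learn whether exec succeeded.

```python
import os
import signal
import sys


def spawn_detached(argv):
    """Start argv in its own session and return the child's PID.

    Raises RuntimeError if setsid() or exec fails in the child.
    """
    # Pipe descriptors are close-on-exec by default (Python 3.4+), so a
    # successful exec closes the write end and the parent reads EOF.
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(read_fd)
            os.setsid()  # new session, new process group, no terminal
            os.execvp(argv[0], argv)
        except OSError as exc:
            os.write(write_fd, str(exc).encode())
        finally:
            os._exit(127)
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        error = pipe.read()
    if error:
        os.waitpid(pid, 0)
        raise RuntimeError(error.decode())
    return pid


if __name__ == "__main__":
    child = spawn_detached([sys.executable, "-c", "import time; time.sleep(30)"])
    print("our pgid:     ", os.getpgrp())
    print("child pgid:   ", os.getpgid(child))
    print("child session:", os.getsid(child))
    os.kill(child, signal.SIGKILL)
    os.waitpid(child, 0)
```

A real launcher would also have to redirect the child's standard streams and deal with log output, which this sketch leaves out.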



Re: Ctrl+C from sh can shut down daemonized PostgreSQL cluster

From
Kevin Grittner
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Or we could wait to see if anybody reports this sort of behavior
> in a shell that won't be out of support before 9.4 gets out the
> door.

We have a field report of this happening in the sh shell in Solaris
10.  Our staff has confirmed this.  In Solaris 10 they can start
multiple clusters from a single shell, and if they later use Ctrl+C
in that shell all of those instances are shut down.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Ctrl+C from sh can shut down daemonized PostgreSQL cluster

From
Tom Lane
Date:
Kevin Grittner <kgrittn@ymail.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Or we could wait to see if anybody reports this sort of behavior
>> in a shell that won't be out of support before 9.4 gets out the
>> door.

> We have a field report of this happening in the sh shell in Solaris
> 10.  Our staff has confirmed this.  In Solaris 10 they can start
> multiple clusters from a single shell, and if they later use Ctrl+C
> in that shell all of those instances are shut down.

Do you want to try the "set -m" hack suggested upthread?  I see no
point in pursuing the portability questions unless we've verified
that it fixes the problem for someone.
        regards, tom lane
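
One possible way to probe what "set -m" does in a given sh, sketched in Python. The helper name background_job_groups is hypothetical. The shell is started in a fresh session with no controlling terminal, which is a simplifying assumption made so the probe cannot disturb the caller's terminal; some shells may decline to enable job control in that situation, so a result obtained this way does not settle how the same shell behaves when pg_ctl is run from a terminal.

```python
import os
import subprocess
import sys

# Tiny child program: print the process group it finds itself in.
CHILD = "import os; print(os.getpgrp())"


def background_job_groups(use_set_m):
    """Run a background job under `sh -c` and return (shell_pgid, job_pgid)."""
    prefix = "set -m; " if use_set_m else ""
    # The shell prints its own PID first; because it is started as a session
    # leader below, that PID is also its process group ID.
    script = "echo $$; " + prefix + '"$PYTHON" -c "$CHILD" & wait'
    env = dict(os.environ, PYTHON=sys.executable, CHILD=CHILD)
    result = subprocess.run(
        ["sh", "-c", script],
        env=env,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # no controlling terminal for the shell
        check=True,
        timeout=60,
    )
    shell_pgid, job_pgid = (int(field) for field in result.stdout.split())
    return shell_pgid, job_pgid


if __name__ == "__main__":
    for flag in (False, True):
        shell_pgid, job_pgid = background_job_groups(flag)
        print("set -m:", flag, "job has its own group:", shell_pgid != job_pgid)
```

If the job's group differs from the shell's only when set -m is used, the switch is doing what the POSIX text describes for that shell.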



Re: Ctrl+C from sh can shut down daemonized PostgreSQL cluster

From
Robert Haas
Date:
On Mon, Feb 17, 2014 at 8:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Bruce Momjian <bruce@momjian.us> writes:
>> It certainly might be --- I have no idea.  What surprised me is that we
>> are relying solely on system() to block signals to pg_ctl-spawned
>> servers.  The question is whether that is sufficient and whether we
>> should be doing more.  I don't think we have to make adjustments just
>> for Solaris 9.
>
> We aren't relying on system(); it does no such thing, according to the
> POSIX spec.  If it did, pg_ctl would be unable to print any errors to the
> terminal, because dissociating from the foreground process group generally
> also disables your ability to print on the terminal.
>
> I poked around in the POSIX spec a bit, and if I'm reading it correctly,
> the only thing that typically results in the postmaster becoming
> dissociated from the terminal is use of "&" to launch it.  In a shell
> with job control, that should result in the process being put into a
> "background" process group that won't receive terminal signals nor be
> permitted to do I/O to it.  This is distinct from dissociating altogether
> because you can use "fg" to return the process to foreground; if we did a
> setsid() we'd lose that option, if I'm reading the standards correctly.
> So I'm loath to see the postmaster doing setsid() for itself.
>
> POSIX also mandates that interactive shells have job control enabled by
> default.
>
> However ... the "&" isn't issued in the user's interactive shell.  It's
> seen by the shell launched by pg_ctl's system() call.  So it appears to
> be standards-conforming if that shell does nothing to move the launched
> postmaster into the background.
>
> The POSIX spec describes a shell switch -m that forces subprocesses
> to be launched in their own process groups.  So maybe what we ought
> to do is teach pg_ctl to do something like
>
>    system("set -m; postgres ...");
>
> Dunno if this is really portable, though it ought to be.
>
> Alternatively, we could do what the comments in pg_ctl have long thought
> desirable, namely get rid of use of system() in favor of fork()/exec().
> With that, pg_ctl could do a setsid() inside the child process.

I like this last approach.  It seems to me that the problem with
system() is that it's relying on the user's shell to have sane
behavior.  And while that obviously will work fine in a whole lot of
cases, I don't see why we should rely on it.  I don't think "your
shell must support -m with POSIX-like semantics" is really a
requirement we want to impose on PostgreSQL users.  The shell can't
make any system calls that we don't have access to ourselves, and
setsid() seems like the right one to use.  Maybe it's begging for
trouble to change anything here at all, but I think if we are going to
make a change it ought to be in the direction of making us less
dependent on the vagaries of the user's shell, not more.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Ctrl+C from sh can shut down daemonized PostgreSQL cluster

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Feb 17, 2014 at 8:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Alternatively, we could do what the comments in pg_ctl have long thought
>> desirable, namely get rid of use of system() in favor of fork()/exec().
>> With that, pg_ctl could do a setsid() inside the child process.

> I like this last approach.

Me too, but it looked like a less-than-trivial bit of work to me
(else I might just have gone and done it).  Are you volunteering?
        regards, tom lane



Re: Ctrl+C from sh can shut down daemonized PostgreSQL cluster

From
Robert Haas
Date:
On Wed, Apr 16, 2014 at 12:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Mon, Feb 17, 2014 at 8:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Alternatively, we could do what the comments in pg_ctl have long thought
>>> desirable, namely get rid of use of system() in favor of fork()/exec().
>>> With that, pg_ctl could do a setsid() inside the child process.
>
>> I like this last approach.
>
> Me too, but it looked like a less-than-trivial bit of work to me
> (else I might just have gone and done it).  Are you volunteering?

I don't have time right at the moment, but maybe at some point, if
nobody else gets to it first.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company