Thread: SIGTERM does not stop backend postgres processes immediately

SIGTERM does not stop backend postgres processes immediately

From
Fred Yankowski
Date:
It seems that postgres backend processes built with Cygwin do not
react to the SIGTERM signal immediately.  Instead, they remain blocked
on a recv() call deep under ReadCommand() and don't notice the signal
until data comes in over the socket connection and unblocks recv().
This prevents a 'fast' stop of the whole PostgreSQL instance from
working correctly.

I'm seeing this problem in Cygwin 1.3.1 with cygipc-1.09-2, using
PostgreSQL built from source based on a very recent CVS snapshot.

This problem sounds similar to one reported in the pgsql-ports list
earlier this year [1].  That thread concludes that it's a Cygwin
problem, but with no solution yet.  Has there been any progress since
then?

[1] http://postgresql.readysetnet.com/mhonarc/pgsql-ports/2001-01/msg00023.html

--
Fred Yankowski           fred@OntoSys.com      tel: +1.630.879.1312
Principal Consultant     www.OntoSys.com       fax: +1.630.879.1370
OntoSys, Inc             38W242 Deerpath Rd, Batavia, IL 60510, USA

Re: SIGTERM does not stop backend postgres processes immediately

From
Jason Tishler
Date:
Fred,

On Tue, May 08, 2001 at 02:24:27PM -0500, Fred Yankowski wrote:
> This problem sounds similar to one reported in the pgsql-ports list
> earlier this year [1].  That thread concludes that it's a Cygwin
> problem, but with no solution yet.  Has there been any progress since
> then?
>
> [1] http://postgresql.readysetnet.com/mhonarc/pgsql-ports/2001-01/msg00023.html

Sorry for the dangling thread -- the discussion was moved over to the
cygwin-developers list:

    http://www.cygwin.com/ml/cygwin-developers/2001-02/msg00019.html

So, AFAICT the problem in [1] has been "solved."

However, I have not built PostgreSQL with Cygwin 1.3.1 -- I have only run
it against Cygwin 1.3.1.  What happens when you run make check?  Does the
postmaster exit cleanly at the end of the regression test as expected?
Or, does it hang?

Jason

--
Jason Tishler
Director, Software Engineering       Phone: +1 (732) 264-8770 x235
Dot Hill Systems Corp.               Fax:   +1 (732) 264-8798
82 Bethany Road, Suite 7             Email: Jason.Tishler@dothill.com
Hazlet, NJ 07730 USA                 WWW:   http://www.dothill.com

Re: SIGTERM does not stop backend postgres processes immediately

From
Fred Yankowski
Date:
I just ran 'make check' for postgres and all 76 tests passed.

The problem I'm seeing, where a postgres backend process doesn't react
immediately to SIGTERM, occurs even when there is only one such
backend process, so this may be a different problem from the one
described in those earlier threads and recently fixed in CVS.

I'm seeing this problem as I test my patch for running postgres as an
NT service.  But I just tried running postmaster directly from the
shell and I see the same problem.

Here's a scenario.

    BASH WINDOW 1
    |    BASH WINDOW 2
    |    |    BASH WINDOW 3
    v    v    v

    postmaster -i -D /usr/local/pgsql/data.test/ -d 1
    ### database comes up to "production state"

        psql -h localhost template1
        ### starts up OK and prompts for a command

            ps -ef
            ### 2 postgres processes (one is actually the
            ### postmaster) and 1 psql process

            pg_ctl -D /usr/local/pgsql/data.test/ -m fast stop
            ### reports "waiting" and many dots

    ### "Fast Shutdown request" message appears

            ### times out and reports "failed"

[nothing more happens (which is the problem to be solved) until I do ...]

        \d
        ### [Any command to the backend would do.]
        ### "connection terminated" message appears

    ### "database system is shut down" appears.

            ps -ef
            ### the postgres processes are gone.


I know from inserting printfs into the backend code that the SIGTERM
signal handler function is not being called right after the stop
request.  Rather, it is called only after the backend gets some data
over its input socket connection, from that "\d" I did in psql in
this case.  It seems that the recv() call deep in the backend code
does not get interrupted by the SIGTERM.

On Tue, May 08, 2001 at 10:05:19PM -0400, Jason Tishler wrote:
> However, I have not built PostgreSQL with Cygwin 1.3.1 -- I have only run
> it against Cygwin 1.3.1.  What happens when you run make check?  Does the
> postmaster exit cleanly at the end of the regression test as expected?

I'm a little confused about the distinction you're making between
"Cygwin 1.3.1" and "Cygwin 1.3.1".  ;-)  Anyway, "make check"
completes without any errors.  No apparent hangs.

--
Fred Yankowski           fred@OntoSys.com      tel: +1.630.879.1312
Principal Consultant     www.OntoSys.com       fax: +1.630.879.1370
OntoSys, Inc             38W242 Deerpath Rd, Batavia, IL 60510, USA

Re: SIGTERM does not stop backend postgres processes immediately

From
Jason Tishler
Date:
Fred,

On Wed, May 09, 2001 at 09:40:31AM -0500, Fred Yankowski wrote:
> The problem I'm seeing, where a postgres backend process doesn't react
> immediately to SIGTERM, occurs even when there is only one such
> backend process, so this may be a different problem from the one
> described in those earlier threads and recently fixed in CVS.

This is my assessment too.

> I'm seeing this problem as I test my patch for running postgres as an
> NT service.  But I just tried running postmaster directly from the
> shell and I see the same problem.

I was able to reproduce your finding under Cygwin too.  When I repeated
the experiment under Linux, postmaster shut down as expected.

> I know from inserting printfs into the backend code that the SIGTERM
> signal handler function is not being called right after the stop
> request.  Rather, it is called only after the backend gets some data
> over its input socket connection, from that "\d" I did in psql in
> this case.  It seems that the recv() call deep in the backend code
> does not get interrupted by the SIGTERM.

IMO, you have found a Cygwin bug.  Please report it to the Cygwin list.
Hopefully, Mr. Signal is listening and will jump into action...
Can you produce a minimal test case that demonstrates the problem?

> On Tue, May 08, 2001 at 10:05:19PM -0400, Jason Tishler wrote:
> > However, I have not built PostgreSQL with Cygwin 1.3.1 -- I have only run
> > it against Cygwin 1.3.1.  What happens when you run make check?  Does the
> > postmaster exit cleanly at the end of the regression test as expected?
>
> I'm a little confused about the distinction you're making between
> "Cygwin 1.3.1" and "Cygwin 1.3.1".  ;-)

Sorry for being unclear.  What I was trying to say was that my builds
of PostgreSQL are really against Cygwin 1.1.8 (with only cygwin1.dll
replaced to work around the mmap/fork problem).  I have never built
against Cygwin 1.3.1.  However, I do run against Cygwin 1.3.1 on one of
my test machines.

> Anyway, "make check" completes without any errors.  No apparent hangs.

Which again confirms that this is a different and yet to be solved
problem.

Jason

--
Jason Tishler
Director, Software Engineering       Phone: +1 (732) 264-8770 x235
Dot Hill Systems Corp.               Fax:   +1 (732) 264-8798
82 Bethany Road, Suite 7             Email: Jason.Tishler@dothill.com
Hazlet, NJ 07730 USA                 WWW:   http://www.dothill.com
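
A minimal test case of the kind requested above might look like the
following sketch.  It is not taken from the thread: the function name
recv_is_interrupted_by_sigterm is hypothetical, a local AF_UNIX
socketpair stands in for the backend's client connection, and the
one-second delays are arbitrary.  A child process sends SIGTERM while
the parent is blocked in recv(), then sends one byte so that the parent
cannot hang forever on a platform where recv() is not interruptible.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigterm = 0;

/* The handler only records that SIGTERM arrived. */
static void on_sigterm(int signo)
{
    (void) signo;
    got_sigterm = 1;
}

/*
 * Returns 1 if a blocking recv() was interrupted by SIGTERM (EINTR),
 * 0 if recv() only returned once data arrived, -1 on setup failure.
 */
int recv_is_interrupted_by_sigterm(void)
{
    int sv[2];
    char buf[16];
    struct sigaction sa;
    pid_t parent = getpid();
    pid_t child;
    ssize_t r;
    int saved_errno;

    /* Install the handler without SA_RESTART so recv() is not restarted. */
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigterm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGTERM, &sa, NULL) < 0)
        return -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    got_sigterm = 0;
    child = fork();
    if (child < 0)
        return -1;
    if (child == 0)
    {
        sleep(1);                   /* let the parent block in recv() */
        kill(parent, SIGTERM);
        sleep(1);
        /* Fallback: unblock recv() if the signal did not interrupt it. */
        if (write(sv[1], "x", 1) < 0)
            _exit(1);
        _exit(0);
    }

    r = recv(sv[0], buf, sizeof buf, 0);
    saved_errno = errno;

    waitpid(child, NULL, 0);
    close(sv[0]);
    close(sv[1]);

    return (r < 0 && saved_errno == EINTR && got_sigterm) ? 1 : 0;
}
```

A result of 0 would correspond to the behavior reported in this thread,
where the signal is noticed only after data arrives on the socket.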

Re: SIGTERM does not stop backend postgres processes immediately

From
Christopher Faylor
Date:
On Wed, May 09, 2001 at 02:26:29PM -0400, Jason Tishler wrote:
>> I know from inserting printfs into the backend code that the SIGTERM
>> signal handler function is not being called right after the stop
>> request.  Rather, it is called only after the backend gets some data
>> over its input socket connection, from that "\d" I did in psql in
>> this case.  It seems that the recv() call deep in the backend code
>> does not get interrupted by the SIGTERM.
>
>IMO, you have found a Cygwin bug.  Please report it to the Cygwin list.
>Hopefully, Mr. Signal is listening and will jump into action...

Unfortunately, blocking recv() calls are not interruptible on Windows.
I'm not aware of any mechanism for allowing this.

cgf

Re: Re: SIGTERM does not stop backend postgres processes immediately

From
Hiroshi Inoue
Date:
Christopher Faylor wrote:
>
> On Wed, May 09, 2001 at 02:26:29PM -0400, Jason Tishler wrote:
> >> I know from inserting printfs into the backend code that the SIGTERM
> >> signal handler function is not being called right after the stop
> >> request.  Rather, it is called only after the backend gets some data
> >> over its input socket connection, from that "\d" I did in psql in
> >> this case.  It seems that the recv() call deep in the backend code
> >> does not get interrupted by the SIGTERM.
> >

How about inserting a select() call before the recv() ?
Cygwin's select() is interruptible AFAIK.

regards,
Hiroshi Inoue

Re: Re: SIGTERM does not stop backend postgres processes immediately

From
Hiroshi Inoue
Date:
Hiroshi Inoue wrote:
>
> Christopher Faylor wrote:
> >
> > On Wed, May 09, 2001 at 02:26:29PM -0400, Jason Tishler wrote:
> > >> I know from inserting printfs into the backend code that the SIGTERM
> > >> signal handler function is not being called right after the stop
> > >> request.  Rather, it is called only after the backend gets some data
> > >> over its input socket connection, from that "\d" I did in psql in
> > >> this case.  It seems that the recv() call deep in the backend code
> > >> does not get interrupted by the SIGTERM.
> > >
>
> How about inserting a select() call before the recv() ?
> Cygwin's select() is interruptible AFAIK.
>

I see the following reply from Chris in the Cygwin archive (I'm not
a member of that list).

  That would be the "workaround" that I kept mentioning previously.
  It relies on polling and that is something I'd rather avoid, if
  possible.

My proposal was meant for pgsql-cygwin from the start, not for Cygwin
itself.  The following is an example.

Comments ?

regards,
Hiroshi Inoue

                {
#ifdef __CYGWIN__
                        fd_set  rmask;
                        int     nsocks;

                        FD_ZERO(&rmask);
                        FD_SET(MyProcPort->sock, &rmask);
                        nsocks = MyProcPort->sock + 1;
                        if (select(nsocks, &rmask, (fd_set *) NULL,
                                   (fd_set *) NULL, (struct timeval *) NULL) < 0)
                        {
                                if (errno == EINTR)
                                        continue;
                                fprintf(stderr, "pq_recvbuf: select() failed: %s\n",
                                        strerror(errno));
                                return EOF;
                        }
#endif /* __CYGWIN__ */
                        r = recv(MyProcPort->sock, PqRecvBuffer + PqRecvLength,
                                 PQ_BUFFER_SIZE - PqRecvLength, 0);

                }

Re: SIGTERM does not stop backend postgres processes immediately

From
Jason Tishler
Date:
Corinna,

On Tue, May 15, 2001 at 11:20:54AM +0200, Corinna Vinschen wrote:
> On Fri, May 11, 2001 at 09:09:28AM +1000, Robert Collins wrote:
> > Blueskying a concept here: what about cygwin opening all sockets in
> > non-blocking mode, and if the app thinks that it is a blocking call wait
> > on the socket && on a signal event?
> >
> > Obviously not trivial to get working right, but
> > a) would it work on 95?
> > b) thoughts?
>
> b) I have just applied a patch to Cygwin which uses overlapped IO
>    together with the Winsock2 calls WSARecv, WSARecvFrom, WSASend
>    and WSASendTo if available. The new mechanism is interruptible
>    by signals. If Winsock2 is not available the new implementation
>    just falls back to using the non-interruptible Winsock1 calls.
>
>    I would like to ask people to test it especially in conjunction
>    with PostgreSQL, which I haven't set up.

I just tried my Cygwin PostgreSQL 7.1.1 distribution against the latest
Cygwin CVS and the above mentioned patch solves the postmaster shutdown
problem.  Now Cygwin PostgreSQL behaves identically to UNIX PostgreSQL
with regard to shutdown:

1. pg_ctl stop (i.e., kill -s SIGTERM) causes postmaster to wait for
all clients to disconnect before shutting down.

2. pg_ctl -m fast stop (i.e., kill -s SIGINT) causes postmaster to
shut down immediately (but cleanly) without waiting for all clients
to disconnect.

Your patch fixed case 2 above and I believe this is the last piece needed
by Fred Yankowski to complete his PostgreSQL NT service patch.

Thank you very much for this patch -- it is really appreciated.

Jason

--
Jason Tishler
Director, Software Engineering       Phone: +1 (732) 264-8770 x235
Dot Hill Systems Corp.               Fax:   +1 (732) 264-8798
82 Bethany Road, Suite 7             Email: Jason.Tishler@dothill.com
Hazlet, NJ 07730 USA                 WWW:   http://www.dothill.com
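
The two shutdown cases listed above amount to a small mapping from
pg_ctl mode to the signal sent to the postmaster.  As a sketch only:
the function name shutdown_signal_for_mode is hypothetical, and "smart"
is pg_ctl's name for the default mode, which the message above writes
simply as "pg_ctl stop".

```c
#include <signal.h>
#include <string.h>

/*
 * Map a pg_ctl shutdown mode name to the signal sent to the postmaster,
 * following the two cases described in the message above.
 * Returns -1 for any mode not covered there.
 */
int shutdown_signal_for_mode(const char *mode)
{
    if (strcmp(mode, "smart") == 0)
        return SIGTERM;     /* wait for all clients to disconnect */
    if (strcmp(mode, "fast") == 0)
        return SIGINT;      /* shut down cleanly without waiting */
    return -1;
}
```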

Re: Re: SIGTERM does not stop backend postgres processes immediately

From
Jason Tishler
Date:
Hiroshi,

On Tue, May 15, 2001 at 10:30:39AM +0900, Hiroshi Inoue wrote:
> Hiroshi Inoue wrote:
> > Christopher Faylor wrote:
> > > On Wed, May 09, 2001 at 02:26:29PM -0400, Jason Tishler wrote:
> > > >> I know from inserting printfs into the backend code that the SIGTERM
> > > >> signal handler function is not being called right after the stop
> > > >> request.  Rather, it is called only after the backend gets some data
> > > >> over its input socket connection, from that "\d" I did in psql in
> > > >> this case.  It seems that the recv() call deep in the backend code
> > > >> does not get interrupted by the SIGTERM.
> > > >
> >
> > How about inserting a select() call before the recv() ?
> > Cygwin's select() is interruptible AFAIK.
>
> I see the following reply from Chris in the Cygwin archive (I'm not
> a member of that list).
>
>   That would be the "workaround" that I kept mentioning previously.
>   It relies on polling and that is something I'd rather avoid, if
>   possible.
>
> My proposal was meant for pgsql-cygwin from the start, not for Cygwin
> itself.  The following is an example.
>
> Comments ?
>
> [patch snipped]

Your patch is no longer needed since Cygwin's recv is now interruptible.
See the following for details:

    http://cygwin.com/ml/cygwin/2001-05/msg00752.html
    http://cygwin.com/ml/cygwin/2001-05/msg00774.html

I do appreciate your efforts to come up with a workaround, though.

Thanks,
Jason

--
Jason Tishler
Director, Software Engineering       Phone: +1 (732) 264-8770 x235
Dot Hill Systems Corp.               Fax:   +1 (732) 264-8798
82 Bethany Road, Suite 7             Email: Jason.Tishler@dothill.com
Hazlet, NJ 07730 USA                 WWW:   http://www.dothill.com

Re: SIGTERM does not stop backend postgres processes immediately

From
Fred Yankowski
Date:
On Tue, May 15, 2001 at 10:10:36AM -0400, Jason Tishler wrote:
> I just tried my Cygwin PostgreSQL 7.1.1 distribution against the latest
> Cygwin CVS and the above mentioned patch solves the postmaster shutdown
> problem.  Now Cygwin PostgreSQL behaves identically to UNIX PostgreSQL
> with regard to shutdown:

Wow, this is great!  It's a pleasure to see capable Cygwin developers
-- Corinna and Jason in particular, along with the others who posted
suggested ways to fix the problem -- dig in and solve problems.

> Your patch fixed case 2 above and I believe this is the last piece needed
> by Fred Yankowski to complete his PostgreSQL NT service patch.

I will resume work on that immediately.

The other problem I've been facing is how to handle the SIGHUP that
Cygwin generates in response to system shutdown.  Some quick tests
show that simply ignoring (SIG_IGN) the signal works, but that defeats
the use of SIGHUP to force the instance to re-read the configuration
file.  It may be, however, that fixing the recv() problem also fixes
the problem where getting a SIGHUP in the midst of stopping PostgreSQL
seemed to mess up the PostgreSQL state.  I'll check...

--
Fred Yankowski           fred@OntoSys.com      tel: +1.630.879.1312
Principal Consultant     www.OntoSys.com       fax: +1.630.879.1370
OntoSys, Inc             38W242 Deerpath Rd, Batavia, IL 60510, USA
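
The SIGHUP trade-off described above can be illustrated with a small
sketch.  None of this is PostgreSQL code: the function names are
hypothetical, and the flag-based handler is just one common way to
defer a configuration reload to the main loop.  With SIG_IGN installed
the signal is dropped and no reload can ever be requested; with the
handler installed every SIGHUP, including one generated at system
shutdown, is recorded as a reload request.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t reload_requested = 0;

/* Handler only sets a flag; the main loop would re-read the config. */
static void on_sighup(int signo)
{
    (void) signo;
    reload_requested = 1;
}

static int set_sighup_handler(void (*handler)(int))
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGHUP, &sa, NULL);
}

/* Option A: ignore SIGHUP entirely (no reload is possible). */
int ignore_sighup(void)
{
    return set_sighup_handler(SIG_IGN);
}

/* Option B: treat SIGHUP as a request to re-read the configuration. */
int handle_sighup_as_reload(void)
{
    return set_sighup_handler(on_sighup);
}

/* Returns 1 once per pending reload request, then clears it. */
int take_reload_request(void)
{
    int pending = reload_requested;

    reload_requested = 0;
    return pending;
}
```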