Thread: server won't shutdown

server won't shutdown

From
Laurette Cisneros
Date:
Server won't shutdown:

Currently, we are running 7.3.2.  However, we have had this problem since 7.1.

Our production server has dozens of connections from different persistent
applications (some web, some runnning on other systems).

When I do a shutdown:
$pgpath/bin/pg_ctl -D $datadir -m fast stop

the server will not completely shutdown.

I realize that the pg_ctl script will send a -QUIT to the postmaster
process, which will then also send some kind of quit signal to the outstanding
backends?, but, every time, some of the postgres backend processes will not
quit and the server "hangs" in shutdown mode.

To get around this, I send a -QUIT is sent to a backend process that is not
doing any activity and then the server comes down (and not in a good way I'm
sure).

Can anyone explain *why* the server won't shutdown as well as suggest the
*best* way to handle this situation?

Thanks,

--
Laurette Cisneros, L.D.
The Database Group
(510) 420-3137
NextBus Information Systems, Inc.
www.nextbus.com
----------------------------------
"No man is wise enough by himself"
-- Titus Maccius Plautus
   (254 Bc - 184 BC), Miles Gloriosus


Re: server won't shutdown

From
Tom Lane
Date:
Laurette Cisneros <laurette@nextbus.com> writes:
> When I do a shutdown:
> $pgpath/bin/pg_ctl -D $datadir -m fast stop
> the server will not completely shutdown.

Hm.  -m fast tells the postmaster to send SIGTERM to all the existing
backends, and then quit when all the backends have quit.  So your
problem is that one or more backend processes isn't exiting in a timely
fashion after receiving SIGTERM.  It would be useful to see what these
backends are doing.  Can you attach to each one with gdb and get a stack
trace?

            regards, tom lane

Re: server won't shutdown

From
Laurette Cisneros
Date:
Sure.

I will attempt this on our development server which only occasionally has
this problem.  If that won't reproduce it, I will attempt this on our
production server but that must be scheduled at a slow time - maybe Monday.

I'll send this info. once I've collected it.

Thanks for the help,

L.
On Wed, 12 Feb 2003, Tom Lane wrote:

> Laurette Cisneros <laurette@nextbus.com> writes:
> > When I do a shutdown:
> > $pgpath/bin/pg_ctl -D $datadir -m fast stop
> > the server will not completely shutdown.
>
> Hm.  -m fast tells the postmaster to send SIGTERM to all the existing
> backends, and then quit when all the backends have quit.  So your
> problem is that one or more backend processes isn't exiting in a timely
> fashion after receiving SIGTERM.  It would be useful to see what these
> backends are doing.  Can you attach to each one with gdb and get a stack
> trace?
>
>             regards, tom lane
>

--
Laurette Cisneros, L.D.
The Database Group
(510) 420-3137
NextBus Information Systems, Inc.
www.nextbus.com
----------------------------------
"No man is wise enough by himself"
-- Titus Maccius Plautus
   (254 Bc - 184 BC), Miles Gloriosus


Re: server won't shutdown

From
Laurette Cisneros
Date:
Ok, it reproduced today on my development server.

Here's the info:

/usr/local/pgsql/bin/pg_ctl -D /u1/pgsql73_data  -m fast   -l /u1/pgsql73_data/pgsql.log stop
waiting for postmaster to shut
down................................................................ failed
pg_ctl: postmaster does not shut down


ps -efw | grep post
postgres  5466     1  0 Feb12 ?        00:00:02 /usr/local/pgsql/bin/postmaster
postgres  5467  5466  0 Feb12 ?        00:00:45 postgres: stats buffer
process
postgres  5468  5467  0 Feb12 ?        00:15:49 postgres: stats collector
process

postgres  8077  5466  0 Feb12 ?        00:00:00 postgres: nbadmin db1 192.168.1.111 idle
postgres 23146  5466  0 09:36 ?        00:00:00 postgres: nbadmin db2 64.114.28.112 idle
postgres 28874  5466  0 12:15 ?        00:00:00 postgres: nbadmin db3 208.184.214.113 idle
postgres 29424  5466  0 12:30 ?        00:00:06 postgres: nbadmin db4 64.114.28.114 idle
postgres  3792  5466  0 15:32 ?        00:00:00 postgres: junior jr_db 192.168.1.114 idle


Here are the stack traces from each of the postmaster and postgres
processes that remain:

gdb /usr/local/pgsql/bin/postmaster 5466
GNU gdb 5.0rh-5 Red Hat Linux 7.1
Copyright 2001 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you
are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...
/u1/pgsql73_data/5466: No such file or directory.
Attaching to program: /usr/local/pgsql/bin/postmaster, process 5466
Reading symbols from /usr/lib/libssl.so.1...done.
Loaded symbols for /usr/lib/libssl.so.1
Reading symbols from /usr/lib/libcrypto.so.1...done.
Loaded symbols for /usr/lib/libcrypto.so.1
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /usr/lib/libreadline.so.4.1...done.
Loaded symbols for /usr/lib/libreadline.so.4.1
Reading symbols from /lib/libtermcap.so.2...done.
Loaded symbols for /lib/libtermcap.so.2
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/i686/libm.so.6...done.
Loaded symbols for /lib/i686/libm.so.6
Reading symbols from /lib/i686/libc.so.6...done.
Loaded symbols for /lib/i686/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
0x402ab90e in __select () from /lib/i686/libc.so.6
(gdb) bt
#0  0x402ab90e in __select () from /lib/i686/libc.so.6
#1  0xbffff1b8 in ?? ()
#2  0x081023fb in PostmasterMain (argc=1, argv=0x822e000) at
postmaster.c:788
#3  0x080e0a07 in main (argc=1, argv=0xbffff8d4) at main.c:210
#4  0x401e7177 in __libc_start_main (main=0x80e0828 <main>, argc=1,
    ubp_av=0xbffff8d4, init=0x806aeec <_init>, fini=0x8172770 <_fini>,
    rtld_fini=0x4000e184 <_dl_fini>, stack_end=0xbffff8cc)
    at ../sysdeps/generic/libc-start.c:129

----------
gdb /usr/local/pgsql/bin/postgres 5467
GNU gdb 5.0rh-5 Red Hat Linux 7.1
Copyright 2001 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you
are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...
/u1/pgsql73_data/5467: No such file or directory.
Attaching to program: /usr/local/pgsql/bin/postgres, process 5467
Reading symbols from /usr/lib/libssl.so.1...done.
Loaded symbols for /usr/lib/libssl.so.1
Reading symbols from /usr/lib/libcrypto.so.1...done.
Loaded symbols for /usr/lib/libcrypto.so.1
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /usr/lib/libreadline.so.4.1...done.
Loaded symbols for /usr/lib/libreadline.so.4.1
Reading symbols from /lib/libtermcap.so.2...done.
Loaded symbols for /lib/libtermcap.so.2
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/i686/libm.so.6...done.
Loaded symbols for /lib/i686/libm.so.6
Reading symbols from /lib/i686/libc.so.6...done.
Loaded symbols for /lib/i686/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
0x402ab90e in __select () from /lib/i686/libc.so.6
(gdb) bt
#0  0x402ab90e in __select () from /lib/i686/libc.so.6
#1  0x00000006 in __strtol_internal (nptr=0x3 <Address 0x3 out of bounds>,
    endptr=0x0, base=32, group=135287149) at eval.c:36
#2  0x08105189 in pgstat_main () at pgstat.c:1177
#3  0x081045f1 in pgstat_start () at pgstat.c:325
#4  0x081023c2 in PostmasterMain (argc=1, argv=0x822e000) at
postmaster.c:772
#5  0x080e0a07 in main (argc=1, argv=0xbffff8d4) at main.c:210
#6  0x401e7177 in __libc_start_main (main=0x80e0828 <main>, argc=1,
    ubp_av=0xbffff8d4, init=0x806aeec <_init>, fini=0x8172770 <_fini>,
    rtld_fini=0x4000e184 <_dl_fini>, stack_end=0xbffff8cc)
    at ../sysdeps/generic/libc-start.c:129

-------------
gdb /usr/local/pgsql/bin/postgres 5468
GNU gdb 5.0rh-5 Red Hat Linux 7.1
Copyright 2001 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you
are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...
/u1/pgsql73_data/5468: No such file or directory.
Attaching to program: /usr/local/pgsql/bin/postgres, process 5468
Reading symbols from /usr/lib/libssl.so.1...done.
Loaded symbols for /usr/lib/libssl.so.1
Reading symbols from /usr/lib/libcrypto.so.1...done.
Loaded symbols for /usr/lib/libcrypto.so.1
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /usr/lib/libreadline.so.4.1...done.
Loaded symbols for /usr/lib/libreadline.so.4.1
Reading symbols from /lib/libtermcap.so.2...done.
Loaded symbols for /lib/libtermcap.so.2
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/i686/libm.so.6...done.
Loaded symbols for /lib/i686/libm.so.6
Reading symbols from /lib/i686/libc.so.6...done.
Loaded symbols for /lib/i686/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
0x402ab90e in __select () from /lib/i686/libc.so.6
(gdb) bt
#0  0x402ab90e in __select () from /lib/i686/libc.so.6
#1  0xbfffef60 in ?? ()
#2  0x081045f1 in pgstat_start () at pgstat.c:325
#3  0x081023c2 in PostmasterMain (argc=1, argv=0x822e000) at
postmaster.c:772
#4  0x080e0a07 in main (argc=1, argv=0xbffff8d4) at main.c:210
#5  0x401e7177 in __libc_start_main (main=0x80e0828 <main>, argc=1,
    ubp_av=0xbffff8d4, init=0x806aeec <_init>, fini=0x8172770 <_fini>,
    rtld_fini=0x4000e184 <_dl_fini>, stack_end=0xbffff8cc)
    at ../sysdeps/generic/libc-start.c:129

------------
 gdb /usr/local/pgsql/bin/postgres 8077
GNU gdb 5.0rh-5 Red Hat Linux 7.1
Copyright 2001 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you
are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...
/u1/pgsql73_data/8077: No such file or directory.
Attaching to program: /usr/local/pgsql/bin/postgres, process 8077
Reading symbols from /usr/lib/libssl.so.1...done.
Loaded symbols for /usr/lib/libssl.so.1
Reading symbols from /usr/lib/libcrypto.so.1...done.
Loaded symbols for /usr/lib/libcrypto.so.1
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /usr/lib/libreadline.so.4.1...done.
Loaded symbols for /usr/lib/libreadline.so.4.1
Reading symbols from /lib/libtermcap.so.2...done.
Loaded symbols for /lib/libtermcap.so.2
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/i686/libm.so.6...done.
Loaded symbols for /lib/i686/libm.so.6
Reading symbols from /lib/i686/libc.so.6...done.
Loaded symbols for /lib/i686/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /usr/local/pgsql-7.3.2/lib/plpgsql.so...done.
Loaded symbols for /usr/local/pgsql-7.3.2/lib/plpgsql.so
0x402b2262 in __libc_recv () from /lib/i686/libc.so.6
(gdb) bt
#0  0x402b2262 in __libc_recv () from /lib/i686/libc.so.6
#1  0x080dc8bd in secure_read (port=0x8247110, ptr=0x81f7c40, len=8192)
    at be-secure.c:301
#2  0x080dff4d in pq_recvbuf () at pqcomm.c:463
#3  0x080dff9d in pq_getbyte () at pqcomm.c:500
#4  0x0811a675 in SocketBackend (inBuf=0x82b7b50) at postgres.c:247
#5  0x0811a6eb in ReadCommand (inBuf=0x82b7b50) at postgres.c:304
#6  0x0811bbb9 in PostgresMain (argc=4, argv=0xbfffef40,
    username=0x8247241 "nbadmin") at postgres.c:1930
#7  0x08103d60 in DoBackend (port=0x8247110) at postmaster.c:2302
#8  0x081036a6 in BackendStartup (port=0x8247110) at postmaster.c:1924
#9  0x081028a5 in ServerLoop () at postmaster.c:1027
#10 0x081023fb in PostmasterMain (argc=1, argv=0x822e000) at
postmaster.c:788
#11 0x080e0a07 in main (argc=1, argv=0xbffff8d4) at main.c:210
#12 0x401e7177 in __libc_start_main (main=0x80e0828 <main>, argc=1,
    ubp_av=0xbffff8d4, init=0x806aeec <_init>, fini=0x8172770 <_fini>,
    rtld_fini=0x4000e184 <_dl_fini>, stack_end=0xbffff8cc)
    at ../sysdeps/generic/libc-start.c:129

--------

gdb /usr/local/pgsql/bin/postgres 23146
GNU gdb 5.0rh-5 Red Hat Linux 7.1
Copyright 2001 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you
are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...
/u1/pgsql73_data/23146: No such file or directory.
Attaching to program: /usr/local/pgsql/bin/postgres, process 23146
Reading symbols from /usr/lib/libssl.so.1...done.
Loaded symbols for /usr/lib/libssl.so.1
Reading symbols from /usr/lib/libcrypto.so.1...done.
Loaded symbols for /usr/lib/libcrypto.so.1
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /usr/lib/libreadline.so.4.1...done.
Loaded symbols for /usr/lib/libreadline.so.4.1
Reading symbols from /lib/libtermcap.so.2...done.
Loaded symbols for /lib/libtermcap.so.2
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/i686/libm.so.6...done.
Loaded symbols for /lib/i686/libm.so.6
Reading symbols from /lib/i686/libc.so.6...done.
Loaded symbols for /lib/i686/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /usr/local/pgsql-7.3.2/lib/plpgsql.so...done.
Loaded symbols for /usr/local/pgsql-7.3.2/lib/plpgsql.so
0x402b2262 in __libc_recv () from /lib/i686/libc.so.6
(gdb) bt
#0  0x402b2262 in __libc_recv () from /lib/i686/libc.so.6
#1  0x080dc8bd in secure_read (port=0x8247110, ptr=0x81f7c40, len=8192)
    at be-secure.c:301
#2  0x080dff4d in pq_recvbuf () at pqcomm.c:463
#3  0x080dff9d in pq_getbyte () at pqcomm.c:500
#4  0x0811a675 in SocketBackend (inBuf=0x82b7b78) at postgres.c:247
#5  0x0811a6eb in ReadCommand (inBuf=0x82b7b78) at postgres.c:304
#6  0x0811bbb9 in PostgresMain (argc=4, argv=0xbfffef40,
    username=0x8247241 "nbadmin") at postgres.c:1930
#7  0x08103d60 in DoBackend (port=0x8247110) at postmaster.c:2302
#8  0x081036a6 in BackendStartup (port=0x8247110) at postmaster.c:1924
#9  0x081028a5 in ServerLoop () at postmaster.c:1027
#10 0x081023fb in PostmasterMain (argc=1, argv=0x822e000) at
postmaster.c:788
#11 0x080e0a07 in main (argc=1, argv=0xbffff8d4) at main.c:210
#12 0x401e7177 in __libc_start_main (main=0x80e0828 <main>, argc=1,
    ubp_av=0xbffff8d4, init=0x806aeec <_init>, fini=0x8172770 <_fini>,
    rtld_fini=0x4000e184 <_dl_fini>, stack_end=0xbffff8cc)
    at ../sysdeps/generic/libc-start.c:129

----------
gdb /usr/local/pgsql/bin/postgres 28874
GNU gdb 5.0rh-5 Red Hat Linux 7.1
Copyright 2001 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you
are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...
/u1/pgsql73_data/28874: No such file or directory.
Attaching to program: /usr/local/pgsql/bin/postgres, process 28874
Reading symbols from /usr/lib/libssl.so.1...done.
Loaded symbols for /usr/lib/libssl.so.1
Reading symbols from /usr/lib/libcrypto.so.1...done.
Loaded symbols for /usr/lib/libcrypto.so.1
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /usr/lib/libreadline.so.4.1...done.
Loaded symbols for /usr/lib/libreadline.so.4.1
Reading symbols from /lib/libtermcap.so.2...done.
Loaded symbols for /lib/libtermcap.so.2
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/i686/libm.so.6...done.
Loaded symbols for /lib/i686/libm.so.6
Reading symbols from /lib/i686/libc.so.6...done.
Loaded symbols for /lib/i686/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /usr/local/pgsql-7.3.2/lib/plpgsql.so...done.
Loaded symbols for /usr/local/pgsql-7.3.2/lib/plpgsql.so
Reading symbols from /usr/local/pgsql-7.3.2/lib/nextbus.so...done.
Loaded symbols for /usr/local/pgsql-7.3.2/lib/nextbus.so
0x402b2262 in __libc_recv () from /lib/i686/libc.so.6
(gdb) bt
#0  0x402b2262 in __libc_recv () from /lib/i686/libc.so.6
#1  0x080dc8bd in secure_read (port=0x8247110, ptr=0x81f7c40, len=8192)
    at be-secure.c:301
#2  0x080dff4d in pq_recvbuf () at pqcomm.c:463
#3  0x080dff9d in pq_getbyte () at pqcomm.c:500
#4  0x0811a675 in SocketBackend (inBuf=0x82b7c40) at postgres.c:247
#5  0x0811a6eb in ReadCommand (inBuf=0x82b7c40) at postgres.c:304
#6  0x0811bbb9 in PostgresMain (argc=4, argv=0xbfffef40,
    username=0x8247241 "nbadmin") at postgres.c:1930
#7  0x08103d60 in DoBackend (port=0x8247110) at postmaster.c:2302
#8  0x081036a6 in BackendStartup (port=0x8247110) at postmaster.c:1924
#9  0x081028a5 in ServerLoop () at postmaster.c:1027
#10 0x081023fb in PostmasterMain (argc=1, argv=0x822e000) at
postmaster.c:788
#11 0x080e0a07 in main (argc=1, argv=0xbffff8d4) at main.c:210
#12 0x401e7177 in __libc_start_main (main=0x80e0828 <main>, argc=1,
    ubp_av=0xbffff8d4, init=0x806aeec <_init>, fini=0x8172770 <_fini>,
    rtld_fini=0x4000e184 <_dl_fini>, stack_end=0xbffff8cc)
    at ../sysdeps/generic/libc-start.c:129

---------
gdb /usr/local/pgsql/bin/postgres 29424
GNU gdb 5.0rh-5 Red Hat Linux 7.1
Copyright 2001 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you
are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...
/u1/pgsql73_data/29424: No such file or directory.
Attaching to program: /usr/local/pgsql/bin/postgres, process 29424
Reading symbols from /usr/lib/libssl.so.1...done.
Loaded symbols for /usr/lib/libssl.so.1
Reading symbols from /usr/lib/libcrypto.so.1...done.
Loaded symbols for /usr/lib/libcrypto.so.1
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /usr/lib/libreadline.so.4.1...done.
Loaded symbols for /usr/lib/libreadline.so.4.1
Reading symbols from /lib/libtermcap.so.2...done.
Loaded symbols for /lib/libtermcap.so.2
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/i686/libm.so.6...done.
Loaded symbols for /lib/i686/libm.so.6
Reading symbols from /lib/i686/libc.so.6...done.
Loaded symbols for /lib/i686/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /usr/local/pgsql-7.3.2/lib/plpgsql.so...done.
Loaded symbols for /usr/local/pgsql-7.3.2/lib/plpgsql.so
Reading symbols from /usr/local/pgsql-7.3.2/lib/nextbus.so...done.
Loaded symbols for /usr/local/pgsql-7.3.2/lib/nextbus.so
0x402b2262 in __libc_recv () from /lib/i686/libc.so.6
(gdb) bt
#0  0x402b2262 in __libc_recv () from /lib/i686/libc.so.6
#1  0x080dc8bd in secure_read (port=0x8247110, ptr=0x81f7c40, len=8192)
    at be-secure.c:301
#2  0x080dff4d in pq_recvbuf () at pqcomm.c:463
#3  0x080dff9d in pq_getbyte () at pqcomm.c:500
#4  0x0811a675 in SocketBackend (inBuf=0x82b7cb8) at postgres.c:247
#5  0x0811a6eb in ReadCommand (inBuf=0x82b7cb8) at postgres.c:304
#6  0x0811bbb9 in PostgresMain (argc=4, argv=0xbfffef40,
    username=0x8247241 "nbadmin") at postgres.c:1930
#7  0x08103d60 in DoBackend (port=0x8247110) at postmaster.c:2302
#8  0x081036a6 in BackendStartup (port=0x8247110) at postmaster.c:1924
#9  0x081028a5 in ServerLoop () at postmaster.c:1027
#10 0x081023fb in PostmasterMain (argc=1, argv=0x822e000) at
postmaster.c:788
#11 0x080e0a07 in main (argc=1, argv=0xbffff8d4) at main.c:210
#12 0x401e7177 in __libc_start_main (main=0x80e0828 <main>, argc=1,
    ubp_av=0xbffff8d4, init=0x806aeec <_init>, fini=0x8172770 <_fini>,
    rtld_fini=0x4000e184 <_dl_fini>, stack_end=0xbffff8cc)
    at ../sysdeps/generic/libc-start.c:129

---------
gdb /usr/local/pgsql/bin/postgres 3792
GNU gdb 5.0rh-5 Red Hat Linux 7.1
Copyright 2001 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you
are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...
/u1/pgsql73_data/3792: No such file or directory.
Attaching to program: /usr/local/pgsql/bin/postgres, process 3792
Reading symbols from /usr/lib/libssl.so.1...done.
Loaded symbols for /usr/lib/libssl.so.1
Reading symbols from /usr/lib/libcrypto.so.1...done.
Loaded symbols for /usr/lib/libcrypto.so.1
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /usr/lib/libreadline.so.4.1...done.
Loaded symbols for /usr/lib/libreadline.so.4.1
Reading symbols from /lib/libtermcap.so.2...done.
Loaded symbols for /lib/libtermcap.so.2
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/i686/libm.so.6...done.
Loaded symbols for /lib/i686/libm.so.6
Reading symbols from /lib/i686/libc.so.6...done.
Loaded symbols for /lib/i686/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
0x402b2262 in __libc_recv () from /lib/i686/libc.so.6
(gdb) bt
#0  0x402b2262 in __libc_recv () from /lib/i686/libc.so.6
#1  0x080dc8bd in secure_read (port=0x8247110, ptr=0x81f7c40, len=8192)
    at be-secure.c:301
#2  0x080dff4d in pq_recvbuf () at pqcomm.c:463
#3  0x080dff9d in pq_getbyte () at pqcomm.c:500
#4  0x0811a675 in SocketBackend (inBuf=0x82b7958) at postgres.c:247
#5  0x0811a6eb in ReadCommand (inBuf=0x82b7958) at postgres.c:304
#6  0x0811bbb9 in PostgresMain (argc=4, argv=0xbfffef40,
    username=0x8247241 "jshields") at postgres.c:1930
#7  0x08103d60 in DoBackend (port=0x8247110) at postmaster.c:2302
#8  0x081036a6 in BackendStartup (port=0x8247110) at postmaster.c:1924
#9  0x081028a5 in ServerLoop () at postmaster.c:1027
#10 0x081023fb in PostmasterMain (argc=1, argv=0x822e000) at
postmaster.c:788
#11 0x080e0a07 in main (argc=1, argv=0xbffff8d4) at main.c:210
#12 0x401e7177 in __libc_start_main (main=0x80e0828 <main>, argc=1,
    ubp_av=0xbffff8d4, init=0x806aeec <_init>, fini=0x8172770 <_fini>,
    rtld_fini=0x4000e184 <_dl_fini>, stack_end=0xbffff8cc)
    at ../sysdeps/generic/libc-start.c:129

Thanks!

L.
On Wed, 12 Feb 2003, Tom Lane wrote:

> Laurette Cisneros <laurette@nextbus.com> writes:
> > When I do a shutdown:
> > $pgpath/bin/pg_ctl -D $datadir -m fast stop
> > the server will not completely shutdown.
>
> Hm.  -m fast tells the postmaster to send SIGTERM to all the existing
> backends, and then quit when all the backends have quit.  So your
> problem is that one or more backend processes isn't exiting in a timely
> fashion after receiving SIGTERM.  It would be useful to see what these
> backends are doing.  Can you attach to each one with gdb and get a stack
> trace?
>
>             regards, tom lane
>

--
Laurette Cisneros, L.D.
The Database Group
(510) 420-3137
NextBus Information Systems, Inc.
www.nextbus.com
----------------------------------
"No man is wise enough by himself"
-- Titus Maccius Plautus
   (254 Bc - 184 BC), Miles Gloriosus


Re: server won't shutdown

From
Tom Lane
Date:
Laurette Cisneros <laurette@nextbus.com> writes:
> Ok, it reproduced today on my development server.

Boy, it sure looks like everything's just sitting idle waiting for
client input, and hasn't noticed your shutdown request at all.  You
sure you used "-m fast"?  Could you attach to the postmaster and see
what its Shutdown variable contains, and to the backends and see what
the variables manipulated by die() contain?  (InterruptPending,
proc_exit_inprogress, ProcDiePending, ImmediateInterruptOK,
InterruptHoldoffCount, CritSectionCount)

            regards, tom lane

Re: server won't shutdown

From
Laurette Cisneros
Date:
Sure, next time it happens.  I'm back up again.

On Thu, 13 Feb 2003, Tom Lane wrote:

> Laurette Cisneros <laurette@nextbus.com> writes:
> > Ok, it reproduced today on my development server.
>
> Boy, it sure looks like everything's just sitting idle waiting for
> client input, and hasn't noticed your shutdown request at all.  You
> sure you used "-m fast"?  Could you attach to the postmaster and see
> what its Shutdown variable contains, and to the backends and see what
> the variables manipulated by die() contain?  (InterruptPending,
> proc_exit_inprogress, ProcDiePending, ImmediateInterruptOK,
> InterruptHoldoffCount, CritSectionCount)
>
>             regards, tom lane
>

--
Laurette Cisneros, L.D.
The Database Group
(510) 420-3137
NextBus Information Systems, Inc.
www.nextbus.com
----------------------------------
"No man is wise enough by himself"
-- Titus Maccius Plautus
   (254 Bc - 184 BC), Miles Gloriosus


Re: server won't shutdown

From
Laurette Cisneros
Date:
Ok, my production server had this problem (again).  So, here's the info.
you requested.  There's a lot of postgres processes still left running:

postmaster (proc id 1457):
(gdb) print Shutdown
$1 = 2

Other backends (postgres) still hanging around:
postgres  1458  1457  0 Feb06 ?        00:22:41 postgres: stats buffer process
InterruptPending: $1 = 0
proc_exit_inprogress: $2 = 0
ProcDiePending: $3 = 0
ImmediateInterruptOK = $4 = 0
InterruptHoldoffCount = $5 = 0
CritSectionCount = $6 = 0

postgres  1459  1458  2 Feb06 ?        05:31:38 postgres: stats collector process
InterruptPending: $1 = 0
proc_exit_inprogress: $2 = 0
ProcDiePending: $3 = 0
ImmediateInterruptOK = $4 = 0
InterruptHoldoffCount = $5 = 0
CritSectionCount = $6 = 0

postgres  1468  1457  0 Feb06 ?        00:04:57 postgres: nbadmin db1 111.123.123.244 idle
InterruptPending: $1 = 65537
proc_exit_inprogress: $2 = 0
ProcDiePending: $3 = 1
ImmediateInterruptOK = $4 = 0
InterruptHoldoffCount = $5 = 0
CritSectionCount = $6 = 0

postgres  1461  1457  0 Feb06 ?        00:02:06 postgres: nbadmin db2 111.123.123.253 idle
InterruptPending: $1 = 65537
proc_exit_inprogress: $2 = 0
ProcDiePending: $3 = 1
ImmediateInterruptOK: $4 = 0
InterruptHoldoffCount: $5 = 0
CritSectionCount: $6 = 0

postgres  1478  1457  0 Feb06 ?        00:04:13 postgres: nbadmin db3 111.123.124.244 idle
InterruptPending: $1 = 65537
proc_exit_inprogress: $2 = 0
ProcDiePending: $3 = 1
ImmediateInterruptOK: $4 = 0
InterruptHoldoffCount: $5 = 0
CritSectionCount: $6 = 0

postgres  9818  1457  0 Feb06 ?        00:00:01 postgres: nbadmin db4 111.123.123.244 idle
InterruptPending: $1 = 65537
proc_exit_inprogress: $2 = 0
ProcDiePending: $3 = 1
ImmediateInterruptOK: $4 = 0
InterruptHoldoffCount: $5 = 0
CritSectionCount: $6 = 0

postgres 15677  1457  0 Feb06 ?        00:00:04 postgres: nbadmin db5 111.123.123.253 idle
InterruptPending: $1 = 65537
proc_exit_inprogress: $2 = 0
ProcDiePending: $3 = 1
ImmediateInterruptOK: $4 = 0
InterruptHoldoffCount: $5 = 0
CritSectionCount: $6 = 0

postgres 30207  1457  0 Feb09 ?        00:00:00 postgres: nbadmin db6 111.123.123.244 idle
InterruptPending: $1 = 65537
proc_exit_inprogress: $2 = 0
ProcDiePending: $3 = 1
ImmediateInterruptOK: $4 = 0
InterruptHoldoffCount: $5 = 0
CritSectionCount: $6 = 0

postgres  7316  1457  0 Feb11 ?        00:00:01 postgres: nbadmin db7 111.123.123.244 idle
InterruptPending: $1 = 65537
proc_exit_inprogress: $2 = 0
ProcDiePending: $3 = 1
ImmediateInterruptOK: $4 = 0
InterruptHoldoffCount: $5 = 0
CritSectionCount: $6 = 0

postgres 10440  1457  0 Feb12 ?        00:00:00 postgres: nbadmin db8 111.123.123.253 idle
InterruptPending: $1 = 65537
proc_exit_inprogress: $2 = 0
ProcDiePending: $3 = 1
ImmediateInterruptOK: $4 = 0
InterruptHoldoffCount: $5 = 0
CritSectionCount: $6 = 0

postgres 10441  1457  0 Feb12 ?        00:02:15 postgres: nbadmin db9 111.123.123.253 idle
InterruptPending: $1 = 65537
proc_exit_inprogress: $2 = 0
ProcDiePending: $3 = 1
ImmediateInterruptOK: $4 = 0
InterruptHoldoffCount: $5 = 0
CritSectionCount: $6 = 0

postgres 27355  1457  0 Feb13 ?        00:00:00 postgres: nbadmin db10 111.123.123.253 idle
InterruptPending: $1 = 65537
proc_exit_inprogress: $2 = 0
ProcDiePending: $3 = 1
ImmediateInterruptOK: $4 = 0
InterruptHoldoffCount: $5 = 0
CritSectionCount: $6 = 0

postgres 28491  1457  0 Feb13 ?        00:00:03 postgres: nbadmin db11 111.123.123.253 idle
InterruptPending: $1 = 65537
proc_exit_inprogress: $2 = 0
ProcDiePending: $3 = 1
ImmediateInterruptOK: $4 = 0
InterruptHoldoffCount: $5 = 0
CritSectionCount: $6 = 0

postgres 30021  1457  0 Feb14 ?        00:00:01 postgres: nbadmin db12 111.123.123.244 idle
InterruptPending: $1 = 65537
proc_exit_inprogress: $2 = 0
ProcDiePending: $3 = 1
ImmediateInterruptOK: $4 = 0
InterruptHoldoffCount: $5 = 0
CritSectionCount: $6 = 0

postgres 10663  1457  0 Feb14 ?        00:00:00 postgres: nbadmin db13 111.123.123.244 idle
InterruptPending: $1 = 65537
proc_exit_inprogress: $2 = 0
ProcDiePending: $3 = 1
ImmediateInterruptOK: $4 = 0
InterruptHoldoffCount: $5 = 0
CritSectionCount: $6 = 0

postgres 10665  1457  0 Feb14 ?        00:00:00 postgres: nbadmin db13 111.123.123.244 idle
InterruptPending: $1 = 65537
proc_exit_inprogress: $2 = 0
ProcDiePending: $3 = 1
ImmediateInterruptOK: $4 = 0
InterruptHoldoffCount: $5 = 0
CritSectionCount: $6 = 0

Thank again,

Laurette

On Thu, 13 Feb 2003, Tom Lane wrote:

> Laurette Cisneros <laurette@nextbus.com> writes:
> > Ok, it reproduced today on my development server.
>
> Boy, it sure looks like everything's just sitting idle waiting for
> client input, and hasn't noticed your shutdown request at all.  You
> sure you used "-m fast"?  Could you attach to the postmaster and see
> what its Shutdown variable contains, and to the backends and see what
> the variables manipulated by die() contain?  (InterruptPending,
> proc_exit_inprogress, ProcDiePending, ImmediateInterruptOK,
> InterruptHoldoffCount, CritSectionCount)
>
>             regards, tom lane
>

--
Laurette Cisneros, L.D.
The Database Group
(510) 420-3137
NextBus Information Systems, Inc.
www.nextbus.com
----------------------------------
"No man is wise enough by himself"
-- Titus Maccius Plautus
   (254 Bc - 184 BC), Miles Gloriosus


Re: server won't shutdown

From
Tom Lane
Date:
Laurette Cisneros <laurette@nextbus.com> writes:
> Ok, my production server had this problem (again).  So, here's the info.
> you requested.  There's a lot of postgres processes still left running:

> postmaster (proc id 1457):
> (gdb) print Shutdown
> $1 = 2

Okay, so the postmaster definitely got the FastShutdown request ...

> postgres  1468  1457  0 Feb06 ?        00:04:57 postgres: nbadmin db1 111.123.123.244 idle
> InterruptPending: $1 = 65537
> proc_exit_inprogress: $2 = 0
> ProcDiePending: $3 = 1
> ImmediateInterruptOK = $4 = 0
> InterruptHoldoffCount = $5 = 0
> CritSectionCount = $6 = 0

... and it correctly sent it on to the backends, because they all have
ProcDiePending true, which proves that the die() signal handler did run.
So why are they sitting waiting for input?  They should have exited,
either instantly in die() if it occurred while they were already waiting
for input, or at the CHECK_FOR_INTERRUPTS() just before they next called
ReadCommand().  I can't see any window for failure there.

I'm baffled.  The only thing I can see to investigate is that
InterruptPending printed out as 65537 in each backend.  It's declared as
bool, and only ever set to true or false, so I'd have expected a result
of 0 or 1 --- in fact, since bool is char, it's impossible for it to
contain a value of 65537.  Is gdb confused, or is there something fishy
there?

BTW, do you have any idea whether any backends *did* exit?  Is this all
of the ones that were running, or did some obey the shutdown signal?
(You could look for "This connection has been terminated by the
administrator." errors in the postmaster log to see if any did obey.)

            regards, tom lane

Re: server won't shutdown

From
Tom Lane
Date:
"Ed L." <pgadmin@bluepolka.net> writes:
> I've been seeing the same problem intermittently over the past 2 years
> among ~30 production clusters.

Hm, I suddenly have a theory.  Do you guys use LISTEN/NOTIFY in the
databases that are exhibiting the problem?  Is it possible that the
backends that aren't cooperating have not done anything since they
last received a NOTIFY?  (That would imply that the connected frontend
didn't issue any SQL commands in response to the notification.)

            regards, tom lane

Re: server won't shutdown

From
Tom Lane
Date:
"Ed L." <pggeneral@bluepolka.net> writes:
> On Monday February 17 2003 5:09, Tom Lane wrote:
>> Hm, I suddenly have a theory.  Do you guys use LISTEN/NOTIFY in the
>> databases that are exhibiting the problem?

> Not using LISTEN/NOTIFY here, at least not of which I'm aware.  I'll
> dig a little more and let you know if I find differently.

Oh, duh!  You don't need to be using NOTIFY explicitly --- you just have
to sit idle long enough to make the SI buffer overflow, and the system
will try to NOTIFY your backend anyway to make it read the SI message
buffer.  Which is the path I was looking at --- that will leave
ImmediateInterruptOK false, preventing response to SIGTERM if the client
still hasn't done anything by the time it comes.

Patch forthcoming.

            regards, tom lane

Re: server won't shutdown

From
Tom Lane
Date:
I said:
> Oh, duh!  You don't need to be using NOTIFY explicitly --- you just have
> to sit idle long enough to make the SI buffer overflow, and the system
> will try to NOTIFY your backend anyway to make it read the SI message
> buffer.  Which is the path I was looking at --- that will leave
> ImmediateInterruptOK false, preventing response to SIGTERM if the client
> still hasn't done anything by the time it comes.

> Patch forthcoming.

Attached is the fix against 7.3 (or CVS tip).  The same bug exists in
7.2 and 7.1, and should be fixable by the same patch (possibly with some
fuzz).

You can demonstrate the problem by doing 'LISTEN foo' in one psql and
then 'NOTIFY foo' in another, then attempting a fast shutdown --- the
first psql's backend will ignore you until the psql session does something.
The same path can be followed without LISTEN if the first psql is simply
left idle while the second one does a bunch of catalog-updating work
(a few cycles of VACUUM ANALYZE usually suffice).

Sigh, another day another bug...

            regards, tom lane

*** src/backend/commands/async.c.orig    Sun Sep 15 21:24:41 2002
--- src/backend/commands/async.c    Mon Feb 17 21:38:47 2003
***************
*** 599,604 ****
--- 599,614 ----

      if (notifyInterruptEnabled)
      {
+         bool        save_ImmediateInterruptOK = ImmediateInterruptOK;
+
+         /*
+          * We may be called while ImmediateInterruptOK is true; turn it off
+          * while messing with the NOTIFY state.  (We would have to save
+          * and restore it anyway, because PGSemaphore operations inside
+          * ProcessIncomingNotify() might reset it.)
+          */
+         ImmediateInterruptOK = false;
+
          /*
           * I'm not sure whether some flavors of Unix might allow another
           * SIGUSR2 occurrence to recursively interrupt this routine. To
***************
*** 626,631 ****
--- 636,648 ----
                      elog(LOG, "Async_NotifyHandler: done");
              }
          }
+
+         /*
+          * Restore ImmediateInterruptOK, and check for interrupts if needed.
+          */
+         ImmediateInterruptOK = save_ImmediateInterruptOK;
+         if (save_ImmediateInterruptOK)
+             CHECK_FOR_INTERRUPTS();
      }
      else
      {

Re: server won't shutdown

From
Ed Loehr
Date:
I've been seeing the same problem intermittently over the past 2 years
among ~30 production clusters.  Today I saw it happen with 3-4 of the
7.2.3 clusters on the same host (PostgreSQL 7.2.3 on
i686-pc-linux-gnu, compiled by GCC 2.96 with 2.4.18-4bigmem kernel).
Each cluster had a number of long-idle connections, no current
activity.  Will try to gather a little more data...

Ed

On Monday February 17 2003 3:56, Laurette Cisneros wrote:
> Ok, my production server had this problem (again).  So, here's the
> info. you requested.  There's a lot of postgres processes still
> left running:
>
> postmaster (proc id 1457):
> (gdb) print Shutdown
> $1 = 2
>
> Other backends (postgres) still hanging around:
> postgres  1458  1457  0 Feb06 ?        00:22:41 postgres: stats
> buffer process InterruptPending: $1 = 0
> proc_exit_inprogress: $2 = 0
> ProcDiePending: $3 = 0
> ImmediateInterruptOK = $4 = 0
> InterruptHoldoffCount = $5 = 0
> CritSectionCount = $6 = 0
>
> postgres  1459  1458  2 Feb06 ?        05:31:38 postgres: stats
> collector process InterruptPending: $1 = 0
> proc_exit_inprogress: $2 = 0
> ProcDiePending: $3 = 0
> ImmediateInterruptOK = $4 = 0
> InterruptHoldoffCount = $5 = 0
> CritSectionCount = $6 = 0
>
> postgres  1468  1457  0 Feb06 ?        00:04:57 postgres: nbadmin
> db1 111.123.123.244 idle InterruptPending: $1 = 65537
> proc_exit_inprogress: $2 = 0
> ProcDiePending: $3 = 1
> ImmediateInterruptOK = $4 = 0
> InterruptHoldoffCount = $5 = 0
> CritSectionCount = $6 = 0
>
> postgres  1461  1457  0 Feb06 ?        00:02:06 postgres: nbadmin
> db2 111.123.123.253 idle InterruptPending: $1 = 65537
> proc_exit_inprogress: $2 = 0
> ProcDiePending: $3 = 1
> ImmediateInterruptOK: $4 = 0
> InterruptHoldoffCount: $5 = 0
> CritSectionCount: $6 = 0
>
> postgres  1478  1457  0 Feb06 ?        00:04:13 postgres: nbadmin
> db3 111.123.124.244 idle InterruptPending: $1 = 65537
> proc_exit_inprogress: $2 = 0
> ProcDiePending: $3 = 1
> ImmediateInterruptOK: $4 = 0
> InterruptHoldoffCount: $5 = 0
> CritSectionCount: $6 = 0
>
> postgres  9818  1457  0 Feb06 ?        00:00:01 postgres: nbadmin
> db4 111.123.123.244 idle InterruptPending: $1 = 65537
> proc_exit_inprogress: $2 = 0
> ProcDiePending: $3 = 1
> ImmediateInterruptOK: $4 = 0
> InterruptHoldoffCount: $5 = 0
> CritSectionCount: $6 = 0
>
> postgres 15677  1457  0 Feb06 ?        00:00:04 postgres: nbadmin
> db5 111.123.123.253 idle InterruptPending: $1 = 65537
> proc_exit_inprogress: $2 = 0
> ProcDiePending: $3 = 1
> ImmediateInterruptOK: $4 = 0
> InterruptHoldoffCount: $5 = 0
> CritSectionCount: $6 = 0
>
> postgres 30207  1457  0 Feb09 ?        00:00:00 postgres: nbadmin
> db6 111.123.123.244 idle InterruptPending: $1 = 65537
> proc_exit_inprogress: $2 = 0
> ProcDiePending: $3 = 1
> ImmediateInterruptOK: $4 = 0
> InterruptHoldoffCount: $5 = 0
> CritSectionCount: $6 = 0
>
> postgres  7316  1457  0 Feb11 ?        00:00:01 postgres: nbadmin
> db7 111.123.123.244 idle InterruptPending: $1 = 65537
> proc_exit_inprogress: $2 = 0
> ProcDiePending: $3 = 1
> ImmediateInterruptOK: $4 = 0
> InterruptHoldoffCount: $5 = 0
> CritSectionCount: $6 = 0
>
> postgres 10440  1457  0 Feb12 ?        00:00:00 postgres: nbadmin
> db8 111.123.123.253 idle InterruptPending: $1 = 65537
> proc_exit_inprogress: $2 = 0
> ProcDiePending: $3 = 1
> ImmediateInterruptOK: $4 = 0
> InterruptHoldoffCount: $5 = 0
> CritSectionCount: $6 = 0
>
> postgres 10441  1457  0 Feb12 ?        00:02:15 postgres: nbadmin
> db9 111.123.123.253 idle InterruptPending: $1 = 65537
> proc_exit_inprogress: $2 = 0
> ProcDiePending: $3 = 1
> ImmediateInterruptOK: $4 = 0
> InterruptHoldoffCount: $5 = 0
> CritSectionCount: $6 = 0
>
> postgres 27355  1457  0 Feb13 ?        00:00:00 postgres: nbadmin
> db10 111.123.123.253 idle InterruptPending: $1 = 65537
> proc_exit_inprogress: $2 = 0
> ProcDiePending: $3 = 1
> ImmediateInterruptOK: $4 = 0
> InterruptHoldoffCount: $5 = 0
> CritSectionCount: $6 = 0
>
> postgres 28491  1457  0 Feb13 ?        00:00:03 postgres: nbadmin
> db11 111.123.123.253 idle InterruptPending: $1 = 65537
> proc_exit_inprogress: $2 = 0
> ProcDiePending: $3 = 1
> ImmediateInterruptOK: $4 = 0
> InterruptHoldoffCount: $5 = 0
> CritSectionCount: $6 = 0
>
> postgres 30021  1457  0 Feb14 ?        00:00:01 postgres: nbadmin
> db12 111.123.123.244 idle InterruptPending: $1 = 65537
> proc_exit_inprogress: $2 = 0
> ProcDiePending: $3 = 1
> ImmediateInterruptOK: $4 = 0
> InterruptHoldoffCount: $5 = 0
> CritSectionCount: $6 = 0
>
> postgres 10663  1457  0 Feb14 ?        00:00:00 postgres: nbadmin
> db13 111.123.123.244 idle InterruptPending: $1 = 65537
> proc_exit_inprogress: $2 = 0
> ProcDiePending: $3 = 1
> ImmediateInterruptOK: $4 = 0
> InterruptHoldoffCount: $5 = 0
> CritSectionCount: $6 = 0
>
> postgres 10665  1457  0 Feb14 ?        00:00:00 postgres: nbadmin
> db13 111.123.123.244 idle InterruptPending: $1 = 65537
> proc_exit_inprogress: $2 = 0
> ProcDiePending: $3 = 1
> ImmediateInterruptOK: $4 = 0
> InterruptHoldoffCount: $5 = 0
> CritSectionCount: $6 = 0
>
> Thank again,
>
> Laurette
>
> On Thu, 13 Feb 2003, Tom Lane wrote:
> > Laurette Cisneros <laurette@nextbus.com> writes:
> > > Ok, it reproduced today on my development server.
> >
> > Boy, it sure looks like everything's just sitting idle waiting
> > for client input, and hasn't noticed your shutdown request at
> > all.  You sure you used "-m fast"?  Could you attach to the
> > postmaster and see what its Shutdown variable contains, and to
> > the backends and see what the variables manipulated by die()
> > contain?  (InterruptPending, proc_exit_inprogress,
> > ProcDiePending, ImmediateInterruptOK, InterruptHoldoffCount,
> > CritSectionCount)
> >
> >             regards, tom lane


Re: server won't shutdown

From
"Ed L."
Date:
On Monday February 17 2003 5:09, Tom Lane wrote:
> "Ed L." <pgadmin@bluepolka.net> writes:
> > I've been seeing the same problem intermittently over the past 2
> > years among ~30 production clusters.
>
> Hm, I suddenly have a theory.  Do you guys use LISTEN/NOTIFY in the
> databases that are exhibiting the problem?  Is it possible that the
> backends that aren't cooperating have not done anything since they
> last received a NOTIFY?  (That would imply that the connected
> frontend didn't issue any SQL commands in response to the
> notification.)

Not using LISTEN/NOTIFY here, at least not of which I'm aware.  I'll
dig a little more and let you know if I find differently.

Ed

Re: server won't shutdown

From
Laurette Cisneros
Date:
I'm compiling in the patch now and will test as soon as I can.

I have to say, I have never received such fast responses and fixes with any
"support" for any other product.

Pg is the best!

L.
On Mon, 17 Feb 2003, Tom Lane wrote:

> I said:
> > Oh, duh!  You don't need to be using NOTIFY explicitly --- you just have
> > to sit idle long enough to make the SI buffer overflow, and the system
> > will try to NOTIFY your backend anyway to make it read the SI message
> > buffer.  Which is the path I was looking at --- that will leave
> > ImmediateInterruptOK false, preventing response to SIGTERM if the client
> > still hasn't done anything by the time it comes.
>
> > Patch forthcoming.
>
> Attached is the fix against 7.3 (or CVS tip).  The same bug exists in
> 7.2 and 7.1, and should be fixable by the same patch (possibly with some
> fuzz).
>
> You can demonstrate the problem by doing 'LISTEN foo' in one psql and
> then 'NOTIFY foo' in another, then attempting a fast shutdown --- the
> first psql's backend will ignore you until the psql session does something.
> The same path can be followed without LISTEN if the first psql is simply
> left idle while the second one does a bunch of catalog-updating work
> (a few cycles of VACUUM ANALYZE usually suffice).
>
> Sigh, another day another bug...
>
>             regards, tom lane
>
> *** src/backend/commands/async.c.orig    Sun Sep 15 21:24:41 2002
> --- src/backend/commands/async.c    Mon Feb 17 21:38:47 2003
> ***************
> *** 599,604 ****
> --- 599,614 ----
>
>       if (notifyInterruptEnabled)
>       {
> +         bool        save_ImmediateInterruptOK = ImmediateInterruptOK;
> +
> +         /*
> +          * We may be called while ImmediateInterruptOK is true; turn it off
> +          * while messing with the NOTIFY state.  (We would have to save
> +          * and restore it anyway, because PGSemaphore operations inside
> +          * ProcessIncomingNotify() might reset it.)
> +          */
> +         ImmediateInterruptOK = false;
> +
>           /*
>            * I'm not sure whether some flavors of Unix might allow another
>            * SIGUSR2 occurrence to recursively interrupt this routine. To
> ***************
> *** 626,631 ****
> --- 636,648 ----
>                       elog(LOG, "Async_NotifyHandler: done");
>               }
>           }
> +
> +         /*
> +          * Restore ImmediateInterruptOK, and check for interrupts if needed.
> +          */
> +         ImmediateInterruptOK = save_ImmediateInterruptOK;
> +         if (save_ImmediateInterruptOK)
> +             CHECK_FOR_INTERRUPTS();
>       }
>       else
>       {
>

--
Laurette Cisneros, L.D.
The Database Group
(510) 420-3137
NextBus Information Systems, Inc.
www.nextbus.com
----------------------------------
"No man is wise enough by himself"
-- Titus Maccius Plautus
   (254 Bc - 184 BC), Miles Gloriosus


Re: server won't shutdown

From
"Ed L."
Date:
On Monday February 17 2003 8:06, Tom Lane wrote:
>
> You can demonstrate the problem by doing 'LISTEN foo' in one psql
> and then 'NOTIFY foo' in another, then attempting a fast shutdown
> --- the first psql's backend will ignore you until the psql session
> does something. The same path can be followed without LISTEN if the
> first psql is simply left idle while the second one does a bunch of
> catalog-updating work (a few cycles of VACUUM ANALYZE usually
> suffice).

Thanks, Tom.  Your diagnosis sounds consistent with our experience.  I
was able to reproduce it with the latter approach of running a bunch
of VACUUM ANALYZE commands.  However, I have been unable to
demonstrate this on 7.3.1 using the following sequence:

session #1 launches psql
session #2 launches psql
session #1 does "listen foo;"
session #2 does "notify foo;"
session #3 does a fast shutdown;

All backends immediately shutdown, no hanging.

Anyway, I'm applying the patch and will advise if any further issues
show up.

Ed


Re: server won't shutdown

From
"Ed L."
Date:
On Monday February 17 2003 8:06, Tom Lane wrote:
>
> Attached is the fix against 7.3 (or CVS tip).  The same bug exists
> in 7.2 and 7.1, and should be fixable by the same patch (possibly
> with some fuzz).
>
> You can demonstrate the problem by doing 'LISTEN foo' in one psql
> and then 'NOTIFY foo' in another, then attempting a fast shutdown
> --- the first psql's backend will ignore you until the psql session
> does something. The same path can be followed without LISTEN if the
> first psql is simply left idle while the second one does a bunch of
> catalog-updating work (a few cycles of VACUUM ANALYZE usually
> suffice).

I have verified this fixed the problem on 7.2.4 in the case of the
latter VACUUM ANALYZE example, and will test it also against 7.3.2
before too long.  Thanks again, Tom.

Ed


Re: server won't shutdown

From
Laurette Cisneros
Date:
Ok, I let the server run for a few days (usually long enough to get into
this situation) and then did a shutdown.  No hang.  Could be fixed.

Thanks!

L.
On Mon, 17 Feb 2003, Tom Lane wrote:

> I said:
> > Oh, duh!  You don't need to be using NOTIFY explicitly --- you just have
> > to sit idle long enough to make the SI buffer overflow, and the system
> > will try to NOTIFY your backend anyway to make it read the SI message
> > buffer.  Which is the path I was looking at --- that will leave
> > ImmediateInterruptOK false, preventing response to SIGTERM if the client
> > still hasn't done anything by the time it comes.
>
> > Patch forthcoming.
>
> Attached is the fix against 7.3 (or CVS tip).  The same bug exists in
> 7.2 and 7.1, and should be fixable by the same patch (possibly with some
> fuzz).
>
> You can demonstrate the problem by doing 'LISTEN foo' in one psql and
> then 'NOTIFY foo' in another, then attempting a fast shutdown --- the
> first psql's backend will ignore you until the psql session does something.
> The same path can be followed without LISTEN if the first psql is simply
> left idle while the second one does a bunch of catalog-updating work
> (a few cycles of VACUUM ANALYZE usually suffice).
>
> Sigh, another day another bug...
>
>             regards, tom lane
>
> *** src/backend/commands/async.c.orig    Sun Sep 15 21:24:41 2002
> --- src/backend/commands/async.c    Mon Feb 17 21:38:47 2003
> ***************
> *** 599,604 ****
> --- 599,614 ----
>
>       if (notifyInterruptEnabled)
>       {
> +         bool        save_ImmediateInterruptOK = ImmediateInterruptOK;
> +
> +         /*
> +          * We may be called while ImmediateInterruptOK is true; turn it off
> +          * while messing with the NOTIFY state.  (We would have to save
> +          * and restore it anyway, because PGSemaphore operations inside
> +          * ProcessIncomingNotify() might reset it.)
> +          */
> +         ImmediateInterruptOK = false;
> +
>           /*
>            * I'm not sure whether some flavors of Unix might allow another
>            * SIGUSR2 occurrence to recursively interrupt this routine. To
> ***************
> *** 626,631 ****
> --- 636,648 ----
>                       elog(LOG, "Async_NotifyHandler: done");
>               }
>           }
> +
> +         /*
> +          * Restore ImmediateInterruptOK, and check for interrupts if needed.
> +          */
> +         ImmediateInterruptOK = save_ImmediateInterruptOK;
> +         if (save_ImmediateInterruptOK)
> +             CHECK_FOR_INTERRUPTS();
>       }
>       else
>       {
>

--
Laurette Cisneros, L.D.
The Database Group
(510) 420-3137
NextBus Information Systems, Inc.
www.nextbus.com
----------------------------------
"No man is wise enough by himself"
-- Titus Maccius Plautus
   (254 Bc - 184 BC), Miles Gloriosus


Re: server won't shutdown

From
Laurette Cisneros
Date:
Ok, I let the server run for a few days (usually long enough to get into
this situation) and then did a shutdown.  No hang.  Could be fixed.

Thanks!

L.
On Mon, 17 Feb 2003, Tom Lane wrote:

> I said:
> > Oh, duh!  You don't need to be using NOTIFY explicitly --- you just have
> > to sit idle long enough to make the SI buffer overflow, and the system
> > will try to NOTIFY your backend anyway to make it read the SI message
> > buffer.  Which is the path I was looking at --- that will leave
> > ImmediateInterruptOK false, preventing response to SIGTERM if the client
> > still hasn't done anything by the time it comes.
>
> > Patch forthcoming.
>
> Attached is the fix against 7.3 (or CVS tip).  The same bug exists in
> 7.2 and 7.1, and should be fixable by the same patch (possibly with some
> fuzz).
>
> You can demonstrate the problem by doing 'LISTEN foo' in one psql and
> then 'NOTIFY foo' in another, then attempting a fast shutdown --- the
> first psql's backend will ignore you until the psql session does something.
> The same path can be followed without LISTEN if the first psql is simply
> left idle while the second one does a bunch of catalog-updating work
> (a few cycles of VACUUM ANALYZE usually suffice).
>
> Sigh, another day another bug...
>
>             regards, tom lane
>
> *** src/backend/commands/async.c.orig    Sun Sep 15 21:24:41 2002
> --- src/backend/commands/async.c    Mon Feb 17 21:38:47 2003
> ***************
> *** 599,604 ****
> --- 599,614 ----
>
>       if (notifyInterruptEnabled)
>       {
> +         bool        save_ImmediateInterruptOK = ImmediateInterruptOK;
> +
> +         /*
> +          * We may be called while ImmediateInterruptOK is true; turn it off
> +          * while messing with the NOTIFY state.  (We would have to save
> +          * and restore it anyway, because PGSemaphore operations inside
> +          * ProcessIncomingNotify() might reset it.)
> +          */
> +         ImmediateInterruptOK = false;
> +
>           /*
>            * I'm not sure whether some flavors of Unix might allow another
>            * SIGUSR2 occurrence to recursively interrupt this routine. To
> ***************
> *** 626,631 ****
> --- 636,648 ----
>                       elog(LOG, "Async_NotifyHandler: done");
>               }
>           }
> +
> +         /*
> +          * Restore ImmediateInterruptOK, and check for interrupts if needed.
> +          */
> +         ImmediateInterruptOK = save_ImmediateInterruptOK;
> +         if (save_ImmediateInterruptOK)
> +             CHECK_FOR_INTERRUPTS();
>       }
>       else
>       {
>

--
Laurette Cisneros, L.D.
The Database Group
(510) 420-3137
NextBus Information Systems, Inc.
www.nextbus.com
----------------------------------
"No man is wise enough by himself"
-- Titus Maccius Plautus
   (254 Bc - 184 BC), Miles Gloriosus