Thread: server won't shutdown
Server won't shutdown: Currently, we are running 7.3.2. However, we have had this problem since 7.1. Our production server has dozens of connections from different persistent applications (some web, some runnning on other systems). When I do a shutdown: $pgpath/bin/pg_ctl -D $datadir -m fast stop the server will not completely shutdown. I realize that the pg_ctl script will send a -QUIT to the postmaster process, which will then also send some kind of quit signal to the outstanding backends?, but, every time, some of the postgres backend processes will not quit and the server "hangs" in shutdown mode. To get around this, I send a -QUIT is sent to a backend process that is not doing any activity and then the server comes down (and not in a good way I'm sure). Can anyone explain *why* the server won't shutdown as well as suggest the *best* way to handle this situation? Thanks, -- Laurette Cisneros, L.D. The Database Group (510) 420-3137 NextBus Information Systems, Inc. www.nextbus.com ---------------------------------- "No man is wise enough by himself" -- Titus Maccius Plautus (254 Bc - 184 BC), Miles Gloriosus
Laurette Cisneros <laurette@nextbus.com> writes: > When I do a shutdown: > $pgpath/bin/pg_ctl -D $datadir -m fast stop > the server will not completely shutdown. Hm. -m fast tells the postmaster to send SIGTERM to all the existing backends, and then quit when all the backends have quit. So your problem is that one or more backend processes isn't exiting in a timely fashion after receiving SIGTERM. It would be useful to see what these backends are doing. Can you attach to each one with gdb and get a stack trace? regards, tom lane
Sure. I will attempt this on our development server which only occasionally has this problem. If that won't reproduce it, I will attempt this on our production server but that must be scheduled at a slow time - maybe Monday. I'll send this info. once I've collected it. Thanks for the help, L. On Wed, 12 Feb 2003, Tom Lane wrote: > Laurette Cisneros <laurette@nextbus.com> writes: > > When I do a shutdown: > > $pgpath/bin/pg_ctl -D $datadir -m fast stop > > the server will not completely shutdown. > > Hm. -m fast tells the postmaster to send SIGTERM to all the existing > backends, and then quit when all the backends have quit. So your > problem is that one or more backend processes isn't exiting in a timely > fashion after receiving SIGTERM. It would be useful to see what these > backends are doing. Can you attach to each one with gdb and get a stack > trace? > > regards, tom lane > -- Laurette Cisneros, L.D. The Database Group (510) 420-3137 NextBus Information Systems, Inc. www.nextbus.com ---------------------------------- "No man is wise enough by himself" -- Titus Maccius Plautus (254 Bc - 184 BC), Miles Gloriosus
Ok, it reproduced today on my development server. Here's the info: /usr/local/pgsql/bin/pg_ctl -D /u1/pgsql73_data -m fast -l /u1/pgsql73_data/pgsql.log stop waiting for postmaster to shut down................................................................ failed pg_ctl: postmaster does not shut down ps -efw | grep post postgres 5466 1 0 Feb12 ? 00:00:02 /usr/local/pgsql/bin/postmaster postgres 5467 5466 0 Feb12 ? 00:00:45 postgres: stats buffer process postgres 5468 5467 0 Feb12 ? 00:15:49 postgres: stats collector process postgres 8077 5466 0 Feb12 ? 00:00:00 postgres: nbadmin db1 192.168.1.111 idle postgres 23146 5466 0 09:36 ? 00:00:00 postgres: nbadmin db2 64.114.28.112 idle postgres 28874 5466 0 12:15 ? 00:00:00 postgres: nbadmin db3 208.184.214.113 idle postgres 29424 5466 0 12:30 ? 00:00:06 postgres: nbadmin db4 64.114.28.114 idle postgres 3792 5466 0 15:32 ? 00:00:00 postgres: junior jr_db 192.168.1.114 idle Here are the stack traces from each of the postmaster and postgres processes that remain: gdb /usr/local/pgsql/bin/postmaster 5466 GNU gdb 5.0rh-5 Red Hat Linux 7.1 Copyright 2001 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-redhat-linux"... /u1/pgsql73_data/5466: No such file or directory. Attaching to program: /usr/local/pgsql/bin/postmaster, process 5466 Reading symbols from /usr/lib/libssl.so.1...done. Loaded symbols for /usr/lib/libssl.so.1 Reading symbols from /usr/lib/libcrypto.so.1...done. Loaded symbols for /usr/lib/libcrypto.so.1 Reading symbols from /usr/lib/libz.so.1...done. Loaded symbols for /usr/lib/libz.so.1 Reading symbols from /usr/lib/libreadline.so.4.1...done. Loaded symbols for /usr/lib/libreadline.so.4.1 Reading symbols from /lib/libtermcap.so.2...done. Loaded symbols for /lib/libtermcap.so.2 Reading symbols from /lib/libcrypt.so.1...done. Loaded symbols for /lib/libcrypt.so.1 Reading symbols from /lib/libresolv.so.2...done. Loaded symbols for /lib/libresolv.so.2 Reading symbols from /lib/libnsl.so.1...done. Loaded symbols for /lib/libnsl.so.1 Reading symbols from /lib/libdl.so.2...done. Loaded symbols for /lib/libdl.so.2 Reading symbols from /lib/i686/libm.so.6...done. Loaded symbols for /lib/i686/libm.so.6 Reading symbols from /lib/i686/libc.so.6...done. Loaded symbols for /lib/i686/libc.so.6 Reading symbols from /lib/ld-linux.so.2...done. Loaded symbols for /lib/ld-linux.so.2 Reading symbols from /lib/libnss_files.so.2...done. Loaded symbols for /lib/libnss_files.so.2 0x402ab90e in __select () from /lib/i686/libc.so.6 (gdb) bt #0 0x402ab90e in __select () from /lib/i686/libc.so.6 #1 0xbffff1b8 in ?? () #2 0x081023fb in PostmasterMain (argc=1, argv=0x822e000) at postmaster.c:788 #3 0x080e0a07 in main (argc=1, argv=0xbffff8d4) at main.c:210 #4 0x401e7177 in __libc_start_main (main=0x80e0828 <main>, argc=1, ubp_av=0xbffff8d4, init=0x806aeec <_init>, fini=0x8172770 <_fini>, rtld_fini=0x4000e184 <_dl_fini>, stack_end=0xbffff8cc) at ../sysdeps/generic/libc-start.c:129 ---------- gdb /usr/local/pgsql/bin/postgres 5467 GNU gdb 5.0rh-5 Red Hat Linux 7.1 Copyright 2001 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-redhat-linux"... /u1/pgsql73_data/5467: No such file or directory. Attaching to program: /usr/local/pgsql/bin/postgres, process 5467 Reading symbols from /usr/lib/libssl.so.1...done. Loaded symbols for /usr/lib/libssl.so.1 Reading symbols from /usr/lib/libcrypto.so.1...done. Loaded symbols for /usr/lib/libcrypto.so.1 Reading symbols from /usr/lib/libz.so.1...done. Loaded symbols for /usr/lib/libz.so.1 Reading symbols from /usr/lib/libreadline.so.4.1...done. Loaded symbols for /usr/lib/libreadline.so.4.1 Reading symbols from /lib/libtermcap.so.2...done. Loaded symbols for /lib/libtermcap.so.2 Reading symbols from /lib/libcrypt.so.1...done. Loaded symbols for /lib/libcrypt.so.1 Reading symbols from /lib/libresolv.so.2...done. Loaded symbols for /lib/libresolv.so.2 Reading symbols from /lib/libnsl.so.1...done. Loaded symbols for /lib/libnsl.so.1 Reading symbols from /lib/libdl.so.2...done. Loaded symbols for /lib/libdl.so.2 Reading symbols from /lib/i686/libm.so.6...done. Loaded symbols for /lib/i686/libm.so.6 Reading symbols from /lib/i686/libc.so.6...done. Loaded symbols for /lib/i686/libc.so.6 Reading symbols from /lib/ld-linux.so.2...done. Loaded symbols for /lib/ld-linux.so.2 Reading symbols from /lib/libnss_files.so.2...done. Loaded symbols for /lib/libnss_files.so.2 0x402ab90e in __select () from /lib/i686/libc.so.6 (gdb) bt #0 0x402ab90e in __select () from /lib/i686/libc.so.6 #1 0x00000006 in __strtol_internal (nptr=0x3 <Address 0x3 out of bounds>, endptr=0x0, base=32, group=135287149) at eval.c:36 #2 0x08105189 in pgstat_main () at pgstat.c:1177 #3 0x081045f1 in pgstat_start () at pgstat.c:325 #4 0x081023c2 in PostmasterMain (argc=1, argv=0x822e000) at postmaster.c:772 #5 0x080e0a07 in main (argc=1, argv=0xbffff8d4) at main.c:210 #6 0x401e7177 in __libc_start_main (main=0x80e0828 <main>, argc=1, ubp_av=0xbffff8d4, init=0x806aeec <_init>, fini=0x8172770 <_fini>, rtld_fini=0x4000e184 <_dl_fini>, stack_end=0xbffff8cc) at ../sysdeps/generic/libc-start.c:129 ------------- gdb /usr/local/pgsql/bin/postgres 5468 GNU gdb 5.0rh-5 Red Hat Linux 7.1 Copyright 2001 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-redhat-linux"... /u1/pgsql73_data/5468: No such file or directory. Attaching to program: /usr/local/pgsql/bin/postgres, process 5468 Reading symbols from /usr/lib/libssl.so.1...done. Loaded symbols for /usr/lib/libssl.so.1 Reading symbols from /usr/lib/libcrypto.so.1...done. Loaded symbols for /usr/lib/libcrypto.so.1 Reading symbols from /usr/lib/libz.so.1...done. Loaded symbols for /usr/lib/libz.so.1 Reading symbols from /usr/lib/libreadline.so.4.1...done. Loaded symbols for /usr/lib/libreadline.so.4.1 Reading symbols from /lib/libtermcap.so.2...done. Loaded symbols for /lib/libtermcap.so.2 Reading symbols from /lib/libcrypt.so.1...done. Loaded symbols for /lib/libcrypt.so.1 Reading symbols from /lib/libresolv.so.2...done. Loaded symbols for /lib/libresolv.so.2 Reading symbols from /lib/libnsl.so.1...done. Loaded symbols for /lib/libnsl.so.1 Reading symbols from /lib/libdl.so.2...done. Loaded symbols for /lib/libdl.so.2 Reading symbols from /lib/i686/libm.so.6...done. Loaded symbols for /lib/i686/libm.so.6 Reading symbols from /lib/i686/libc.so.6...done. Loaded symbols for /lib/i686/libc.so.6 Reading symbols from /lib/ld-linux.so.2...done. Loaded symbols for /lib/ld-linux.so.2 Reading symbols from /lib/libnss_files.so.2...done. Loaded symbols for /lib/libnss_files.so.2 0x402ab90e in __select () from /lib/i686/libc.so.6 (gdb) bt #0 0x402ab90e in __select () from /lib/i686/libc.so.6 #1 0xbfffef60 in ?? () #2 0x081045f1 in pgstat_start () at pgstat.c:325 #3 0x081023c2 in PostmasterMain (argc=1, argv=0x822e000) at postmaster.c:772 #4 0x080e0a07 in main (argc=1, argv=0xbffff8d4) at main.c:210 #5 0x401e7177 in __libc_start_main (main=0x80e0828 <main>, argc=1, ubp_av=0xbffff8d4, init=0x806aeec <_init>, fini=0x8172770 <_fini>, rtld_fini=0x4000e184 <_dl_fini>, stack_end=0xbffff8cc) at ../sysdeps/generic/libc-start.c:129 ------------ gdb /usr/local/pgsql/bin/postgres 8077 GNU gdb 5.0rh-5 Red Hat Linux 7.1 Copyright 2001 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-redhat-linux"... /u1/pgsql73_data/8077: No such file or directory. Attaching to program: /usr/local/pgsql/bin/postgres, process 8077 Reading symbols from /usr/lib/libssl.so.1...done. Loaded symbols for /usr/lib/libssl.so.1 Reading symbols from /usr/lib/libcrypto.so.1...done. Loaded symbols for /usr/lib/libcrypto.so.1 Reading symbols from /usr/lib/libz.so.1...done. Loaded symbols for /usr/lib/libz.so.1 Reading symbols from /usr/lib/libreadline.so.4.1...done. Loaded symbols for /usr/lib/libreadline.so.4.1 Reading symbols from /lib/libtermcap.so.2...done. Loaded symbols for /lib/libtermcap.so.2 Reading symbols from /lib/libcrypt.so.1...done. Loaded symbols for /lib/libcrypt.so.1 Reading symbols from /lib/libresolv.so.2...done. Loaded symbols for /lib/libresolv.so.2 Reading symbols from /lib/libnsl.so.1...done. Loaded symbols for /lib/libnsl.so.1 Reading symbols from /lib/libdl.so.2...done. Loaded symbols for /lib/libdl.so.2 Reading symbols from /lib/i686/libm.so.6...done. Loaded symbols for /lib/i686/libm.so.6 Reading symbols from /lib/i686/libc.so.6...done. Loaded symbols for /lib/i686/libc.so.6 Reading symbols from /lib/ld-linux.so.2...done. Loaded symbols for /lib/ld-linux.so.2 Reading symbols from /lib/libnss_files.so.2...done. Loaded symbols for /lib/libnss_files.so.2 Reading symbols from /usr/local/pgsql-7.3.2/lib/plpgsql.so...done. Loaded symbols for /usr/local/pgsql-7.3.2/lib/plpgsql.so 0x402b2262 in __libc_recv () from /lib/i686/libc.so.6 (gdb) bt #0 0x402b2262 in __libc_recv () from /lib/i686/libc.so.6 #1 0x080dc8bd in secure_read (port=0x8247110, ptr=0x81f7c40, len=8192) at be-secure.c:301 #2 0x080dff4d in pq_recvbuf () at pqcomm.c:463 #3 0x080dff9d in pq_getbyte () at pqcomm.c:500 #4 0x0811a675 in SocketBackend (inBuf=0x82b7b50) at postgres.c:247 #5 0x0811a6eb in ReadCommand (inBuf=0x82b7b50) at postgres.c:304 #6 0x0811bbb9 in PostgresMain (argc=4, argv=0xbfffef40, username=0x8247241 "nbadmin") at postgres.c:1930 #7 0x08103d60 in DoBackend (port=0x8247110) at postmaster.c:2302 #8 0x081036a6 in BackendStartup (port=0x8247110) at postmaster.c:1924 #9 0x081028a5 in ServerLoop () at postmaster.c:1027 #10 0x081023fb in PostmasterMain (argc=1, argv=0x822e000) at postmaster.c:788 #11 0x080e0a07 in main (argc=1, argv=0xbffff8d4) at main.c:210 #12 0x401e7177 in __libc_start_main (main=0x80e0828 <main>, argc=1, ubp_av=0xbffff8d4, init=0x806aeec <_init>, fini=0x8172770 <_fini>, rtld_fini=0x4000e184 <_dl_fini>, stack_end=0xbffff8cc) at ../sysdeps/generic/libc-start.c:129 -------- gdb /usr/local/pgsql/bin/postgres 23146 GNU gdb 5.0rh-5 Red Hat Linux 7.1 Copyright 2001 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-redhat-linux"... /u1/pgsql73_data/23146: No such file or directory. Attaching to program: /usr/local/pgsql/bin/postgres, process 23146 Reading symbols from /usr/lib/libssl.so.1...done. Loaded symbols for /usr/lib/libssl.so.1 Reading symbols from /usr/lib/libcrypto.so.1...done. Loaded symbols for /usr/lib/libcrypto.so.1 Reading symbols from /usr/lib/libz.so.1...done. Loaded symbols for /usr/lib/libz.so.1 Reading symbols from /usr/lib/libreadline.so.4.1...done. Loaded symbols for /usr/lib/libreadline.so.4.1 Reading symbols from /lib/libtermcap.so.2...done. Loaded symbols for /lib/libtermcap.so.2 Reading symbols from /lib/libcrypt.so.1...done. Loaded symbols for /lib/libcrypt.so.1 Reading symbols from /lib/libresolv.so.2...done. Loaded symbols for /lib/libresolv.so.2 Reading symbols from /lib/libnsl.so.1...done. Loaded symbols for /lib/libnsl.so.1 Reading symbols from /lib/libdl.so.2...done. Loaded symbols for /lib/libdl.so.2 Reading symbols from /lib/i686/libm.so.6...done. Loaded symbols for /lib/i686/libm.so.6 Reading symbols from /lib/i686/libc.so.6...done. Loaded symbols for /lib/i686/libc.so.6 Reading symbols from /lib/ld-linux.so.2...done. Loaded symbols for /lib/ld-linux.so.2 Reading symbols from /lib/libnss_files.so.2...done. Loaded symbols for /lib/libnss_files.so.2 Reading symbols from /usr/local/pgsql-7.3.2/lib/plpgsql.so...done. Loaded symbols for /usr/local/pgsql-7.3.2/lib/plpgsql.so 0x402b2262 in __libc_recv () from /lib/i686/libc.so.6 (gdb) bt #0 0x402b2262 in __libc_recv () from /lib/i686/libc.so.6 #1 0x080dc8bd in secure_read (port=0x8247110, ptr=0x81f7c40, len=8192) at be-secure.c:301 #2 0x080dff4d in pq_recvbuf () at pqcomm.c:463 #3 0x080dff9d in pq_getbyte () at pqcomm.c:500 #4 0x0811a675 in SocketBackend (inBuf=0x82b7b78) at postgres.c:247 #5 0x0811a6eb in ReadCommand (inBuf=0x82b7b78) at postgres.c:304 #6 0x0811bbb9 in PostgresMain (argc=4, argv=0xbfffef40, username=0x8247241 "nbadmin") at postgres.c:1930 #7 0x08103d60 in DoBackend (port=0x8247110) at postmaster.c:2302 #8 0x081036a6 in BackendStartup (port=0x8247110) at postmaster.c:1924 #9 0x081028a5 in ServerLoop () at postmaster.c:1027 #10 0x081023fb in PostmasterMain (argc=1, argv=0x822e000) at postmaster.c:788 #11 0x080e0a07 in main (argc=1, argv=0xbffff8d4) at main.c:210 #12 0x401e7177 in __libc_start_main (main=0x80e0828 <main>, argc=1, ubp_av=0xbffff8d4, init=0x806aeec <_init>, fini=0x8172770 <_fini>, rtld_fini=0x4000e184 <_dl_fini>, stack_end=0xbffff8cc) at ../sysdeps/generic/libc-start.c:129 ---------- gdb /usr/local/pgsql/bin/postgres 28874 GNU gdb 5.0rh-5 Red Hat Linux 7.1 Copyright 2001 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-redhat-linux"... /u1/pgsql73_data/28874: No such file or directory. Attaching to program: /usr/local/pgsql/bin/postgres, process 28874 Reading symbols from /usr/lib/libssl.so.1...done. Loaded symbols for /usr/lib/libssl.so.1 Reading symbols from /usr/lib/libcrypto.so.1...done. Loaded symbols for /usr/lib/libcrypto.so.1 Reading symbols from /usr/lib/libz.so.1...done. Loaded symbols for /usr/lib/libz.so.1 Reading symbols from /usr/lib/libreadline.so.4.1...done. Loaded symbols for /usr/lib/libreadline.so.4.1 Reading symbols from /lib/libtermcap.so.2...done. Loaded symbols for /lib/libtermcap.so.2 Reading symbols from /lib/libcrypt.so.1...done. Loaded symbols for /lib/libcrypt.so.1 Reading symbols from /lib/libresolv.so.2...done. Loaded symbols for /lib/libresolv.so.2 Reading symbols from /lib/libnsl.so.1...done. Loaded symbols for /lib/libnsl.so.1 Reading symbols from /lib/libdl.so.2...done. Loaded symbols for /lib/libdl.so.2 Reading symbols from /lib/i686/libm.so.6...done. Loaded symbols for /lib/i686/libm.so.6 Reading symbols from /lib/i686/libc.so.6...done. Loaded symbols for /lib/i686/libc.so.6 Reading symbols from /lib/ld-linux.so.2...done. Loaded symbols for /lib/ld-linux.so.2 Reading symbols from /lib/libnss_files.so.2...done. Loaded symbols for /lib/libnss_files.so.2 Reading symbols from /usr/local/pgsql-7.3.2/lib/plpgsql.so...done. Loaded symbols for /usr/local/pgsql-7.3.2/lib/plpgsql.so Reading symbols from /usr/local/pgsql-7.3.2/lib/nextbus.so...done. Loaded symbols for /usr/local/pgsql-7.3.2/lib/nextbus.so 0x402b2262 in __libc_recv () from /lib/i686/libc.so.6 (gdb) bt #0 0x402b2262 in __libc_recv () from /lib/i686/libc.so.6 #1 0x080dc8bd in secure_read (port=0x8247110, ptr=0x81f7c40, len=8192) at be-secure.c:301 #2 0x080dff4d in pq_recvbuf () at pqcomm.c:463 #3 0x080dff9d in pq_getbyte () at pqcomm.c:500 #4 0x0811a675 in SocketBackend (inBuf=0x82b7c40) at postgres.c:247 #5 0x0811a6eb in ReadCommand (inBuf=0x82b7c40) at postgres.c:304 #6 0x0811bbb9 in PostgresMain (argc=4, argv=0xbfffef40, username=0x8247241 "nbadmin") at postgres.c:1930 #7 0x08103d60 in DoBackend (port=0x8247110) at postmaster.c:2302 #8 0x081036a6 in BackendStartup (port=0x8247110) at postmaster.c:1924 #9 0x081028a5 in ServerLoop () at postmaster.c:1027 #10 0x081023fb in PostmasterMain (argc=1, argv=0x822e000) at postmaster.c:788 #11 0x080e0a07 in main (argc=1, argv=0xbffff8d4) at main.c:210 #12 0x401e7177 in __libc_start_main (main=0x80e0828 <main>, argc=1, ubp_av=0xbffff8d4, init=0x806aeec <_init>, fini=0x8172770 <_fini>, rtld_fini=0x4000e184 <_dl_fini>, stack_end=0xbffff8cc) at ../sysdeps/generic/libc-start.c:129 --------- gdb /usr/local/pgsql/bin/postgres 29424 GNU gdb 5.0rh-5 Red Hat Linux 7.1 Copyright 2001 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-redhat-linux"... /u1/pgsql73_data/29424: No such file or directory. Attaching to program: /usr/local/pgsql/bin/postgres, process 29424 Reading symbols from /usr/lib/libssl.so.1...done. Loaded symbols for /usr/lib/libssl.so.1 Reading symbols from /usr/lib/libcrypto.so.1...done. Loaded symbols for /usr/lib/libcrypto.so.1 Reading symbols from /usr/lib/libz.so.1...done. Loaded symbols for /usr/lib/libz.so.1 Reading symbols from /usr/lib/libreadline.so.4.1...done. Loaded symbols for /usr/lib/libreadline.so.4.1 Reading symbols from /lib/libtermcap.so.2...done. Loaded symbols for /lib/libtermcap.so.2 Reading symbols from /lib/libcrypt.so.1...done. Loaded symbols for /lib/libcrypt.so.1 Reading symbols from /lib/libresolv.so.2...done. Loaded symbols for /lib/libresolv.so.2 Reading symbols from /lib/libnsl.so.1...done. Loaded symbols for /lib/libnsl.so.1 Reading symbols from /lib/libdl.so.2...done. Loaded symbols for /lib/libdl.so.2 Reading symbols from /lib/i686/libm.so.6...done. Loaded symbols for /lib/i686/libm.so.6 Reading symbols from /lib/i686/libc.so.6...done. Loaded symbols for /lib/i686/libc.so.6 Reading symbols from /lib/ld-linux.so.2...done. Loaded symbols for /lib/ld-linux.so.2 Reading symbols from /lib/libnss_files.so.2...done. Loaded symbols for /lib/libnss_files.so.2 Reading symbols from /usr/local/pgsql-7.3.2/lib/plpgsql.so...done. Loaded symbols for /usr/local/pgsql-7.3.2/lib/plpgsql.so Reading symbols from /usr/local/pgsql-7.3.2/lib/nextbus.so...done. Loaded symbols for /usr/local/pgsql-7.3.2/lib/nextbus.so 0x402b2262 in __libc_recv () from /lib/i686/libc.so.6 (gdb) bt #0 0x402b2262 in __libc_recv () from /lib/i686/libc.so.6 #1 0x080dc8bd in secure_read (port=0x8247110, ptr=0x81f7c40, len=8192) at be-secure.c:301 #2 0x080dff4d in pq_recvbuf () at pqcomm.c:463 #3 0x080dff9d in pq_getbyte () at pqcomm.c:500 #4 0x0811a675 in SocketBackend (inBuf=0x82b7cb8) at postgres.c:247 #5 0x0811a6eb in ReadCommand (inBuf=0x82b7cb8) at postgres.c:304 #6 0x0811bbb9 in PostgresMain (argc=4, argv=0xbfffef40, username=0x8247241 "nbadmin") at postgres.c:1930 #7 0x08103d60 in DoBackend (port=0x8247110) at postmaster.c:2302 #8 0x081036a6 in BackendStartup (port=0x8247110) at postmaster.c:1924 #9 0x081028a5 in ServerLoop () at postmaster.c:1027 #10 0x081023fb in PostmasterMain (argc=1, argv=0x822e000) at postmaster.c:788 #11 0x080e0a07 in main (argc=1, argv=0xbffff8d4) at main.c:210 #12 0x401e7177 in __libc_start_main (main=0x80e0828 <main>, argc=1, ubp_av=0xbffff8d4, init=0x806aeec <_init>, fini=0x8172770 <_fini>, rtld_fini=0x4000e184 <_dl_fini>, stack_end=0xbffff8cc) at ../sysdeps/generic/libc-start.c:129 --------- gdb /usr/local/pgsql/bin/postgres 3792 GNU gdb 5.0rh-5 Red Hat Linux 7.1 Copyright 2001 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-redhat-linux"... /u1/pgsql73_data/3792: No such file or directory. Attaching to program: /usr/local/pgsql/bin/postgres, process 3792 Reading symbols from /usr/lib/libssl.so.1...done. Loaded symbols for /usr/lib/libssl.so.1 Reading symbols from /usr/lib/libcrypto.so.1...done. Loaded symbols for /usr/lib/libcrypto.so.1 Reading symbols from /usr/lib/libz.so.1...done. Loaded symbols for /usr/lib/libz.so.1 Reading symbols from /usr/lib/libreadline.so.4.1...done. Loaded symbols for /usr/lib/libreadline.so.4.1 Reading symbols from /lib/libtermcap.so.2...done. Loaded symbols for /lib/libtermcap.so.2 Reading symbols from /lib/libcrypt.so.1...done. Loaded symbols for /lib/libcrypt.so.1 Reading symbols from /lib/libresolv.so.2...done. Loaded symbols for /lib/libresolv.so.2 Reading symbols from /lib/libnsl.so.1...done. Loaded symbols for /lib/libnsl.so.1 Reading symbols from /lib/libdl.so.2...done. Loaded symbols for /lib/libdl.so.2 Reading symbols from /lib/i686/libm.so.6...done. Loaded symbols for /lib/i686/libm.so.6 Reading symbols from /lib/i686/libc.so.6...done. Loaded symbols for /lib/i686/libc.so.6 Reading symbols from /lib/ld-linux.so.2...done. Loaded symbols for /lib/ld-linux.so.2 Reading symbols from /lib/libnss_files.so.2...done. Loaded symbols for /lib/libnss_files.so.2 0x402b2262 in __libc_recv () from /lib/i686/libc.so.6 (gdb) bt #0 0x402b2262 in __libc_recv () from /lib/i686/libc.so.6 #1 0x080dc8bd in secure_read (port=0x8247110, ptr=0x81f7c40, len=8192) at be-secure.c:301 #2 0x080dff4d in pq_recvbuf () at pqcomm.c:463 #3 0x080dff9d in pq_getbyte () at pqcomm.c:500 #4 0x0811a675 in SocketBackend (inBuf=0x82b7958) at postgres.c:247 #5 0x0811a6eb in ReadCommand (inBuf=0x82b7958) at postgres.c:304 #6 0x0811bbb9 in PostgresMain (argc=4, argv=0xbfffef40, username=0x8247241 "jshields") at postgres.c:1930 #7 0x08103d60 in DoBackend (port=0x8247110) at postmaster.c:2302 #8 0x081036a6 in BackendStartup (port=0x8247110) at postmaster.c:1924 #9 0x081028a5 in ServerLoop () at postmaster.c:1027 #10 0x081023fb in PostmasterMain (argc=1, argv=0x822e000) at postmaster.c:788 #11 0x080e0a07 in main (argc=1, argv=0xbffff8d4) at main.c:210 #12 0x401e7177 in __libc_start_main (main=0x80e0828 <main>, argc=1, ubp_av=0xbffff8d4, init=0x806aeec <_init>, fini=0x8172770 <_fini>, rtld_fini=0x4000e184 <_dl_fini>, stack_end=0xbffff8cc) at ../sysdeps/generic/libc-start.c:129 Thanks! L. On Wed, 12 Feb 2003, Tom Lane wrote: > Laurette Cisneros <laurette@nextbus.com> writes: > > When I do a shutdown: > > $pgpath/bin/pg_ctl -D $datadir -m fast stop > > the server will not completely shutdown. > > Hm. -m fast tells the postmaster to send SIGTERM to all the existing > backends, and then quit when all the backends have quit. So your > problem is that one or more backend processes isn't exiting in a timely > fashion after receiving SIGTERM. It would be useful to see what these > backends are doing. Can you attach to each one with gdb and get a stack > trace? > > regards, tom lane > -- Laurette Cisneros, L.D. The Database Group (510) 420-3137 NextBus Information Systems, Inc. www.nextbus.com ---------------------------------- "No man is wise enough by himself" -- Titus Maccius Plautus (254 Bc - 184 BC), Miles Gloriosus
Laurette Cisneros <laurette@nextbus.com> writes: > Ok, it reproduced today on my development server. Boy, it sure looks like everything's just sitting idle waiting for client input, and hasn't noticed your shutdown request at all. You sure you used "-m fast"? Could you attach to the postmaster and see what its Shutdown variable contains, and to the backends and see what the variables manipulated by die() contain? (InterruptPending, proc_exit_inprogress, ProcDiePending, ImmediateInterruptOK, InterruptHoldoffCount, CritSectionCount) regards, tom lane
Sure, next time it happens. I'm back up again. On Thu, 13 Feb 2003, Tom Lane wrote: > Laurette Cisneros <laurette@nextbus.com> writes: > > Ok, it reproduced today on my development server. > > Boy, it sure looks like everything's just sitting idle waiting for > client input, and hasn't noticed your shutdown request at all. You > sure you used "-m fast"? Could you attach to the postmaster and see > what its Shutdown variable contains, and to the backends and see what > the variables manipulated by die() contain? (InterruptPending, > proc_exit_inprogress, ProcDiePending, ImmediateInterruptOK, > InterruptHoldoffCount, CritSectionCount) > > regards, tom lane > -- Laurette Cisneros, L.D. The Database Group (510) 420-3137 NextBus Information Systems, Inc. www.nextbus.com ---------------------------------- "No man is wise enough by himself" -- Titus Maccius Plautus (254 Bc - 184 BC), Miles Gloriosus
Ok, my production server had this problem (again). So, here's the info. you requested. There's a lot of postgres processes still left running: postmaster (proc id 1457): (gdb) print Shutdown $1 = 2 Other backends (postgres) still hanging around: postgres 1458 1457 0 Feb06 ? 00:22:41 postgres: stats buffer process InterruptPending: $1 = 0 proc_exit_inprogress: $2 = 0 ProcDiePending: $3 = 0 ImmediateInterruptOK = $4 = 0 InterruptHoldoffCount = $5 = 0 CritSectionCount = $6 = 0 postgres 1459 1458 2 Feb06 ? 05:31:38 postgres: stats collector process InterruptPending: $1 = 0 proc_exit_inprogress: $2 = 0 ProcDiePending: $3 = 0 ImmediateInterruptOK = $4 = 0 InterruptHoldoffCount = $5 = 0 CritSectionCount = $6 = 0 postgres 1468 1457 0 Feb06 ? 00:04:57 postgres: nbadmin db1 111.123.123.244 idle InterruptPending: $1 = 65537 proc_exit_inprogress: $2 = 0 ProcDiePending: $3 = 1 ImmediateInterruptOK = $4 = 0 InterruptHoldoffCount = $5 = 0 CritSectionCount = $6 = 0 postgres 1461 1457 0 Feb06 ? 00:02:06 postgres: nbadmin db2 111.123.123.253 idle InterruptPending: $1 = 65537 proc_exit_inprogress: $2 = 0 ProcDiePending: $3 = 1 ImmediateInterruptOK: $4 = 0 InterruptHoldoffCount: $5 = 0 CritSectionCount: $6 = 0 postgres 1478 1457 0 Feb06 ? 00:04:13 postgres: nbadmin db3 111.123.124.244 idle InterruptPending: $1 = 65537 proc_exit_inprogress: $2 = 0 ProcDiePending: $3 = 1 ImmediateInterruptOK: $4 = 0 InterruptHoldoffCount: $5 = 0 CritSectionCount: $6 = 0 postgres 9818 1457 0 Feb06 ? 00:00:01 postgres: nbadmin db4 111.123.123.244 idle InterruptPending: $1 = 65537 proc_exit_inprogress: $2 = 0 ProcDiePending: $3 = 1 ImmediateInterruptOK: $4 = 0 InterruptHoldoffCount: $5 = 0 CritSectionCount: $6 = 0 postgres 15677 1457 0 Feb06 ? 00:00:04 postgres: nbadmin db5 111.123.123.253 idle InterruptPending: $1 = 65537 proc_exit_inprogress: $2 = 0 ProcDiePending: $3 = 1 ImmediateInterruptOK: $4 = 0 InterruptHoldoffCount: $5 = 0 CritSectionCount: $6 = 0 postgres 30207 1457 0 Feb09 ? 00:00:00 postgres: nbadmin db6 111.123.123.244 idle InterruptPending: $1 = 65537 proc_exit_inprogress: $2 = 0 ProcDiePending: $3 = 1 ImmediateInterruptOK: $4 = 0 InterruptHoldoffCount: $5 = 0 CritSectionCount: $6 = 0 postgres 7316 1457 0 Feb11 ? 00:00:01 postgres: nbadmin db7 111.123.123.244 idle InterruptPending: $1 = 65537 proc_exit_inprogress: $2 = 0 ProcDiePending: $3 = 1 ImmediateInterruptOK: $4 = 0 InterruptHoldoffCount: $5 = 0 CritSectionCount: $6 = 0 postgres 10440 1457 0 Feb12 ? 00:00:00 postgres: nbadmin db8 111.123.123.253 idle InterruptPending: $1 = 65537 proc_exit_inprogress: $2 = 0 ProcDiePending: $3 = 1 ImmediateInterruptOK: $4 = 0 InterruptHoldoffCount: $5 = 0 CritSectionCount: $6 = 0 postgres 10441 1457 0 Feb12 ? 00:02:15 postgres: nbadmin db9 111.123.123.253 idle InterruptPending: $1 = 65537 proc_exit_inprogress: $2 = 0 ProcDiePending: $3 = 1 ImmediateInterruptOK: $4 = 0 InterruptHoldoffCount: $5 = 0 CritSectionCount: $6 = 0 postgres 27355 1457 0 Feb13 ? 00:00:00 postgres: nbadmin db10 111.123.123.253 idle InterruptPending: $1 = 65537 proc_exit_inprogress: $2 = 0 ProcDiePending: $3 = 1 ImmediateInterruptOK: $4 = 0 InterruptHoldoffCount: $5 = 0 CritSectionCount: $6 = 0 postgres 28491 1457 0 Feb13 ? 00:00:03 postgres: nbadmin db11 111.123.123.253 idle InterruptPending: $1 = 65537 proc_exit_inprogress: $2 = 0 ProcDiePending: $3 = 1 ImmediateInterruptOK: $4 = 0 InterruptHoldoffCount: $5 = 0 CritSectionCount: $6 = 0 postgres 30021 1457 0 Feb14 ? 00:00:01 postgres: nbadmin db12 111.123.123.244 idle InterruptPending: $1 = 65537 proc_exit_inprogress: $2 = 0 ProcDiePending: $3 = 1 ImmediateInterruptOK: $4 = 0 InterruptHoldoffCount: $5 = 0 CritSectionCount: $6 = 0 postgres 10663 1457 0 Feb14 ? 00:00:00 postgres: nbadmin db13 111.123.123.244 idle InterruptPending: $1 = 65537 proc_exit_inprogress: $2 = 0 ProcDiePending: $3 = 1 ImmediateInterruptOK: $4 = 0 InterruptHoldoffCount: $5 = 0 CritSectionCount: $6 = 0 postgres 10665 1457 0 Feb14 ? 00:00:00 postgres: nbadmin db13 111.123.123.244 idle InterruptPending: $1 = 65537 proc_exit_inprogress: $2 = 0 ProcDiePending: $3 = 1 ImmediateInterruptOK: $4 = 0 InterruptHoldoffCount: $5 = 0 CritSectionCount: $6 = 0 Thank again, Laurette On Thu, 13 Feb 2003, Tom Lane wrote: > Laurette Cisneros <laurette@nextbus.com> writes: > > Ok, it reproduced today on my development server. > > Boy, it sure looks like everything's just sitting idle waiting for > client input, and hasn't noticed your shutdown request at all. You > sure you used "-m fast"? Could you attach to the postmaster and see > what its Shutdown variable contains, and to the backends and see what > the variables manipulated by die() contain? (InterruptPending, > proc_exit_inprogress, ProcDiePending, ImmediateInterruptOK, > InterruptHoldoffCount, CritSectionCount) > > regards, tom lane > -- Laurette Cisneros, L.D. The Database Group (510) 420-3137 NextBus Information Systems, Inc. www.nextbus.com ---------------------------------- "No man is wise enough by himself" -- Titus Maccius Plautus (254 Bc - 184 BC), Miles Gloriosus
Laurette Cisneros <laurette@nextbus.com> writes: > Ok, my production server had this problem (again). So, here's the info. > you requested. There's a lot of postgres processes still left running: > postmaster (proc id 1457): > (gdb) print Shutdown > $1 = 2 Okay, so the postmaster definitely got the FastShutdown request ... > postgres 1468 1457 0 Feb06 ? 00:04:57 postgres: nbadmin db1 111.123.123.244 idle > InterruptPending: $1 = 65537 > proc_exit_inprogress: $2 = 0 > ProcDiePending: $3 = 1 > ImmediateInterruptOK = $4 = 0 > InterruptHoldoffCount = $5 = 0 > CritSectionCount = $6 = 0 ... and it correctly sent it on to the backends, because they all have ProcDiePending true, which proves that the die() signal handler did run. So why are they sitting waiting for input? They should have exited, either instantly in die() if it occurred while they were already waiting for input, or at the CHECK_FOR_INTERRUPTS() just before they next called ReadCommand(). I can't see any window for failure there. I'm baffled. The only thing I can see to investigate is that InterruptPending printed out as 65537 in each backend. It's declared as bool, and only ever set to true or false, so I'd have expected a result of 0 or 1 --- in fact, since bool is char, it's impossible for it to contain a value of 65537. Is gdb confused, or is there something fishy there? BTW, do you have any idea whether any backends *did* exit? Is this all of the ones that were running, or did some obey the shutdown signal? (You could look for "This connection has been terminated by the administrator." errors in the postmaster log to see if any did obey.) regards, tom lane
"Ed L." <pgadmin@bluepolka.net> writes: > I've been seeing the same problem intermittently over the past 2 years > among ~30 production clusters. Hm, I suddenly have a theory. Do you guys use LISTEN/NOTIFY in the databases that are exhibiting the problem? Is it possible that the backends that aren't cooperating have not done anything since they last received a NOTIFY? (That would imply that the connected frontend didn't issue any SQL commands in response to the notification.) regards, tom lane
"Ed L." <pggeneral@bluepolka.net> writes: > On Monday February 17 2003 5:09, Tom Lane wrote: >> Hm, I suddenly have a theory. Do you guys use LISTEN/NOTIFY in the >> databases that are exhibiting the problem? > Not using LISTEN/NOTIFY here, at least not of which I'm aware. I'll > dig a little more and let you know if I find differently. Oh, duh! You don't need to be using NOTIFY explicitly --- you just have to sit idle long enough to make the SI buffer overflow, and the system will try to NOTIFY your backend anyway to make it read the SI message buffer. Which is the path I was looking at --- that will leave ImmediateInterruptOK false, preventing response to SIGTERM if the client still hasn't done anything by the time it comes. Patch forthcoming. regards, tom lane
I said: > Oh, duh! You don't need to be using NOTIFY explicitly --- you just have > to sit idle long enough to make the SI buffer overflow, and the system > will try to NOTIFY your backend anyway to make it read the SI message > buffer. Which is the path I was looking at --- that will leave > ImmediateInterruptOK false, preventing response to SIGTERM if the client > still hasn't done anything by the time it comes. > Patch forthcoming. Attached is the fix against 7.3 (or CVS tip). The same bug exists in 7.2 and 7.1, and should be fixable by the same patch (possibly with some fuzz). You can demonstrate the problem by doing 'LISTEN foo' in one psql and then 'NOTIFY foo' in another, then attempting a fast shutdown --- the first psql's backend will ignore you until the psql session does something. The same path can be followed without LISTEN if the first psql is simply left idle while the second one does a bunch of catalog-updating work (a few cycles of VACUUM ANALYZE usually suffice). Sigh, another day another bug... regards, tom lane *** src/backend/commands/async.c.orig Sun Sep 15 21:24:41 2002 --- src/backend/commands/async.c Mon Feb 17 21:38:47 2003 *************** *** 599,604 **** --- 599,614 ---- if (notifyInterruptEnabled) { + bool save_ImmediateInterruptOK = ImmediateInterruptOK; + + /* + * We may be called while ImmediateInterruptOK is true; turn it off + * while messing with the NOTIFY state. (We would have to save + * and restore it anyway, because PGSemaphore operations inside + * ProcessIncomingNotify() might reset it.) + */ + ImmediateInterruptOK = false; + /* * I'm not sure whether some flavors of Unix might allow another * SIGUSR2 occurrence to recursively interrupt this routine. To *************** *** 626,631 **** --- 636,648 ---- elog(LOG, "Async_NotifyHandler: done"); } } + + /* + * Restore ImmediateInterruptOK, and check for interrupts if needed. + */ + ImmediateInterruptOK = save_ImmediateInterruptOK; + if (save_ImmediateInterruptOK) + CHECK_FOR_INTERRUPTS(); } else {
I've been seeing the same problem intermittently over the past 2 years among ~30 production clusters. Today I saw it happen with 3-4 of the 7.2.3 clusters on the same host (PostgreSQL 7.2.3 on i686-pc-linux-gnu, compiled by GCC 2.96 with 2.4.18-4bigmem kernel). Each cluster had a number of long-idle connections, no current activity. Will try to gather a little more data... Ed On Monday February 17 2003 3:56, Laurette Cisneros wrote: > Ok, my production server had this problem (again). So, here's the > info. you requested. There's a lot of postgres processes still > left running: > > postmaster (proc id 1457): > (gdb) print Shutdown > $1 = 2 > > Other backends (postgres) still hanging around: > postgres 1458 1457 0 Feb06 ? 00:22:41 postgres: stats > buffer process InterruptPending: $1 = 0 > proc_exit_inprogress: $2 = 0 > ProcDiePending: $3 = 0 > ImmediateInterruptOK = $4 = 0 > InterruptHoldoffCount = $5 = 0 > CritSectionCount = $6 = 0 > > postgres 1459 1458 2 Feb06 ? 05:31:38 postgres: stats > collector process InterruptPending: $1 = 0 > proc_exit_inprogress: $2 = 0 > ProcDiePending: $3 = 0 > ImmediateInterruptOK = $4 = 0 > InterruptHoldoffCount = $5 = 0 > CritSectionCount = $6 = 0 > > postgres 1468 1457 0 Feb06 ? 00:04:57 postgres: nbadmin > db1 111.123.123.244 idle InterruptPending: $1 = 65537 > proc_exit_inprogress: $2 = 0 > ProcDiePending: $3 = 1 > ImmediateInterruptOK = $4 = 0 > InterruptHoldoffCount = $5 = 0 > CritSectionCount = $6 = 0 > > postgres 1461 1457 0 Feb06 ? 00:02:06 postgres: nbadmin > db2 111.123.123.253 idle InterruptPending: $1 = 65537 > proc_exit_inprogress: $2 = 0 > ProcDiePending: $3 = 1 > ImmediateInterruptOK: $4 = 0 > InterruptHoldoffCount: $5 = 0 > CritSectionCount: $6 = 0 > > postgres 1478 1457 0 Feb06 ? 00:04:13 postgres: nbadmin > db3 111.123.124.244 idle InterruptPending: $1 = 65537 > proc_exit_inprogress: $2 = 0 > ProcDiePending: $3 = 1 > ImmediateInterruptOK: $4 = 0 > InterruptHoldoffCount: $5 = 0 > CritSectionCount: $6 = 0 > > postgres 9818 1457 0 Feb06 ? 00:00:01 postgres: nbadmin > db4 111.123.123.244 idle InterruptPending: $1 = 65537 > proc_exit_inprogress: $2 = 0 > ProcDiePending: $3 = 1 > ImmediateInterruptOK: $4 = 0 > InterruptHoldoffCount: $5 = 0 > CritSectionCount: $6 = 0 > > postgres 15677 1457 0 Feb06 ? 00:00:04 postgres: nbadmin > db5 111.123.123.253 idle InterruptPending: $1 = 65537 > proc_exit_inprogress: $2 = 0 > ProcDiePending: $3 = 1 > ImmediateInterruptOK: $4 = 0 > InterruptHoldoffCount: $5 = 0 > CritSectionCount: $6 = 0 > > postgres 30207 1457 0 Feb09 ? 00:00:00 postgres: nbadmin > db6 111.123.123.244 idle InterruptPending: $1 = 65537 > proc_exit_inprogress: $2 = 0 > ProcDiePending: $3 = 1 > ImmediateInterruptOK: $4 = 0 > InterruptHoldoffCount: $5 = 0 > CritSectionCount: $6 = 0 > > postgres 7316 1457 0 Feb11 ? 00:00:01 postgres: nbadmin > db7 111.123.123.244 idle InterruptPending: $1 = 65537 > proc_exit_inprogress: $2 = 0 > ProcDiePending: $3 = 1 > ImmediateInterruptOK: $4 = 0 > InterruptHoldoffCount: $5 = 0 > CritSectionCount: $6 = 0 > > postgres 10440 1457 0 Feb12 ? 00:00:00 postgres: nbadmin > db8 111.123.123.253 idle InterruptPending: $1 = 65537 > proc_exit_inprogress: $2 = 0 > ProcDiePending: $3 = 1 > ImmediateInterruptOK: $4 = 0 > InterruptHoldoffCount: $5 = 0 > CritSectionCount: $6 = 0 > > postgres 10441 1457 0 Feb12 ? 00:02:15 postgres: nbadmin > db9 111.123.123.253 idle InterruptPending: $1 = 65537 > proc_exit_inprogress: $2 = 0 > ProcDiePending: $3 = 1 > ImmediateInterruptOK: $4 = 0 > InterruptHoldoffCount: $5 = 0 > CritSectionCount: $6 = 0 > > postgres 27355 1457 0 Feb13 ? 00:00:00 postgres: nbadmin > db10 111.123.123.253 idle InterruptPending: $1 = 65537 > proc_exit_inprogress: $2 = 0 > ProcDiePending: $3 = 1 > ImmediateInterruptOK: $4 = 0 > InterruptHoldoffCount: $5 = 0 > CritSectionCount: $6 = 0 > > postgres 28491 1457 0 Feb13 ? 00:00:03 postgres: nbadmin > db11 111.123.123.253 idle InterruptPending: $1 = 65537 > proc_exit_inprogress: $2 = 0 > ProcDiePending: $3 = 1 > ImmediateInterruptOK: $4 = 0 > InterruptHoldoffCount: $5 = 0 > CritSectionCount: $6 = 0 > > postgres 30021 1457 0 Feb14 ? 00:00:01 postgres: nbadmin > db12 111.123.123.244 idle InterruptPending: $1 = 65537 > proc_exit_inprogress: $2 = 0 > ProcDiePending: $3 = 1 > ImmediateInterruptOK: $4 = 0 > InterruptHoldoffCount: $5 = 0 > CritSectionCount: $6 = 0 > > postgres 10663 1457 0 Feb14 ? 00:00:00 postgres: nbadmin > db13 111.123.123.244 idle InterruptPending: $1 = 65537 > proc_exit_inprogress: $2 = 0 > ProcDiePending: $3 = 1 > ImmediateInterruptOK: $4 = 0 > InterruptHoldoffCount: $5 = 0 > CritSectionCount: $6 = 0 > > postgres 10665 1457 0 Feb14 ? 00:00:00 postgres: nbadmin > db13 111.123.123.244 idle InterruptPending: $1 = 65537 > proc_exit_inprogress: $2 = 0 > ProcDiePending: $3 = 1 > ImmediateInterruptOK: $4 = 0 > InterruptHoldoffCount: $5 = 0 > CritSectionCount: $6 = 0 > > Thank again, > > Laurette > > On Thu, 13 Feb 2003, Tom Lane wrote: > > Laurette Cisneros <laurette@nextbus.com> writes: > > > Ok, it reproduced today on my development server. > > > > Boy, it sure looks like everything's just sitting idle waiting > > for client input, and hasn't noticed your shutdown request at > > all. You sure you used "-m fast"? Could you attach to the > > postmaster and see what its Shutdown variable contains, and to > > the backends and see what the variables manipulated by die() > > contain? (InterruptPending, proc_exit_inprogress, > > ProcDiePending, ImmediateInterruptOK, InterruptHoldoffCount, > > CritSectionCount) > > > > regards, tom lane
On Monday February 17 2003 5:09, Tom Lane wrote: > "Ed L." <pgadmin@bluepolka.net> writes: > > I've been seeing the same problem intermittently over the past 2 > > years among ~30 production clusters. > > Hm, I suddenly have a theory. Do you guys use LISTEN/NOTIFY in the > databases that are exhibiting the problem? Is it possible that the > backends that aren't cooperating have not done anything since they > last received a NOTIFY? (That would imply that the connected > frontend didn't issue any SQL commands in response to the > notification.) Not using LISTEN/NOTIFY here, at least not of which I'm aware. I'll dig a little more and let you know if I find differently. Ed
I'm compiling in the patch now and will test as soon as I can. I have to say, I have never received such fast responses and fixes with any "support" for any other product. Pg is the best! L. On Mon, 17 Feb 2003, Tom Lane wrote: > I said: > > Oh, duh! You don't need to be using NOTIFY explicitly --- you just have > > to sit idle long enough to make the SI buffer overflow, and the system > > will try to NOTIFY your backend anyway to make it read the SI message > > buffer. Which is the path I was looking at --- that will leave > > ImmediateInterruptOK false, preventing response to SIGTERM if the client > > still hasn't done anything by the time it comes. > > > Patch forthcoming. > > Attached is the fix against 7.3 (or CVS tip). The same bug exists in > 7.2 and 7.1, and should be fixable by the same patch (possibly with some > fuzz). > > You can demonstrate the problem by doing 'LISTEN foo' in one psql and > then 'NOTIFY foo' in another, then attempting a fast shutdown --- the > first psql's backend will ignore you until the psql session does something. > The same path can be followed without LISTEN if the first psql is simply > left idle while the second one does a bunch of catalog-updating work > (a few cycles of VACUUM ANALYZE usually suffice). > > Sigh, another day another bug... > > regards, tom lane > > *** src/backend/commands/async.c.orig Sun Sep 15 21:24:41 2002 > --- src/backend/commands/async.c Mon Feb 17 21:38:47 2003 > *************** > *** 599,604 **** > --- 599,614 ---- > > if (notifyInterruptEnabled) > { > + bool save_ImmediateInterruptOK = ImmediateInterruptOK; > + > + /* > + * We may be called while ImmediateInterruptOK is true; turn it off > + * while messing with the NOTIFY state. (We would have to save > + * and restore it anyway, because PGSemaphore operations inside > + * ProcessIncomingNotify() might reset it.) > + */ > + ImmediateInterruptOK = false; > + > /* > * I'm not sure whether some flavors of Unix might allow another > * SIGUSR2 occurrence to recursively interrupt this routine. To > *************** > *** 626,631 **** > --- 636,648 ---- > elog(LOG, "Async_NotifyHandler: done"); > } > } > + > + /* > + * Restore ImmediateInterruptOK, and check for interrupts if needed. > + */ > + ImmediateInterruptOK = save_ImmediateInterruptOK; > + if (save_ImmediateInterruptOK) > + CHECK_FOR_INTERRUPTS(); > } > else > { > -- Laurette Cisneros, L.D. The Database Group (510) 420-3137 NextBus Information Systems, Inc. www.nextbus.com ---------------------------------- "No man is wise enough by himself" -- Titus Maccius Plautus (254 Bc - 184 BC), Miles Gloriosus
On Monday February 17 2003 8:06, Tom Lane wrote: > > You can demonstrate the problem by doing 'LISTEN foo' in one psql > and then 'NOTIFY foo' in another, then attempting a fast shutdown > --- the first psql's backend will ignore you until the psql session > does something. The same path can be followed without LISTEN if the > first psql is simply left idle while the second one does a bunch of > catalog-updating work (a few cycles of VACUUM ANALYZE usually > suffice). Thanks, Tom. Your diagnosis sounds consistent with our experience. I was able to reproduce it with the latter approach of running a bunch of VACUUM ANALYZE commands. However, I have been unable to demonstrate this on 7.3.1 using the following sequence: session #1 launches psql session #2 launches psql session #1 does "listen foo;" session #2 does "notify foo;" session #3 does a fast shutdown; All backends immediately shutdown, no hanging. Anyway, I'm applying the patch and will advise if any further issues show up. Ed
On Monday February 17 2003 8:06, Tom Lane wrote: > > Attached is the fix against 7.3 (or CVS tip). The same bug exists > in 7.2 and 7.1, and should be fixable by the same patch (possibly > with some fuzz). > > You can demonstrate the problem by doing 'LISTEN foo' in one psql > and then 'NOTIFY foo' in another, then attempting a fast shutdown > --- the first psql's backend will ignore you until the psql session > does something. The same path can be followed without LISTEN if the > first psql is simply left idle while the second one does a bunch of > catalog-updating work (a few cycles of VACUUM ANALYZE usually > suffice). I have verified this fixed the problem on 7.2.4 in the case of the latter VACUUM ANALYZE example, and will test it also against 7.3.2 before too long. Thanks again, Tom. Ed
Ok, I let the server run for a few days (usually long enough to get into this situation) and then did a shutdown. No hang. Could be fixed. Thanks! L. On Mon, 17 Feb 2003, Tom Lane wrote: > I said: > > Oh, duh! You don't need to be using NOTIFY explicitly --- you just have > > to sit idle long enough to make the SI buffer overflow, and the system > > will try to NOTIFY your backend anyway to make it read the SI message > > buffer. Which is the path I was looking at --- that will leave > > ImmediateInterruptOK false, preventing response to SIGTERM if the client > > still hasn't done anything by the time it comes. > > > Patch forthcoming. > > Attached is the fix against 7.3 (or CVS tip). The same bug exists in > 7.2 and 7.1, and should be fixable by the same patch (possibly with some > fuzz). > > You can demonstrate the problem by doing 'LISTEN foo' in one psql and > then 'NOTIFY foo' in another, then attempting a fast shutdown --- the > first psql's backend will ignore you until the psql session does something. > The same path can be followed without LISTEN if the first psql is simply > left idle while the second one does a bunch of catalog-updating work > (a few cycles of VACUUM ANALYZE usually suffice). > > Sigh, another day another bug... > > regards, tom lane > > *** src/backend/commands/async.c.orig Sun Sep 15 21:24:41 2002 > --- src/backend/commands/async.c Mon Feb 17 21:38:47 2003 > *************** > *** 599,604 **** > --- 599,614 ---- > > if (notifyInterruptEnabled) > { > + bool save_ImmediateInterruptOK = ImmediateInterruptOK; > + > + /* > + * We may be called while ImmediateInterruptOK is true; turn it off > + * while messing with the NOTIFY state. (We would have to save > + * and restore it anyway, because PGSemaphore operations inside > + * ProcessIncomingNotify() might reset it.) > + */ > + ImmediateInterruptOK = false; > + > /* > * I'm not sure whether some flavors of Unix might allow another > * SIGUSR2 occurrence to recursively interrupt this routine. To > *************** > *** 626,631 **** > --- 636,648 ---- > elog(LOG, "Async_NotifyHandler: done"); > } > } > + > + /* > + * Restore ImmediateInterruptOK, and check for interrupts if needed. > + */ > + ImmediateInterruptOK = save_ImmediateInterruptOK; > + if (save_ImmediateInterruptOK) > + CHECK_FOR_INTERRUPTS(); > } > else > { > -- Laurette Cisneros, L.D. The Database Group (510) 420-3137 NextBus Information Systems, Inc. www.nextbus.com ---------------------------------- "No man is wise enough by himself" -- Titus Maccius Plautus (254 Bc - 184 BC), Miles Gloriosus
Ok, I let the server run for a few days (usually long enough to get into this situation) and then did a shutdown. No hang. Could be fixed. Thanks! L. On Mon, 17 Feb 2003, Tom Lane wrote: > I said: > > Oh, duh! You don't need to be using NOTIFY explicitly --- you just have > > to sit idle long enough to make the SI buffer overflow, and the system > > will try to NOTIFY your backend anyway to make it read the SI message > > buffer. Which is the path I was looking at --- that will leave > > ImmediateInterruptOK false, preventing response to SIGTERM if the client > > still hasn't done anything by the time it comes. > > > Patch forthcoming. > > Attached is the fix against 7.3 (or CVS tip). The same bug exists in > 7.2 and 7.1, and should be fixable by the same patch (possibly with some > fuzz). > > You can demonstrate the problem by doing 'LISTEN foo' in one psql and > then 'NOTIFY foo' in another, then attempting a fast shutdown --- the > first psql's backend will ignore you until the psql session does something. > The same path can be followed without LISTEN if the first psql is simply > left idle while the second one does a bunch of catalog-updating work > (a few cycles of VACUUM ANALYZE usually suffice). > > Sigh, another day another bug... > > regards, tom lane > > *** src/backend/commands/async.c.orig Sun Sep 15 21:24:41 2002 > --- src/backend/commands/async.c Mon Feb 17 21:38:47 2003 > *************** > *** 599,604 **** > --- 599,614 ---- > > if (notifyInterruptEnabled) > { > + bool save_ImmediateInterruptOK = ImmediateInterruptOK; > + > + /* > + * We may be called while ImmediateInterruptOK is true; turn it off > + * while messing with the NOTIFY state. (We would have to save > + * and restore it anyway, because PGSemaphore operations inside > + * ProcessIncomingNotify() might reset it.) > + */ > + ImmediateInterruptOK = false; > + > /* > * I'm not sure whether some flavors of Unix might allow another > * SIGUSR2 occurrence to recursively interrupt this routine. To > *************** > *** 626,631 **** > --- 636,648 ---- > elog(LOG, "Async_NotifyHandler: done"); > } > } > + > + /* > + * Restore ImmediateInterruptOK, and check for interrupts if needed. > + */ > + ImmediateInterruptOK = save_ImmediateInterruptOK; > + if (save_ImmediateInterruptOK) > + CHECK_FOR_INTERRUPTS(); > } > else > { > -- Laurette Cisneros, L.D. The Database Group (510) 420-3137 NextBus Information Systems, Inc. www.nextbus.com ---------------------------------- "No man is wise enough by himself" -- Titus Maccius Plautus (254 Bc - 184 BC), Miles Gloriosus