Thread: Startup death!
Samuel Liddicott http://www.ananova.com
Support Consultant
sam@ananova.com
"Sam Liddicott" <sam.liddicott@ananova.com> writes: > Why are all these processes stuck in startup and taking as much cpu as they > can? You tell us. Attach to a few of them with gdb and get stack traces. (It will help if you've built PG with --enable-debug.) regards, tom lane
Seems I had this same problem a while back with 7.2.1.

We had I/O problems. Our RAID controller driver was acting up. Upgrading
the i20 driver from Redhat finally and definitively solved the problem.

If you check your process list, you will see that those "startup"
processes are in Uninterruptible Sleep mode. We ended up having to hard
reboot the machine to shut down Postgresql. After about a week of this
we found out about the driver.

I would love to hear what your solution was, but am almost sure it is
related to a disk I/O issue.

For others on the list: what does it mean when the Postgresql processes
are in startup mode? What is it supposed to be doing in that mode?

- Ericson Smith
eric@did-it.com

On Thu, 2002-07-18 at 09:57, Tom Lane wrote:
> "Sam Liddicott" <sam.liddicott@ananova.com> writes:
> > Why are all these processes stuck in startup and taking as much cpu as they
> > can?
>
> You tell us. Attach to a few of them with gdb and get stack traces.
> (It will help if you've built PG with --enable-debug.)
>
> regards, tom lane
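One way to check for the uninterruptible-sleep state described above is to look at the STAT column of ps, where a leading "D" marks that state on Linux. The Python sketch below parses text in the layout produced by `ps -eo pid,stat,args`. The function name is hypothetical, and the state letters in the sample are invented for illustration; they are not taken from the affected machine.

```python
def uninterruptible(ps_text):
    """Return (pid, command) for every row whose STAT starts with 'D'."""
    rows = []
    for line in ps_text.splitlines()[1:]:      # skip the header row
        parts = line.split(None, 2)            # pid, stat, rest of line
        if len(parts) < 3:
            continue
        pid, stat, args = parts
        if stat.startswith("D"):               # D = uninterruptible sleep
            rows.append((int(pid), args.strip()))
    return rows


# Hypothetical sample: the PIDs and commands echo this thread, but the
# STAT values are made up to exercise the parser.
SAMPLE = """\
  PID STAT COMMAND
23609 S    /usr/bin/postmaster
31028 D    postgres: tv tv [local] startup
31239 R    postgres: tv tv 10.30.10.105 startup
"""

stuck = uninterruptible(SAMPLE)
```

With the sample above, `stuck` holds only the entry for PID 31028.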
> -----Original Message-----
> From: Ericson Smith [mailto:eric@did-it.com]
> Sent: 18 July 2002 15:34
> To: Tom Lane
> Cc: Postgresql General Mailing List
> Subject: Re: [GENERAL] Startup death!
>
> Seems I had this same problem a while back with 7.2.1
>
> We had I/O problems. Our RAID controller driver was acting up. Upgrading
> the i20 driver from Redhat finally and definitively solved the problem.

We're using Red Hat 7.3 with RAID. When did you get the i20 driver
update? Did you have to say any magic words? Is it part of any recent
release? What version do you use now?

For us, lsmod doesn't show any kind of i20. We have /dev/hdi20, which is
owned by the dev-3.3-4 package, but that package also has i21, i22, etc.
None of the descriptions of the installed packages mention i20.

We have an unused (no disks) Adaptec AIC7899; the card we actually use
is a MegaRAID.

> I would love to hear what your solution was, but am almost sure it is
> related to a disk i/o issue.

When it next happens we will strace -p and gdb the processes to see what
they are doing.

> For others in the list... What does it mean when the Postgresql
> processes are in startup mode? What is it supposed to be doing in that
> mode?

yeah!

Sam
We got the i20 driver update from Adaptec's site, THEN updated RedHat's
kernel using their up2date utility.

Here are the steps:

1. Have your SCSI RAID driver disk ready.
2. Reinstall RedHat in expert mode so it will *not load* the default
   RedHat driver for your RAID (this was part of the problem).
3. Insert the SCSI RAID driver disk when it prompts you.
4. Install Linux as necessary.
5. As soon as your install is finished, run rhn_register and up2date to
   download the latest kernels for your machine.
6. Install and run Postgres.

These are the steps that we used with success.

- Ericson Smith
eric@did-it.com
Thank you very much, good advice here!

We will try this, and may bug you personally (?) if we need
clarification, as it doesn't seem to be a Postgres issue.

Sam
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: 18 July 2002 14:57
> To: Sam Liddicott
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Startup death!
>
> "Sam Liddicott" <sam.liddicott@ananova.com> writes:
> > Why are all these processes stuck in startup and taking as much cpu as they
> > can?
>
> You tell us. Attach to a few of them with gdb and get stack traces.
> (It will help if you've built PG with --enable-debug.)

Here's the output of the script I wrote for support to run when the
error occurs. It has the ps list, a gdb and an strace for 10 seconds of
a stuck process.

I'll fix the gdb hangup error so maybe we get the full stack trace (it
worked in testing!)

UID        PID  PPID  C STIME TTY          TIME CMD
postgres 23609     1  0 Aug08 ?        00:00:58 /usr/bin/postmaster
postgres 23611 23609  0 Aug08 ?        00:01:20 postgres: stats buffer process
postgres 23612 23611  5 Aug08 ?        01:21:06 postgres: stats collector process
postgres 31028 23609  8 04:31 ?        00:01:34 postgres: tv tv [local] startup
postgres 31239 23609  9 04:32 ?        00:01:39 postgres: tv tv 10.30.10.105 startup
postgres 31429 23609  4 04:36 ?        00:00:35 postgres: tv tv 10.30.10.101 startup
postgres 31458 23609  4 04:36 ?        00:00:36 postgres: tv tv 10.30.10.105 startup
postgres 31484 23609  4 04:36 ?        00:00:37 postgres: tv tv 10.30.10.105 startup
postgres 31495 23609  5 04:37 ?        00:00:37 postgres: tv tv 10.30.10.104 startup
postgres 31704 23609  4 04:37 ?        00:00:29 postgres: tv tv 10.30.10.104 startup
postgres 31719 23609  4 04:38 ?        00:00:29 postgres: tv tv 10.30.10.102 startup
postgres 31738 23609  4 04:38 ?        00:00:27 postgres: tv tv 10.30.10.102 startup
postgres 31761 23609  2 04:38 ?        00:00:17 postgres: tv tv 10.30.10.103 startup
postgres 31766 23609  2 04:38 ?        00:00:13 postgres: tv tv 10.30.10.103 startup
postgres 31791 23609  3 04:39 ?        00:00:24 postgres: tv tv 10.30.10.102 startup
postgres 31799 23609  3 04:39 ?        00:00:22 postgres: tv tv 10.30.10.102 startup
postgres 31820 23609  3 04:39 ?        00:00:21 postgres: tv tv 10.30.10.104 startup
postgres 31841 23609  0 04:40 ?        00:00:01 postgres: tv tv 10.30.10.101 startup
postgres 31842 23609  0 04:40 ?        00:00:01 postgres: tv tv 10.30.10.105 startup
postgres 31846 23609  0 04:40 ?        00:00:02 postgres: tv tv 10.30.10.105 startup
postgres 31857 23609  0 04:40 ?        00:00:03 postgres: tv tv 10.30.10.102 startup
postgres 31889 23609  0 04:40 ?        00:00:03 postgres: tv tv 10.30.10.104 startup
postgres 31899 23609  0 04:41 ?        00:00:03 postgres: tv tv 10.30.10.102 startup
postgres 31910 23609  0 04:41 ?        00:00:02 postgres: tv tv 10.30.10.105 startup
postgres 31911 23609  0 04:41 ?        00:00:02 postgres: tv tv 10.30.10.103 startup
postgres 31943 23609  0 04:41 ?        00:00:04 postgres: tv tv 10.30.10.101 startup
postgres 31947 23609  0 04:42 ?        00:00:03 postgres: tv tv 10.30.10.102 startup
postgres 32046 23609  0 04:42 ?        00:00:04 postgres: tv tv 10.30.10.101 startup
postgres 32136 23609  0 04:42 ?        00:00:02 postgres: tv tv 10.30.10.105 startup
postgres 32140 23609  0 04:42 ?        00:00:02 postgres: tv tv 10.30.10.105 startup
postgres 32141 23609  0 04:42 ?        00:00:02 postgres: tv tv 10.30.10.102 startup
postgres 32163 23609  1 04:42 ?        00:00:04 postgres: tv tv 10.30.10.102 startup
postgres 32174 23609  0 04:43 ?        00:00:01 postgres: tv tv 10.30.10.102 startup
postgres 32175 23609  0 04:43 ?        00:00:01 postgres: tv tv 10.30.10.103 startup
postgres 32176 23609  0 04:43 ?        00:00:01 postgres: tv tv 10.30.10.101 startup
postgres 32180 23609  0 04:43 ?        00:00:01 postgres: tv tv 10.30.10.101 startup
postgres 32188 23609  0 04:43 ?        00:00:01 postgres: tv tv 10.30.10.101 startup
postgres 32189 23609  0 04:43 ?        00:00:00 postgres: tv tv 10.30.10.104 startup
postgres 32190 23609  0 04:43 ?        00:00:00 postgres: tv tv 10.30.10.104 startup
postgres 32200 23609  0 04:43 ?        00:00:01 postgres: tv tv 10.30.10.105 startup
postgres 32204 23609  0 04:43 ?        00:00:02 postgres: tv tv 10.30.10.105 startup
postgres 32236 23609  0 04:43 ?        00:00:02 postgres: tv tv 10.30.10.101 startup
postgres 32238 23609  0 04:44 ?        00:00:01 postgres: tv tv 10.30.10.103 startup
postgres 32242 23609  0 04:44 ?        00:00:01 postgres: tv tv 10.30.10.101 startup
postgres 32243 23609  0 04:44 ?        00:00:01 postgres: tv tv 10.30.10.104 startup
postgres 32249 23609  0 04:44 ?        00:00:01 postgres: tv tv 10.30.10.101 startup
postgres 32254 23609  0 04:44 ?        00:00:01 postgres: tv tv 10.30.10.103 startup
postgres 32258 23609  0 04:44 ?        00:00:01 postgres: tv tv 10.30.10.105 startup
postgres 32262 23609  0 04:44 ?        00:00:02 postgres: tv tv 10.30.10.101 startup
postgres 32272 23609  0 04:44 ?        00:00:01 postgres: tv tv 10.30.10.101 startup
postgres 32296 23609  0 04:45 ?        00:00:02 postgres: tv tv 10.30.10.102 startup
postgres 32344 23609  1 04:45 ?        00:00:03 postgres: tv tv 10.30.10.105 startup
postgres 32361 23609  0 04:45 ?        00:00:01 postgres: tv tv 10.30.10.104 startup
postgres 32365 23609  0 04:45 ?        00:00:02 postgres: tv tv 10.30.10.103 startup
postgres 32395 23609  1 04:45 ?        00:00:02 postgres: tv tv 10.30.10.101 startup
postgres 32410 23609  1 04:46 ?        00:00:02 postgres: tv tv 10.30.10.102 startup
postgres 32431 23609  2 04:46 ?        00:00:03 postgres: tv tv 10.30.10.104 startup
postgres 32435 23609  0 04:46 ?        00:00:01 postgres: tv tv 10.30.10.103 startup
postgres 32436 23609  0 04:46 ?        00:00:01 postgres: tv tv 10.30.10.105 startup
postgres 32462 23609  1 04:47 ?        00:00:01 postgres: tv tv 10.30.10.101 startup
postgres 32619 23609  2 04:47 ?        00:00:03 postgres: tv tv 10.30.10.103 startup
postgres 32680 23609  1 04:47 ?        00:00:01 postgres: tv tv 10.30.10.101 startup
postgres 32712 23609  1 04:48 ?        00:00:01 postgres: tv tv 10.30.10.104 startup
postgres 32721 23609  1 04:48 ?        00:00:00 postgres: tv tv 10.30.10.102 startup
postgres 32732 23609  0 04:48 ?        00:00:00 postgres: tv tv 10.30.10.105 startup
postgres 32765 23609  1 04:49 ?        00:00:00 postgres: tv tv 10.30.10.101 startup
postgres   304 23609  0 04:49 ?        00:00:00 postgres: tv tv 10.30.10.105 startup
postgres   318 23609  0 04:49 ?        00:00:00 postgres: tv tv 10.30.10.101 startup
postgres   324 23609  0 04:49 ?        00:00:00 postgres: tv tv 10.30.10.104 startup

------ doing 31028 31028 ===============================
GNU gdb Red Hat Linux (5.1.90CVS-5)
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux".
Attaching to process 31028
Reading symbols from /usr/bin/postgres...done.
Reading symbols from /lib/libpam.so.0...done.
Loaded symbols for /lib/libpam.so.0
Reading symbols from /lib/libssl.so.2...done.
Loaded symbols for /lib/libssl.so.2
Reading symbols from /lib/libcrypto.so.2...done.
Loaded symbols for /lib/libcrypto.so.2
Reading symbols from /usr/kerberos/lib/libkrb5.so.3...done.
Loaded symbols for /usr/kerberos/lib/libkrb5.so.3
Reading symbols from /usr/kerberos/lib/libk5crypto.so.3...done.
Loaded symbols for /usr/kerberos/lib/libk5crypto.so.3
Reading symbols from /usr/kerberos/lib/libcom_err.so.3...done.
Loaded symbols for /usr/kerberos/lib/libcom_err.so.3
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/i686/libm.so.6...done.
Loaded symbols for /lib/i686/libm.so.6
Reading symbols from /usr/lib/libreadline.so.4...done.
Loaded symbols for /usr/lib/libreadline.so.4
Reading symbols from /lib/libtermcap.so.2...done.
Loaded symbols for /lib/libtermcap.so.2
Reading symbols from /lib/i686/libc.so.6...done.
Loaded symbols for /lib/i686/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /usr/lib/gconv/ISO8859-1.so...done.
Loaded symbols for /usr/lib/gconv/ISO8859-1.so
0x420e8b52 in semop () from /lib/i686/libc.so.6
(gdb) Hangup detected on fd 0
error detected on stdin
Detaching from program: /usr/bin/postgres, process 31028
semop(3538944, 0xbfffe150, 1) = 0
semop(3538944, 0xbfffe150, 1) = 0
semop(3538944, 0xbfffe150, 1) = 0
semop(3538944, 0xbfffe150, 1) = 0
semop(3538944, 0xbfffe150, 1) = 0
semop(3538944, 0xbfffe150, 1) = 0
semop(3538944, 0xbfffe150, 1) = 0
semop(3538944, 0xbfffe1c0, 1) = 0
semop(3571713, 0xbfffe170, 1) = 0
semop(3571713, 0xbfffe170, 1) = 0
semop(3538944, 0xbfffe1c0, 1) = 0
semop(3571713, 0xbfffe170, 1) = 0
semop(3538944, 0xbfffe1c0, 1) = 0
semop(3571713, 0xbfffe170, 1) = 0
semop(3538944, 0xbfffe150, 1) = 0
semop(3538944, 0xbfffe150, 1) = 0
semop(3538944, 0xbfffe150, 1) = 0
semop(3571713, 0xbfffe170, 1) = 0
semop(3538944, 0xbfffe150, 1) = 0
semop(3538944, 0xbfffe150, 1) = 0
semop(3538944, 0xbfffe150, 1) = 0
semop(3538944, 0xbfffe150, 1) = 0
semop(3538944, 0xbfffe150, 1) = 0
semop(3538944, 0xbfffe150, 1) = 0
semop(3538944, 0xbfffe150, 1) = 0
semop(3538944, 0xbfffe150, 1) = 0
semop(3538944, 0xbfffe1c0, 1) = 0
semop(3571713, 0xbfffe150, 1) = 0
semop(3538944, 0xbfffe150, 1) = 0
semop(3571713, 0xbfffe170, 1) = 0
semop(3538944, 0xbfffe1c0, 1) = 0
semop(3571713, 0xbfffe170, 1) = 0
semop(3538944, 0xbfffe1c0, 1) = 0
semop(3538944, 0xbfffe1c0, 1) = 0
semop(3538944, 0xbfffe170, 1) = 0
semop(3538944, 0xbfffe150, 1) = 0
semop(3538944, 0xbfffe1c0, 1) = 0
semop(3604482, 0xbfffe150, 1) = 0
semop(3538944, 0xbfffe150, 1) = 0
semop(3538944, 0xbfffe1c0, 1) = 0
semop(3571713, 0xbfffe150, 1) = 0
semop(3538944, 0xbfffe150, 1) = 0
semop(3538944, 0xbfffe150, 1
------------------------------
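To make a trace like the one above easier to read, the semop calls can be tallied by semaphore set ID and by return value. The Python sketch below does that for strace text in the format shown. The function name is hypothetical, and the sample is a short excerpt of lines from the trace, including the final truncated one. Any reading of the totals, for example that a stream of calls all returning 0 suggests the backend is cycling through semaphore operations rather than blocking in one, is an assumption to be checked against a full stack trace.

```python
import re
from collections import Counter

# Matches complete strace lines such as: semop(3538944, 0xbfffe150, 1) = 0
SEMOP_RE = re.compile(r"semop\((\d+), (0x[0-9a-f]+), (\d+)\)\s*=\s*(-?\d+)")


def tally_semop(strace_text):
    """Count semop calls per semaphore set ID and per return value."""
    by_semid = Counter()
    by_result = Counter()
    for match in SEMOP_RE.finditer(strace_text):
        semid, _sops_addr, _nsops, result = match.groups()
        by_semid[int(semid)] += 1
        by_result[int(result)] += 1
    return by_semid, by_result


# Excerpt from the trace in this message; the last line is cut off in
# the original and is skipped because it has no return value.
SAMPLE = """\
semop(3538944, 0xbfffe150, 1) = 0
semop(3538944, 0xbfffe1c0, 1) = 0
semop(3571713, 0xbfffe170, 1) = 0
semop(3571713, 0xbfffe170, 1) = 0
semop(3604482, 0xbfffe150, 1) = 0
semop(3538944, 0xbfffe150, 1
"""

by_semid, by_result = tally_semop(SAMPLE)
```

Running the same function over the complete strace output from the script would show how the calls are spread across the three semaphore set IDs that appear in the trace.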