Thread: PostgreSQL hanging on new connections?
hi, got (another time, it's not the first thing) this situation: 1. server: 64gb of ram, 24 cores of xeon processors (hyperthreading is off), storage is fusion io. 2. at one point load skyrocketed to 48 (i know, it's not high with this hardware) 3. we were not able to connect to PostgreSQL from psql - psql when run just froze. 4. postgresql: PostgreSQL 8.3.12 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-48) What I did gather: straced postmaster when the problem was happening, and I was opening new connections. strace looks like this: 16:54:36.886720 select(6, [3 4 5], NULL, NULL, {54, 366000}) = 1 (in [4], left {51, 563000}) 16:54:39.768318 rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP ABRT BUS FPE SEGV CONT SYS RTMIN RT_1], NULL, 8) = 0 16:54:39.770795 accept(4, {sa_family=AF_INET, sin_port=htons(59661), sin_addr=inet_addr("10.50.50.7")}, [16]) = 8 16:54:39.773895 getsockname(8, {sa_family=AF_INET, sin_port=htons(5432), sin_addr=inet_addr("10.60.0.10")}, [16]) = 0 16:54:39.774522 setsockopt(8, SOL_TCP, TCP_NODELAY, [1], 4) = 0 16:54:39.774579 setsockopt(8, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0 16:54:39.774729 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b06874a4460)= 12891 16:54:39.854208 close(8) = 0 16:54:39.855800 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 16:54:39.856853 select(6, [3 4 5], NULL, NULL, {60, 0}) = 1 (in [4], left {57, 842000}) 16:54:42.018570 rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP ABRT BUS FPE SEGV CONT SYS RTMIN RT_1], NULL, 8) = 0 16:54:42.027079 accept(4, {sa_family=AF_INET, sin_port=htons(59662), sin_addr=inet_addr("10.50.50.7")}, [16]) = 8 16:54:42.046511 getsockname(8, {sa_family=AF_INET, sin_port=htons(5432), sin_addr=inet_addr("10.60.0.10")}, [16]) = 0 16:54:42.095917 setsockopt(8, SOL_TCP, TCP_NODELAY, [1], 4) = 0 16:54:42.109190 setsockopt(8, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0 16:54:42.111603 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b06874a4460)= 12912 16:54:42.114379 close(8) = 0 16:54:42.120551 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 16:54:42.128922 select(6, [3 4 5], NULL, NULL, {60, 0}) = 1 (in [4], left {58, 593000}) so, everything looks right. but - I was also stracing backends - just after creation (I did strace -ff -p <postmaster.pid> -o x.log). strace from single backend, just forked off: 16:54:39.852638 close(3) = 0 16:54:39.852823 close(4) = 0 16:54:39.852872 close(5) = 0 16:54:39.852910 close(6) = 0 16:54:39.852994 setsid() = 12891 16:54:39.853050 rt_sigaction(SIGTERM, {0x5ae490, [], SA_RESTORER|SA_RESTART, 0x3bd66302d0}, {0x588580, [], SA_RESTORER|SA_RESTART,0x3bd66302d0}, 8) = 0 16:54:39.853169 rt_sigaction(SIGQUIT, {0x5ae490, [], SA_RESTORER|SA_RESTART, 0x3bd66302d0}, {0x588580, [], SA_RESTORER|SA_RESTART,0x3bd66302d0}, 8) = 0 16:54:39.853215 rt_sigaction(SIGALRM, {0x5ae490, [], SA_RESTORER, 0x3bd66302d0}, {0x1, [], SA_RESTORER, 0x3bd66302d0}, 8)= 0 16:54:39.853259 rt_sigprocmask(SIG_SETMASK, ~[QUIT ILL TRAP ABRT BUS FPE SEGV ALRM TERM CONT SYS RTMIN RT_1], NULL, 8) =0 16:54:39.853504 write(2, "\0\0^\0[2\0\0t2010-11-29 16:54:39 UTC [12891]: [1-1] LOG: connection received: host=10.50.50.7port=59661\n", 103) = 103 16:54:39.853622 setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={60, 0}}, NULL) = 0 16:54:39.853689 recvfrom(8, "\0\0\0\10\4\322\26/", 8192, 0, NULL, NULL) = 8 16:54:39.853822 sendto(8, "N", 1, 0, NULL, 0) = 1 16:54:39.853901 recvfrom(8, "\0\0\0%\0\3\0\0user\0web\0database\0xxxxxxxxx\0\0", 8192, 0, NULL, NULL) = 37 16:54:39.856007 setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0 16:54:39.856059 rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP ABRT BUS FPE SEGV CONT SYS RTMIN RT_1], NULL, 8) = 0 16:54:39.856130 write(2, "\0\0r\0[2\0\0t2010-11-29 16:54:39 UTC 10.50.50.7(59661) [12891]: [2-1] LOG: connection authorized:user=web database=xxxxxxxxx\n", 123) = 123 16:54:39.856255 rt_sigaction(SIGHUP, {0x5ad340, [], SA_RESTORER|SA_RESTART, 0x3bd66302d0}, {0x588a90, [], SA_RESTORER|SA_RESTART,0x3bd66302d0}, 8) = 0 16:54:39.856315 rt_sigaction(SIGINT, {0x5ae350, [], SA_RESTORER|SA_RESTART, 0x3bd66302d0}, {0x588580, [], SA_RESTORER|SA_RESTART,0x3bd66302d0}, 8) = 0 16:54:39.856359 rt_sigaction(SIGTERM, {0x5ae3f0, [], SA_RESTORER|SA_RESTART, 0x3bd66302d0}, {0x5ae490, [], SA_RESTORER|SA_RESTART,0x3bd66302d0}, 8) = 0 16:54:39.856402 rt_sigaction(SIGQUIT, {0x5ae4a0, [], SA_RESTORER|SA_RESTART, 0x3bd66302d0}, {0x5ae490, [], SA_RESTORER|SA_RESTART,0x3bd66302d0}, 8) = 0 16:54:39.856445 rt_sigaction(SIGALRM, {0x5a6160, [], SA_RESTORER, 0x3bd66302d0}, {0x5ae490, [], SA_RESTORER, 0x3bd66302d0},8) = 0 16:54:39.856484 rt_sigaction(SIGPIPE, {0x1, [], SA_RESTORER|SA_RESTART, 0x3bd66302d0}, {0x1, [], SA_RESTORER|SA_RESTART,0x3bd66302d0}, 8) = 0 16:54:39.856526 rt_sigaction(SIGUSR1, {0x5a1740, [], SA_RESTORER|SA_RESTART, 0x3bd66302d0}, {0x5888b0, [], SA_RESTORER|SA_RESTART,0x3bd66302d0}, 8) = 0 16:54:39.856565 rt_sigaction(SIGUSR2, {0x4d7030, [], SA_RESTORER|SA_RESTART, 0x3bd66302d0}, {0x584a50, [], SA_RESTORER|SA_RESTART,0x3bd66302d0}, 8) = 0 16:54:39.856819 rt_sigaction(SIGFPE, {0x5ad7b0, [], SA_RESTORER|SA_RESTART, 0x3bd66302d0}, {SIG_DFL, [], 0}, 8) = 0 16:54:39.856929 rt_sigaction(SIGCHLD, {SIG_DFL, [], SA_RESTORER|SA_RESTART|SA_NOCLDSTOP, 0x3bd66302d0}, {0x588050, [], SA_RESTORER|SA_RESTART|SA_NOCLDSTOP,0x3bd66302d0}, 8) = 0 16:54:39.856978 rt_sigprocmask(SIG_SETMASK, ~[QUIT ILL TRAP ABRT BUS FPE SEGV CONT SYS RTMIN RT_1], NULL, 8) = 0 16:54:39.857043 mmap(NULL, 2101248, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b0796de9000 16:54:39.944561 semctl(7602213, 5, SETVAL, 0) = 0 16:54:39.944644 brk(0x3fbe000) = 0x3fbe000 16:54:39.944694 open("global/pg_database", O_RDONLY) = 3 16:54:39.944758 fstat(3, {st_mode=S_IFREG|0600, st_size=270, ...}) = 0 16:54:39.944811 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b0796fea000 16:54:39.944843 read(3, "\"template1\" 1 1663 456490577\n\"template0\" 11510 1663 527228657\n\"postgres\" 11511 1663 506289897\n\"xxxxxxxxx\"16387 1663 457831935\n\"yyyy\" 31851 1663 427191013\n\"aaaa\" 31853 1663 427191021\n\"bbbb\" 318541663 427191048\n\"cccc\" 31855 1663 427191069\n\"ddddd\" 1477300 1"..., 4096) = 270 16:54:39.944921 close(3) = 0 16:54:39.944954 munmap(0x2b0796fea000, 4096) = 0 16:54:39.945088 semop(7602213, 0x7fffceae7110, 1 and that's all - backend was not showing anything else. All other backends that were created at this time were doing the same thing - waiting on this semop. What's more - when I tried to stop pg (pg_ctl -m fast), all backends died except for one. This single backend was connected from very remote location, with not really good network connectivity between it and postgresqlserver. I did strace this single backend, but then I had to pg_ctl -m immediate. strace showed: 23540 sendto(8, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 838,0, NULL, 0) = ? ERESTARTSYS (To be restarted) 23540 --- SIGQUIT (Quit) @ 0 (0) --- 23540 rt_sigprocmask(SIG_SETMASK, ~[QUIT ILL TRAP ABRT BUS FPE SEGV CONT SYS RTMIN RT_1], NULL, 8) = 0 23540 write(2, "\0\0\34\2\364[\0\0t2010-11-29 16:58:22 UTC AA.BB.CC.DD(60835) [23540]: [623-1] WARNING: terminating connectionbecause of crash of another server process\n2010-11-29 16:58:22 UTC AA.BB.CC.DD(60835) [23540]: [624-1] DETAIL: The postmaster has commanded this server p"..., 549) = 549 23540 write(2, "\0\0\204\2\364[\0\0t2010-11-29 16:58:22 UTC AA.BB.CC.DD(60835) [23540]: [626-1] FATAL: terminating connectiondue to administrator command\n2010-11-29 16:58:22 UTC AA.BB.CC.DD(60835) [23540]: [627-1] STATEMENT: SELECT table.id,table.some_f1, table.some_f2, "..., 653) = 653 23540 sendto(7, "\1\0\0\0X\2\0\0\3@\0\0\6\0\0\0\1\0\0\0\1\0\0\0p.-\0\0\0\0\0\1\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\212I\0\0\377\177\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0>\200\0\0\377\177\0\0\6\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 600,0, NULL, 0) = 600 23540 --- SIGQUIT (Quit) @ 0 (0) --- 23540 rt_sigprocmask(SIG_SETMASK, ~[QUIT ILL TRAP ABRT BUS FPE SEGV CONT SYS RTMIN RT_1], NULL, 8) = 0 23540 write(2, "\0\0\34\2\364[\0\0t2010-11-29 16:58:22 UTC AA.BB.CC.DD(60835) [23540]: [628-1] WARNING: terminating connectionbecause of crash of another server process\n2010-11-29 16:58:22 UTC AA.BB.CC.DD(60835) [23540]: [629-1] DETAIL: The postmaster has commanded this server p"..., 549) = 549 23540 exit_group(2) = ? Now, the question is: why did it hang? Is there anything we can do to make it *not* hang? Best regards, depesz -- Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/ jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007
hubert depesz lubaczewski <depesz@depesz.com> writes: > straced postmaster when the problem was happening, and I was opening new > connections. strace looks like this: > [ backend hangs on semop immediately after reading global/pg_database ] It looks like something had exclusive lock on the database that new connections wanted to connect to. AFAICS the only action in 8.3 that would do that would be a DROP DATABASE or RENAME DATABASE. What was that other session doing? regards, tom lane
On Mon, Nov 29, 2010 at 01:23:00PM -0500, Tom Lane wrote: > hubert depesz lubaczewski <depesz@depesz.com> writes: > > straced postmaster when the problem was happening, and I was opening new > > connections. strace looks like this: > > [ backend hangs on semop immediately after reading global/pg_database ] > > It looks like something had exclusive lock on the database that new > connections wanted to connect to. AFAICS the only action in 8.3 that > would do that would be a DROP DATABASE or RENAME DATABASE. What was > that other session doing? just some select. as far as I know there are no backends with such commands - drop/rename db. depesz -- Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/ jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007
On Mon, Nov 29, 2010 at 01:23:00PM -0500, Tom Lane wrote: > hubert depesz lubaczewski <depesz@depesz.com> writes: > > straced postmaster when the problem was happening, and I was opening new > > connections. strace looks like this: > > [ backend hangs on semop immediately after reading global/pg_database ] > > It looks like something had exclusive lock on the database that new > connections wanted to connect to. AFAICS the only action in 8.3 that > would do that would be a DROP DATABASE or RENAME DATABASE. What was > that other session doing? verified in logs. there were no alter database or drop database statements. Best regards, depesz -- Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/ jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007
On Mon, Nov 29, 2010 at 1:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > hubert depesz lubaczewski <depesz@depesz.com> writes: >> straced postmaster when the problem was happening, and I was opening new >> connections. strace looks like this: >> [ backend hangs on semop immediately after reading global/pg_database ] > > It looks like something had exclusive lock on the database that new > connections wanted to connect to. AFAICS the only action in 8.3 that > would do that would be a DROP DATABASE or RENAME DATABASE. What was > that other session doing? Every once in a while when I connect to my big DB it feels like it stalls the connection for a few seconds. I have no idea what causes it, but it does feel like it has to do with the load. The load is not all that high relative to what my box can handle. It doesn't happen often enough for me to track it down, though. It instinct is that it is waiting on a lock, but clearly there is no rename/drop happening on my main db else i'd be out of business :-) I too run 8.3 as primary right now... testing 9.0 for deployment soon-ish.
On Mon, 2010-11-29 at 15:57 -0500, Vick Khera wrote: > On Mon, Nov 29, 2010 at 1:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > hubert depesz lubaczewski <depesz@depesz.com> writes: > >> straced postmaster when the problem was happening, and I was opening new > >> connections. strace looks like this: > >> [ backend hangs on semop immediately after reading global/pg_database ] > > > > It looks like something had exclusive lock on the database that new > > connections wanted to connect to. AFAICS the only action in 8.3 that > > would do that would be a DROP DATABASE or RENAME DATABASE. What was > > that other session doing? > > Every once in a while when I connect to my big DB it feels like it > stalls the connection for a few seconds. I have no idea what causes > it, but it does feel like it has to do with the load. The load is not > all that high relative to what my box can handle. It doesn't happen > often enough for me to track it down, though. It instinct is that it > is waiting on a lock, but clearly there is no rename/drop happening on > my main db else i'd be out of business :-) I too run 8.3 as primary > right now... testing 9.0 for deployment soon-ish. Initiating the connection all the caches etc? Perhaps trying to allocate shared memory? JD -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
On Nov 29, 2010, at 12:57 PM, Vick Khera wrote: > On Mon, Nov 29, 2010 at 1:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> hubert depesz lubaczewski <depesz@depesz.com> writes: >>> straced postmaster when the problem was happening, and I was opening new >>> connections. strace looks like this: >>> [ backend hangs on semop immediately after reading global/pg_database ] >> >> It looks like something had exclusive lock on the database that new >> connections wanted to connect to. AFAICS the only action in 8.3 that >> would do that would be a DROP DATABASE or RENAME DATABASE. What was >> that other session doing? > > Every once in a while when I connect to my big DB it feels like it > stalls the connection for a few seconds. I have no idea what causes > it, but it does feel like it has to do with the load. The load is not > all that high relative to what my box can handle. It doesn't happen > often enough for me to track it down, though. It instinct is that it > is waiting on a lock, but clearly there is no rename/drop happening on > my main db else i'd be out of business :-) I too run 8.3 as primary > right now... testing 9.0 for deployment soon-ish. We're having similar issues on 8.4.[245]... occasionally psql takes anywhere from a few to several dozen seconds to connect.I've been unsuccessfully trying to blame spikes in the OS run queue (we desperately need some connection pooling)but if it's something to do with locks I can't see in pg_locks, that would explain why I haven't been able to figureout what's going on yet....
On Mon, Nov 29, 2010 at 10:03 PM, Ben Chobot <bench@silentmedia.com> wrote:
On Nov 29, 2010, at 12:57 PM, Vick Khera wrote:We're having similar issues on 8.4.[245]... occasionally psql takes anywhere from a few to several dozen seconds to connect. I've been unsuccessfully trying to blame spikes in the OS run queue (we desperately need some connection pooling) but if it's something to do with locks I can't see in pg_locks, that would explain why I haven't been able to figure out what's going on yet....
> On Mon, Nov 29, 2010 at 1:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> hubert depesz lubaczewski <depesz@depesz.com> writes:
>>> straced postmaster when the problem was happening, and I was opening new
>>> connections. strace looks like this:
>>> [ backend hangs on semop immediately after reading global/pg_database ]
>>
>> It looks like something had exclusive lock on the database that new
>> connections wanted to connect to. AFAICS the only action in 8.3 that
>> would do that would be a DROP DATABASE or RENAME DATABASE. What was
>> that other session doing?
>
> Every once in a while when I connect to my big DB it feels like it
> stalls the connection for a few seconds. I have no idea what causes
> it, but it does feel like it has to do with the load. The load is not
> all that high relative to what my box can handle. It doesn't happen
> often enough for me to track it down, though. It instinct is that it
> is waiting on a lock, but clearly there is no rename/drop happening on
> my main db else i'd be out of business :-) I too run 8.3 as primary
> right now... testing 9.0 for deployment soon-ish.--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
This is normal. PostgreSQL needs to create new server process to handle your requested connection.
Then it needs to allocate resources to that new connection. It initializes shared memory for that connection. That is the stall you are mentioning.
If you are time sensitive on connection times, then use connection pooling software like pg-bouncer or pg-pool2. Pooling will keep your server instance "hot" and those lags won't happen.
On 11/30/2010 03:28 PM, Dusan Misic wrote: > We're having similar issues on 8.4.[245]... occasionally psql takes > anywhere from a few to several dozen seconds to connect. I've been > unsuccessfully trying to blame spikes in the OS run queue (we > desperately need some connection pooling) but if it's something to > do with locks I can't see in pg_locks, that would explain why I > haven't been able to figure out what's going on yet.... > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org > <mailto:pgsql-general@postgresql.org>) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general > > > This is normal. PostgreSQL needs to create new server process to handle > your requested connection. > > Then it needs to allocate resources to that new connection. It > initializes shared memory for that connection. That is the stall you are > mentioning. Eh, what? Forking a backend and attaching to shared memory should *not* take "a few seconds". On my test machine it takes 100ms to fork psql, connect to the postmaster, fork a backend, init the backend, authenticate, run a dummy query and exit psql. If you're seeing delays like that, your machine is horrifyingly overloaded or there's something else wrong. -- Craig Ringer
On Tue, Nov 30, 2010 at 10:20 AM, Craig Ringer <craig@postnewspapers.com.au> wrote:
On 11/30/2010 03:28 PM, Dusan Misic wrote:We're having similar issues on 8.4.[245]... occasionally psql takes
anywhere from a few to several dozen seconds to connect. I've been
unsuccessfully trying to blame spikes in the OS run queue (we
desperately need some connection pooling) but if it's something to
do with locks I can't see in pg_locks, that would explain why I
haven't been able to figure out what's going on yet....
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org
<mailto:pgsql-general@postgresql.org>)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
This is normal. PostgreSQL needs to create new server process to handle
your requested connection.
Then it needs to allocate resources to that new connection. It
initializes shared memory for that connection. That is the stall you are
mentioning.
Eh, what?
Forking a backend and attaching to shared memory should *not* take "a few seconds". On my test machine it takes 100ms to fork psql, connect to the postmaster, fork a backend, init the backend, authenticate, run a dummy query and exit psql.
If you're seeing delays like that, your machine is horrifyingly overloaded or there's something else wrong.
--
Craig Ringer
This is not happening to me. Even on my home computer (which is terribly slow single core Sempron 2600+ with 128 kB of L2 cache, 1.5 GB DDR RAM and with slow ATA hard disks) it is fast to connect. Maximum (slowest) connection time is about 500 ms. I did some tuning in postgresql.conf.
I'm very happy with PostgreSQL speed on my 'snail' computer. :)
* hubert depesz lubaczewski: > Now, the question is: why did it hang? Is there anything we can do to > make it *not* hang? It might be some general system overload issue. Try running "echo w > /proc/sysrq-trigger" as root the next time it happens. This will dump kernel backtraces to dmesg, which might hint to what's going on with the system. -- Florian Weimer <fweimer@bfk.de> BFK edv-consulting GmbH http://www.bfk.de/ Kriegsstraße 100 tel: +49-721-96201-1 D-76133 Karlsruhe fax: +49-721-96201-99
On Mon, Nov 29, 2010 at 03:57:29PM -0500, Vick Khera wrote: > On Mon, Nov 29, 2010 at 1:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > hubert depesz lubaczewski <depesz@depesz.com> writes: > >> straced postmaster when the problem was happening, and I was opening new > >> connections. strace looks like this: > >> [ backend hangs on semop immediately after reading global/pg_database ] > > > > It looks like something had exclusive lock on the database that new > > connections wanted to connect to. AFAICS the only action in 8.3 that > > would do that would be a DROP DATABASE or RENAME DATABASE. What was > > that other session doing? > > Every once in a while when I connect to my big DB it feels like it > stalls the connection for a few seconds. I have no idea what causes > it, but it does feel like it has to do with the load. The load is not > all that high relative to what my box can handle. It doesn't happen > often enough for me to track it down, though. It instinct is that it > is waiting on a lock, but clearly there is no rename/drop happening on > my main db else i'd be out of business :-) I too run 8.3 as primary > right now... testing 9.0 for deployment soon-ish. jkust so it will be clear - on some other occasion when we had the problem, i left it hng for a while. 5 minutes later i still didn't get the connection. Best regards, depesz -- Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/ jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007
On Tue, Nov 30, 2010 at 10:12:47AM +0000, Florian Weimer wrote: > * hubert depesz lubaczewski: > > > Now, the question is: why did it hang? Is there anything we can do to > > make it *not* hang? > > It might be some general system overload issue. Try running "echo w > > /proc/sysrq-trigger" as root the next time it happens. This will dump > kernel backtraces to dmesg, which might hint to what's going on with > the system. will check the idea, but the thing is that aside from postgresql - everything else on the server is working fine - ssh/top/*stat. and other postgreses seem to be locked too. Best regards, depesz -- Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/ jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007