Thread: PostgreSQL hanging on new connections?

PostgreSQL hanging on new connections?

From
hubert depesz lubaczewski
Date:
hi,
got (another time, it's not the first thing) this situation:
1. server: 64gb of ram, 24 cores of xeon processors (hyperthreading is off), storage is fusion io.
2. at one point load skyrocketed to 48 (i know, it's not high with this
hardware)
3. we were not able to connect to PostgreSQL from psql - psql when run
just froze.
4. postgresql: PostgreSQL 8.3.12 on x86_64-redhat-linux-gnu, compiled by
GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-48)

What I did gather:
straced postmaster when the problem was happening, and I was opening new
connections. strace looks like this:

16:54:36.886720 select(6, [3 4 5], NULL, NULL, {54, 366000}) = 1 (in [4], left {51, 563000})
16:54:39.768318 rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP ABRT BUS FPE SEGV CONT SYS RTMIN RT_1], NULL, 8) = 0
16:54:39.770795 accept(4, {sa_family=AF_INET, sin_port=htons(59661), sin_addr=inet_addr("10.50.50.7")}, [16]) = 8
16:54:39.773895 getsockname(8, {sa_family=AF_INET, sin_port=htons(5432), sin_addr=inet_addr("10.60.0.10")}, [16]) = 0
16:54:39.774522 setsockopt(8, SOL_TCP, TCP_NODELAY, [1], 4) = 0
16:54:39.774579 setsockopt(8, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
16:54:39.774729 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x2b06874a4460)= 12891 
16:54:39.854208 close(8)                = 0
16:54:39.855800 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
16:54:39.856853 select(6, [3 4 5], NULL, NULL, {60, 0}) = 1 (in [4], left {57, 842000})
16:54:42.018570 rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP ABRT BUS FPE SEGV CONT SYS RTMIN RT_1], NULL, 8) = 0
16:54:42.027079 accept(4, {sa_family=AF_INET, sin_port=htons(59662), sin_addr=inet_addr("10.50.50.7")}, [16]) = 8
16:54:42.046511 getsockname(8, {sa_family=AF_INET, sin_port=htons(5432), sin_addr=inet_addr("10.60.0.10")}, [16]) = 0
16:54:42.095917 setsockopt(8, SOL_TCP, TCP_NODELAY, [1], 4) = 0
16:54:42.109190 setsockopt(8, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
16:54:42.111603 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x2b06874a4460)= 12912 
16:54:42.114379 close(8) = 0
16:54:42.120551 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
16:54:42.128922 select(6, [3 4 5], NULL, NULL, {60, 0}) = 1 (in [4], left {58, 593000})

so, everything looks right.

but - I was also stracing backends - just after creation (I did strace -ff -p <postmaster.pid> -o x.log).

strace from single backend, just forked off:

16:54:39.852638 close(3)                = 0
16:54:39.852823 close(4)                = 0
16:54:39.852872 close(5)                = 0
16:54:39.852910 close(6)                = 0
16:54:39.852994 setsid()                = 12891
16:54:39.853050 rt_sigaction(SIGTERM, {0x5ae490, [], SA_RESTORER|SA_RESTART, 0x3bd66302d0}, {0x588580, [],
SA_RESTORER|SA_RESTART,0x3bd66302d0}, 8) = 0 
16:54:39.853169 rt_sigaction(SIGQUIT, {0x5ae490, [], SA_RESTORER|SA_RESTART, 0x3bd66302d0}, {0x588580, [],
SA_RESTORER|SA_RESTART,0x3bd66302d0}, 8) = 0 
16:54:39.853215 rt_sigaction(SIGALRM, {0x5ae490, [], SA_RESTORER, 0x3bd66302d0}, {0x1, [], SA_RESTORER, 0x3bd66302d0},
8)= 0 
16:54:39.853259 rt_sigprocmask(SIG_SETMASK, ~[QUIT ILL TRAP ABRT BUS FPE SEGV ALRM TERM CONT SYS RTMIN RT_1], NULL, 8)
=0 
16:54:39.853504 write(2, "\0\0^\0[2\0\0t2010-11-29 16:54:39 UTC  [12891]: [1-1] LOG:  connection received:
host=10.50.50.7port=59661\n", 103) = 103 
16:54:39.853622 setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={60, 0}}, NULL) = 0
16:54:39.853689 recvfrom(8, "\0\0\0\10\4\322\26/", 8192, 0, NULL, NULL) = 8
16:54:39.853822 sendto(8, "N", 1, 0, NULL, 0) = 1
16:54:39.853901 recvfrom(8, "\0\0\0%\0\3\0\0user\0web\0database\0xxxxxxxxx\0\0", 8192, 0, NULL, NULL) = 37
16:54:39.856007 setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
16:54:39.856059 rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP ABRT BUS FPE SEGV CONT SYS RTMIN RT_1], NULL, 8) = 0
16:54:39.856130 write(2, "\0\0r\0[2\0\0t2010-11-29 16:54:39 UTC 10.50.50.7(59661) [12891]: [2-1] LOG:  connection
authorized:user=web database=xxxxxxxxx\n", 123) = 123 
16:54:39.856255 rt_sigaction(SIGHUP, {0x5ad340, [], SA_RESTORER|SA_RESTART, 0x3bd66302d0}, {0x588a90, [],
SA_RESTORER|SA_RESTART,0x3bd66302d0}, 8) = 0 
16:54:39.856315 rt_sigaction(SIGINT, {0x5ae350, [], SA_RESTORER|SA_RESTART, 0x3bd66302d0}, {0x588580, [],
SA_RESTORER|SA_RESTART,0x3bd66302d0}, 8) = 0 
16:54:39.856359 rt_sigaction(SIGTERM, {0x5ae3f0, [], SA_RESTORER|SA_RESTART, 0x3bd66302d0}, {0x5ae490, [],
SA_RESTORER|SA_RESTART,0x3bd66302d0}, 8) = 0 
16:54:39.856402 rt_sigaction(SIGQUIT, {0x5ae4a0, [], SA_RESTORER|SA_RESTART, 0x3bd66302d0}, {0x5ae490, [],
SA_RESTORER|SA_RESTART,0x3bd66302d0}, 8) = 0 
16:54:39.856445 rt_sigaction(SIGALRM, {0x5a6160, [], SA_RESTORER, 0x3bd66302d0}, {0x5ae490, [], SA_RESTORER,
0x3bd66302d0},8) = 0 
16:54:39.856484 rt_sigaction(SIGPIPE, {0x1, [], SA_RESTORER|SA_RESTART, 0x3bd66302d0}, {0x1, [],
SA_RESTORER|SA_RESTART,0x3bd66302d0}, 8) = 0 
16:54:39.856526 rt_sigaction(SIGUSR1, {0x5a1740, [], SA_RESTORER|SA_RESTART, 0x3bd66302d0}, {0x5888b0, [],
SA_RESTORER|SA_RESTART,0x3bd66302d0}, 8) = 0 
16:54:39.856565 rt_sigaction(SIGUSR2, {0x4d7030, [], SA_RESTORER|SA_RESTART, 0x3bd66302d0}, {0x584a50, [],
SA_RESTORER|SA_RESTART,0x3bd66302d0}, 8) = 0 
16:54:39.856819 rt_sigaction(SIGFPE, {0x5ad7b0, [], SA_RESTORER|SA_RESTART, 0x3bd66302d0}, {SIG_DFL, [], 0}, 8) = 0
16:54:39.856929 rt_sigaction(SIGCHLD, {SIG_DFL, [], SA_RESTORER|SA_RESTART|SA_NOCLDSTOP, 0x3bd66302d0}, {0x588050, [],
SA_RESTORER|SA_RESTART|SA_NOCLDSTOP,0x3bd66302d0}, 8) = 0 
16:54:39.856978 rt_sigprocmask(SIG_SETMASK, ~[QUIT ILL TRAP ABRT BUS FPE SEGV CONT SYS RTMIN RT_1], NULL, 8) = 0
16:54:39.857043 mmap(NULL, 2101248, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b0796de9000
16:54:39.944561 semctl(7602213, 5, SETVAL, 0) = 0
16:54:39.944644 brk(0x3fbe000)          = 0x3fbe000
16:54:39.944694 open("global/pg_database", O_RDONLY) = 3
16:54:39.944758 fstat(3, {st_mode=S_IFREG|0600, st_size=270, ...}) = 0
16:54:39.944811 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b0796fea000
16:54:39.944843 read(3, "\"template1\" 1 1663 456490577\n\"template0\" 11510 1663 527228657\n\"postgres\" 11511 1663
506289897\n\"xxxxxxxxx\"16387 1663 457831935\n\"yyyy\" 31851 1663 427191013\n\"aaaa\" 31853 1663 427191021\n\"bbbb\"
318541663 427191048\n\"cccc\" 31855 1663 427191069\n\"ddddd\" 1477300 1"..., 4096) = 270 
16:54:39.944921 close(3)                = 0
16:54:39.944954 munmap(0x2b0796fea000, 4096) = 0
16:54:39.945088 semop(7602213, 0x7fffceae7110, 1

and that's all - backend was not showing anything else.

All other backends that were created at this time were doing the same thing - waiting on this semop.

What's more - when I tried to stop pg (pg_ctl -m fast), all backends died except for one.

This single backend was connected from very remote location, with not really good network connectivity between it and
postgresqlserver. 

I did strace this single backend, but then I had to pg_ctl -m immediate.

strace showed:

23540 sendto(8,
"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"...,
838,0, NULL, 0) = ? ERESTARTSYS (To be restarted) 
23540 --- SIGQUIT (Quit) @ 0 (0) ---
23540 rt_sigprocmask(SIG_SETMASK, ~[QUIT ILL TRAP ABRT BUS FPE SEGV CONT SYS RTMIN RT_1], NULL, 8) = 0
23540 write(2, "\0\0\34\2\364[\0\0t2010-11-29 16:58:22 UTC AA.BB.CC.DD(60835) [23540]: [623-1] WARNING:  terminating
connectionbecause of crash of another server process\n2010-11-29 16:58:22 UTC AA.BB.CC.DD(60835) [23540]: [624-1]
DETAIL: The postmaster has commanded this server p"..., 549) = 549 
23540 write(2, "\0\0\204\2\364[\0\0t2010-11-29 16:58:22 UTC AA.BB.CC.DD(60835) [23540]: [626-1] FATAL:  terminating
connectiondue to administrator command\n2010-11-29 16:58:22 UTC AA.BB.CC.DD(60835) [23540]: [627-1] STATEMENT:  SELECT
table.id,table.some_f1, table.some_f2, "..., 653) = 653 
23540 sendto(7,
"\1\0\0\0X\2\0\0\3@\0\0\6\0\0\0\1\0\0\0\1\0\0\0p.-\0\0\0\0\0\1\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\212I\0\0\377\177\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0>\200\0\0\377\177\0\0\6\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
600,0, NULL, 0) = 600 
23540 --- SIGQUIT (Quit) @ 0 (0) ---
23540 rt_sigprocmask(SIG_SETMASK, ~[QUIT ILL TRAP ABRT BUS FPE SEGV CONT SYS RTMIN RT_1], NULL, 8) = 0
23540 write(2, "\0\0\34\2\364[\0\0t2010-11-29 16:58:22 UTC AA.BB.CC.DD(60835) [23540]: [628-1] WARNING:  terminating
connectionbecause of crash of another server process\n2010-11-29 16:58:22 UTC AA.BB.CC.DD(60835) [23540]: [629-1]
DETAIL: The postmaster has commanded this server p"..., 549) = 549 
23540 exit_group(2)                     = ?

Now, the question is: why did it hang? Is there anything we can do to
make it *not* hang?

Best regards,

depesz

--
Linkedin: http://www.linkedin.com/in/depesz  /  blog: http://www.depesz.com/
jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007

Re: PostgreSQL hanging on new connections?

From
Tom Lane
Date:
hubert depesz lubaczewski <depesz@depesz.com> writes:
> straced postmaster when the problem was happening, and I was opening new
> connections. strace looks like this:
> [ backend hangs on semop immediately after reading global/pg_database ]

It looks like something had exclusive lock on the database that new
connections wanted to connect to.  AFAICS the only action in 8.3 that
would do that would be a DROP DATABASE or RENAME DATABASE.  What was
that other session doing?

            regards, tom lane

Re: PostgreSQL hanging on new connections?

From
hubert depesz lubaczewski
Date:
On Mon, Nov 29, 2010 at 01:23:00PM -0500, Tom Lane wrote:
> hubert depesz lubaczewski <depesz@depesz.com> writes:
> > straced postmaster when the problem was happening, and I was opening new
> > connections. strace looks like this:
> > [ backend hangs on semop immediately after reading global/pg_database ]
>
> It looks like something had exclusive lock on the database that new
> connections wanted to connect to.  AFAICS the only action in 8.3 that
> would do that would be a DROP DATABASE or RENAME DATABASE.  What was
> that other session doing?

just some select.

as far as I know there are no backends with such commands - drop/rename
db.

depesz

--
Linkedin: http://www.linkedin.com/in/depesz  /  blog: http://www.depesz.com/
jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007

Re: PostgreSQL hanging on new connections?

From
hubert depesz lubaczewski
Date:
On Mon, Nov 29, 2010 at 01:23:00PM -0500, Tom Lane wrote:
> hubert depesz lubaczewski <depesz@depesz.com> writes:
> > straced postmaster when the problem was happening, and I was opening new
> > connections. strace looks like this:
> > [ backend hangs on semop immediately after reading global/pg_database ]
>
> It looks like something had exclusive lock on the database that new
> connections wanted to connect to.  AFAICS the only action in 8.3 that
> would do that would be a DROP DATABASE or RENAME DATABASE.  What was
> that other session doing?

verified in logs. there were no alter database or drop database
statements.

Best regards,

depesz

--
Linkedin: http://www.linkedin.com/in/depesz  /  blog: http://www.depesz.com/
jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007

Re: PostgreSQL hanging on new connections?

From
Vick Khera
Date:
On Mon, Nov 29, 2010 at 1:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> hubert depesz lubaczewski <depesz@depesz.com> writes:
>> straced postmaster when the problem was happening, and I was opening new
>> connections. strace looks like this:
>> [ backend hangs on semop immediately after reading global/pg_database ]
>
> It looks like something had exclusive lock on the database that new
> connections wanted to connect to.  AFAICS the only action in 8.3 that
> would do that would be a DROP DATABASE or RENAME DATABASE.  What was
> that other session doing?

Every once in a while when I connect to my big DB it feels like it
stalls the connection for a few seconds.  I have no idea what causes
it, but it does feel like it has to do with the load.  The load is not
all that high relative to what my box can handle.  It doesn't happen
often enough for me to track it down, though.  It instinct is that it
is waiting on a lock, but clearly there is no rename/drop happening on
my main db else i'd be out of business :-)  I too run 8.3 as primary
right now... testing 9.0 for deployment soon-ish.

Re: PostgreSQL hanging on new connections?

From
"Joshua D. Drake"
Date:
On Mon, 2010-11-29 at 15:57 -0500, Vick Khera wrote:
> On Mon, Nov 29, 2010 at 1:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > hubert depesz lubaczewski <depesz@depesz.com> writes:
> >> straced postmaster when the problem was happening, and I was opening new
> >> connections. strace looks like this:
> >> [ backend hangs on semop immediately after reading global/pg_database ]
> >
> > It looks like something had exclusive lock on the database that new
> > connections wanted to connect to.  AFAICS the only action in 8.3 that
> > would do that would be a DROP DATABASE or RENAME DATABASE.  What was
> > that other session doing?
>
> Every once in a while when I connect to my big DB it feels like it
> stalls the connection for a few seconds.  I have no idea what causes
> it, but it does feel like it has to do with the load.  The load is not
> all that high relative to what my box can handle.  It doesn't happen
> often enough for me to track it down, though.  It instinct is that it
> is waiting on a lock, but clearly there is no rename/drop happening on
> my main db else i'd be out of business :-)  I too run 8.3 as primary
> right now... testing 9.0 for deployment soon-ish.

Initiating the connection all the caches etc? Perhaps trying to allocate
shared memory?

JD

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt


Re: PostgreSQL hanging on new connections?

From
Ben Chobot
Date:
On Nov 29, 2010, at 12:57 PM, Vick Khera wrote:

> On Mon, Nov 29, 2010 at 1:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> hubert depesz lubaczewski <depesz@depesz.com> writes:
>>> straced postmaster when the problem was happening, and I was opening new
>>> connections. strace looks like this:
>>> [ backend hangs on semop immediately after reading global/pg_database ]
>>
>> It looks like something had exclusive lock on the database that new
>> connections wanted to connect to.  AFAICS the only action in 8.3 that
>> would do that would be a DROP DATABASE or RENAME DATABASE.  What was
>> that other session doing?
>
> Every once in a while when I connect to my big DB it feels like it
> stalls the connection for a few seconds.  I have no idea what causes
> it, but it does feel like it has to do with the load.  The load is not
> all that high relative to what my box can handle.  It doesn't happen
> often enough for me to track it down, though.  It instinct is that it
> is waiting on a lock, but clearly there is no rename/drop happening on
> my main db else i'd be out of business :-)  I too run 8.3 as primary
> right now... testing 9.0 for deployment soon-ish.

We're having similar issues on 8.4.[245]... occasionally psql takes anywhere from a few to several dozen seconds to
connect.I've been unsuccessfully trying to blame spikes in the OS run queue (we desperately need some connection
pooling)but if it's something to do with locks I can't see in pg_locks, that would explain why I haven't been able to
figureout what's going on yet.... 

Re: PostgreSQL hanging on new connections?

From
Dusan Misic
Date:


On Mon, Nov 29, 2010 at 10:03 PM, Ben Chobot <bench@silentmedia.com> wrote:
On Nov 29, 2010, at 12:57 PM, Vick Khera wrote:

> On Mon, Nov 29, 2010 at 1:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> hubert depesz lubaczewski <depesz@depesz.com> writes:
>>> straced postmaster when the problem was happening, and I was opening new
>>> connections. strace looks like this:
>>> [ backend hangs on semop immediately after reading global/pg_database ]
>>
>> It looks like something had exclusive lock on the database that new
>> connections wanted to connect to.  AFAICS the only action in 8.3 that
>> would do that would be a DROP DATABASE or RENAME DATABASE.  What was
>> that other session doing?
>
> Every once in a while when I connect to my big DB it feels like it
> stalls the connection for a few seconds.  I have no idea what causes
> it, but it does feel like it has to do with the load.  The load is not
> all that high relative to what my box can handle.  It doesn't happen
> often enough for me to track it down, though.  It instinct is that it
> is waiting on a lock, but clearly there is no rename/drop happening on
> my main db else i'd be out of business :-)  I too run 8.3 as primary
> right now... testing 9.0 for deployment soon-ish.

We're having similar issues on 8.4.[245]... occasionally psql takes anywhere from a few to several dozen seconds to connect. I've been unsuccessfully trying to blame spikes in the OS run queue (we desperately need some connection pooling) but if it's something to do with locks I can't see in pg_locks, that would explain why I haven't been able to figure out what's going on yet....
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

This is normal. PostgreSQL needs to create new server process to handle your requested connection. 

Then it needs to allocate resources to that new connection. It initializes shared memory for that connection. That is the stall you are mentioning. 

If you are time sensitive on connection times, then use connection pooling software like pg-bouncer or pg-pool2. Pooling will keep your server instance "hot" and those lags won't happen.

Re: PostgreSQL hanging on new connections?

From
Craig Ringer
Date:
On 11/30/2010 03:28 PM, Dusan Misic wrote:
>     We're having similar issues on 8.4.[245]... occasionally psql takes
>     anywhere from a few to several dozen seconds to connect. I've been
>     unsuccessfully trying to blame spikes in the OS run queue (we
>     desperately need some connection pooling) but if it's something to
>     do with locks I can't see in pg_locks, that would explain why I
>     haven't been able to figure out what's going on yet....
>     --
>     Sent via pgsql-general mailing list (pgsql-general@postgresql.org
>     <mailto:pgsql-general@postgresql.org>)
>     To make changes to your subscription:
>     http://www.postgresql.org/mailpref/pgsql-general
>
>
> This is normal. PostgreSQL needs to create new server process to handle
> your requested connection.
>
> Then it needs to allocate resources to that new connection. It
> initializes shared memory for that connection. That is the stall you are
> mentioning.

Eh, what?

Forking a backend and attaching to shared memory should *not* take "a
few seconds". On my test machine it takes 100ms to fork psql, connect to
the postmaster, fork a backend, init the backend, authenticate, run a
dummy query and exit psql.

If you're seeing delays like that, your machine is horrifyingly
overloaded or there's something else wrong.

--
Craig Ringer

Re: PostgreSQL hanging on new connections?

From
Dusan Misic
Date:

On Tue, Nov 30, 2010 at 10:20 AM, Craig Ringer <craig@postnewspapers.com.au> wrote:
On 11/30/2010 03:28 PM, Dusan Misic wrote:
   We're having similar issues on 8.4.[245]... occasionally psql takes
   anywhere from a few to several dozen seconds to connect. I've been
   unsuccessfully trying to blame spikes in the OS run queue (we
   desperately need some connection pooling) but if it's something to
   do with locks I can't see in pg_locks, that would explain why I
   haven't been able to figure out what's going on yet....
   --
   Sent via pgsql-general mailing list (pgsql-general@postgresql.org
   <mailto:pgsql-general@postgresql.org>)
   To make changes to your subscription:
   http://www.postgresql.org/mailpref/pgsql-general


This is normal. PostgreSQL needs to create new server process to handle
your requested connection.

Then it needs to allocate resources to that new connection. It
initializes shared memory for that connection. That is the stall you are
mentioning.

Eh, what?

Forking a backend and attaching to shared memory should *not* take "a few seconds". On my test machine it takes 100ms to fork psql, connect to the postmaster, fork a backend, init the backend, authenticate, run a dummy query and exit psql.

If you're seeing delays like that, your machine is horrifyingly overloaded or there's something else wrong.

--
Craig Ringer

This is not happening to me. Even on my home computer (which is terribly slow single core Sempron 2600+ with 128 kB of L2 cache, 1.5 GB DDR RAM and with slow ATA hard disks) it is fast to connect. Maximum (slowest) connection time is about 500 ms. I did some tuning in postgresql.conf. 

I'm very happy with PostgreSQL speed on my 'snail' computer. :) 

Re: PostgreSQL hanging on new connections?

From
Florian Weimer
Date:
* hubert depesz lubaczewski:

> Now, the question is: why did it hang? Is there anything we can do to
> make it *not* hang?

It might be some general system overload issue.  Try running "echo w >
/proc/sysrq-trigger" as root the next time it happens.  This will dump
kernel backtraces to dmesg, which might hint to what's going on with
the system.

--
Florian Weimer                <fweimer@bfk.de>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99

Re: PostgreSQL hanging on new connections?

From
hubert depesz lubaczewski
Date:
On Mon, Nov 29, 2010 at 03:57:29PM -0500, Vick Khera wrote:
> On Mon, Nov 29, 2010 at 1:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > hubert depesz lubaczewski <depesz@depesz.com> writes:
> >> straced postmaster when the problem was happening, and I was opening new
> >> connections. strace looks like this:
> >> [ backend hangs on semop immediately after reading global/pg_database ]
> >
> > It looks like something had exclusive lock on the database that new
> > connections wanted to connect to.  AFAICS the only action in 8.3 that
> > would do that would be a DROP DATABASE or RENAME DATABASE.  What was
> > that other session doing?
>
> Every once in a while when I connect to my big DB it feels like it
> stalls the connection for a few seconds.  I have no idea what causes
> it, but it does feel like it has to do with the load.  The load is not
> all that high relative to what my box can handle.  It doesn't happen
> often enough for me to track it down, though.  It instinct is that it
> is waiting on a lock, but clearly there is no rename/drop happening on
> my main db else i'd be out of business :-)  I too run 8.3 as primary
> right now... testing 9.0 for deployment soon-ish.

 jkust so it will be clear - on some other occasion when we had the
 problem, i left it hng for a while. 5 minutes later i still didn't get
 the connection.

 Best regards,

 depesz


--
Linkedin: http://www.linkedin.com/in/depesz  /  blog: http://www.depesz.com/
jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007

Re: PostgreSQL hanging on new connections?

From
hubert depesz lubaczewski
Date:
On Tue, Nov 30, 2010 at 10:12:47AM +0000, Florian Weimer wrote:
> * hubert depesz lubaczewski:
>
> > Now, the question is: why did it hang? Is there anything we can do to
> > make it *not* hang?
>
> It might be some general system overload issue.  Try running "echo w >
> /proc/sysrq-trigger" as root the next time it happens.  This will dump
> kernel backtraces to dmesg, which might hint to what's going on with
> the system.

will check the idea, but the thing is that aside from postgresql -
everything else on the server is working fine - ssh/top/*stat.
and other postgreses seem to be locked too.

Best regards,

depesz

--
Linkedin: http://www.linkedin.com/in/depesz  /  blog: http://www.depesz.com/
jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007