
From Tom Lane
Subject Re: Database logins taking longer and longer, showing up as "authentication" in ps(1)
Date
Msg-id 1416265.1597851073@sss.pgh.pa.us
In response to Re: Database logins taking longer and longer, showing up as "authentication" in ps(1)  (Adam Sjøgren <asjo@koldfront.dk>)
Responses Re: Database logins taking longer and longer, showing up as "authentication" in ps(1)  (Adam Sjøgren <asjo@koldfront.dk>)
List pgsql-general
Adam Sjøgren <asjo@koldfront.dk> writes:
> Tom writes:
>> * Adam Sjøgren (asjo@koldfront.dk) wrote:
>>> Sometimes new database logins slow down, from usually taking <0.05s to
>>> taking minutes. This is for psql as a normal user using Kerberos, for
>>> psql as the postgres superuser, for the web-application logging into the
>>> database, for everything.

>> Also, obtaining stack traces from a few of the stuck processes would
>> likely be quite informative.

> I will try to look at pg_stat_activity and pg_locks the next time - it
> just happened now and I focused on getting some stacktraces -
> unfortunately I do not have the -dbg package installed, so I don't know
> how useful they are.

Frequently, stack traces taken without debug symbols are pretty useless,
but you seem to have got lucky.  It seems clear that the unexpected wait
is here:

>   (gdb) bt
>   #0  0x00007f8272cf2827 in futex_abstimed_wait_cancelable (private=128, abstime=0x0, expected=0,
>       futex_word=0x7f7a2e9f1d38) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
>   #1  do_futex_wait (sem=sem@entry=0x7f7a2e9f1d38, abstime=0x0) at sem_waitcommon.c:111
>   #2  0x00007f8272cf28d4 in __new_sem_wait_slow (sem=0x7f7a2e9f1d38, abstime=0x0) at sem_waitcommon.c:181
>   #3  0x00007f8272cf297a in __new_sem_wait (sem=<optimized out>) at sem_wait.c:29
>   #4  0x000055f60d36c252 in PGSemaphoreLock ()
>   #5  0x000055f60d3e35cc in LWLockAcquire ()
>   #6  0x000055f60d3d14ee in ProcArrayAdd ()
>   #7  0x000055f60d3e01a3 in InitProcessPhase2 ()
>   #8  0x000055f60d51c99b in InitPostgres ()
>   #9  0x000055f60d3f2baf in PostgresMain ()
>   #10 0x000055f60d102bf8 in ?? ()
>   #11 0x000055f60d37e492 in PostmasterMain ()
>   #12 0x000055f60d103fa5 in main ()

which implies that the problem is unexpectedly high contention for the
ProcArrayLock.  It's unsurprising that incoming processes would need
that lock, since they need to add themselves to the ProcArray.  What's
not so apparent is why there is so much activity as to block them
from doing so for a significant amount of time.  We can guess that
the "worker processes" you mentioned are ganging up on that lock
because they need to compute query snapshots, but that still doesn't
seem like enough of an explanation.
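
One easy cross-check, next time it happens: look for ProcArrayLock
waiters in pg_stat_activity.  A rough sketch like this should do
(needs 9.6 or later for the wait_event columns; the exact wait-event
spelling can differ across versions, hence the IN list):

    -- count backends stuck on ProcArrayLock, grouped by what they run
    SELECT wait_event, state, count(*) AS waiters,
           left(query, 60) AS query_start
    FROM pg_stat_activity
    WHERE wait_event_type = 'LWLock'
      AND wait_event IN ('ProcArrayLock', 'ProcArray')
    GROUP BY wait_event, state, left(query, 60)
    ORDER BY waiters DESC;

If that regularly shows dozens of waiters during one of these episodes,
that would confirm the lock itself is the bottleneck rather than
something further down.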

Can you show us exactly what the "worker processes" are doing while
this is going on?
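
Even a handful of pg_stat_activity snapshots taken while logins are
stuck would help; untested, adjust to taste (backend_type needs v10
or later):

    -- what every non-idle backend is doing at this instant
    SELECT pid, backend_type, state,
           wait_event_type, wait_event,
           now() - query_start AS running_for,
           left(query, 80) AS query
    FROM pg_stat_activity
    WHERE state <> 'idle'
    ORDER BY running_for DESC;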

            regards, tom lane


