Re: backends stuck in "startup" - Mailing list pgsql-general

From Tom Lane
Subject Re: backends stuck in "startup"
Msg-id 14525.1511397830@sss.pgh.pa.us
In response to Re: backends stuck in "startup"  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: backends stuck in "startup"
List pgsql-general
Justin Pryzby <pryzby@telsasoft.com> writes:
> For starters, I found that PID 27427 has:

> (gdb) p proc->lwWaiting
> $1 = 0 '\000'
> (gdb) p proc->lwWaitMode
> $2 = 1 '\001'

To confirm, this is LWLockAcquire's "proc", equal to MyProc?
If so, and if LWLockAcquire is blocked at PGSemaphoreLock,
that sure seems like a smoking gun.

> Note: I've compiled locally PG 10.1 with PREFERRED_SEMAPHORES=SYSV to keep the
> service up (and to the degree that serves to verify that avoids the issue,
> great).

Good idea, I was going to suggest that.  It will be very interesting
to see if that makes the problem go away.

> Would you suggest how I can maximize the likelihood/speed of triggering that?
> Five years ago, with a report of similar symptoms, you said "You need to hack
> pgbench to suppress the single initialization connection it normally likes to
> make, else the test degenerates to the one-incoming-connection case"
> https://www.postgresql.org/message-id/8896.1337998337%40sss.pgh.pa.us

I don't think that case was related at all.

My theory suggests that any contended use of an LWLock is at risk,
in which case just running pgbench with about as many sessions as
you have in the live server ought to be able to trigger it.  However,
that doesn't really account for your having observed the problem
only during session startup, so there may be some other factor
involved.  I wonder if it only happens during the first wait for
an LWLock ... and if so, how could that be?
        regards, tom lane
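
Editorial sketch (not part of the original message): the suggestion above, running pgbench with about as many sessions as the live server has, could be scripted roughly as follows. The helper name build_pgbench_cmd, the session count, the duration, and the database name are all hypothetical. Adding -S (select-only) and -n (skip vacuum) is an assumption made to keep the load read-only; it presumes the pgbench tables were already created with "pgbench -i". The optional -C flag, which makes pgbench open a new connection for every transaction, is likewise an assumption: it is one way to exercise session startup repeatedly, and the message does not say it is needed. The helper only builds the command line and runs nothing.

```python
import shlex


def build_pgbench_cmd(sessions, duration_s, dbname, threads=1, reconnect=False):
    """Build (but do not run) a pgbench argv list for a many-session test.

    All parameter values are placeholders chosen by the caller; none of
    them come from the original thread.
    """
    if sessions < 1 or threads < 1 or duration_s < 1:
        raise ValueError("sessions, threads and duration_s must be positive")

    cmd = [
        "pgbench",
        "-n",                    # skip the vacuum step before the run
        "-S",                    # built-in select-only script (assumption: read-only load)
        "-c", str(sessions),     # number of concurrent client sessions
        "-j", str(threads),      # pgbench worker threads
        "-T", str(duration_s),   # run for this many seconds
    ]
    if reconnect:
        # Assumption: reconnect per transaction to stress session startup.
        cmd.append("-C")
    cmd.append(dbname)
    return cmd


if __name__ == "__main__":
    # Hypothetical values: 200 sessions for 10 minutes against "postgres".
    print(shlex.join(build_pgbench_cmd(200, 600, "postgres", threads=4, reconnect=True)))
```

Comparing a run with reconnect=False against one with reconnect=True might help separate "any contended LWLock" from "only during session startup", though that is a guess rather than something the thread establishes.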

