
From: Matt Smiley
Subject: Re: Configurable FP_LOCK_SLOTS_PER_BACKEND
Msg-id: CA+eRB3qYKAn3SVB_1wwYNQx36hLFkm6-th=gCPxxczQXxE_B6A@mail.gmail.com
In response to: Re: Configurable FP_LOCK_SLOTS_PER_BACKEND (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers

Hi Andres, thanks for helping!  Great questions, replies are inline below.

On Sun, Aug 6, 2023 at 1:00 PM Andres Freund <andres@anarazel.de> wrote:
> Hm, I'm curious whether you have a way to trigger the issue outside of your
> prod environment. Mainly because I'm wondering if you're potentially hitting
> the issue fixed in a4adc31f690 - we ended up not backpatching that fix, so
> you'd not see the benefit unless you reproduced the load in 16+.

Thanks for sharing this!

I have not yet written a reproducer since we see this daily in production.  I have sketched a few ways that I think will reproduce the behavior we're observing, but haven't had time to implement them.

I'm not sure if we're seeing this behavior in production, but it's definitely an interesting find.  Currently we are running postgres 12.11, with an upcoming upgrade to 15 planned.  Good to know there's a potential improvement waiting in 16.  I noticed that in LWLockAcquire the call to LWLockDequeueSelf (https://github.com/postgres/postgres/blob/REL_12_11/src/backend/storage/lmgr/lwlock.c#L1218) occurs directly between the unsuccessful attempt to immediately acquire the lock and reporting the backend's wait event.  The distinctive indicators we have been using for this pathology are the "lock_manager" wait_event and its associated USDT probe (https://github.com/postgres/postgres/blob/REL_12_11/src/backend/storage/lmgr/lwlock.c#L1236-L1237), both of which occur after whatever overhead is incurred by LWLockDequeueSelf.  As you mentioned in your commit message, that overhead is hard to detect.  My first impression is that whatever overhead it incurs is in addition to what we are investigating.
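
For what it's worth, if it would help to quantify that extra overhead separately, a sketch along these lines could histogram the time spent inside LWLockDequeueSelf itself.  This is purely illustrative (not something we have run as-is): it assumes the LWLockDequeueSelf symbol is visible in the binary and the function is not inlined, and it counts calls for all lwlock tranches, not just lock_manager:

sudo ./bpftrace -e '
// hypothetical sketch: time each LWLockDequeueSelf call (all tranches)
uprobe:/usr/lib/postgresql/12/bin/postgres:LWLockDequeueSelf { @entry[tid] = nsecs; }
uretprobe:/usr/lib/postgresql/12/bin/postgres:LWLockDequeueSelf /@entry[tid]/ {
  @dequeue_self_ns = hist(nsecs - @entry[tid]);
  delete(@entry[tid]);
}
interval:s:10 { exit(); }'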
 
> I'm also wondering if it's possible that the reason for the throughput drops
> are possibly correlated with heavyweight contention or higher frequency access
> to the pg_locks view. Deadlock checking and the locks view acquire locks on
> all lock manager partitions... So if there's a bout of real lock contention
> (for longer than deadlock_timeout)...

Great questions, but we ruled that out.  The deadlock_timeout is 5 seconds, so frequently hitting that would massively violate SLO and would alert the on-call engineers.  The pg_locks view is scraped a couple times per minute for metrics collection, but the lock_manager lwlock contention can be observed thousands of times every second, typically with very short durations.  The following example (captured just now) shows the number of times per second over a 10-second window that any 1 of the 16 "lock_manager" lwlocks was contended:

msmiley@patroni-main-2004-103-db-gprd.c.gitlab-production.internal:~$ sudo ./bpftrace -e 'usdt:/usr/lib/postgresql/12/bin/postgres:lwlock__wait__start /str(arg0) == "lock_manager"/ { @[arg1] = count(); } interval:s:1 { print(@); clear(@); } interval:s:10 { exit(); }'
Attaching 5 probes...
@[0]: 12122
@[0]: 12888
@[0]: 13011
@[0]: 13348
@[0]: 11461
@[0]: 10637
@[0]: 10892
@[0]: 12334
@[0]: 11565
@[0]: 11596

Typically that contention only lasts a couple microseconds.  But the long tail can sometimes be much slower.  Details here: https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/2301#note_1365159507.
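
For reference, a sketch like the following (illustrative, not exactly what we ran) pairs the wait-start and wait-done USDT probes to bucket those wait durations in microseconds:

sudo ./bpftrace -e '
// hypothetical sketch: histogram lock_manager lwlock wait durations (usec)
usdt:/usr/lib/postgresql/12/bin/postgres:lwlock__wait__start /str(arg0) == "lock_manager"/ { @start[tid] = nsecs; }
usdt:/usr/lib/postgresql/12/bin/postgres:lwlock__wait__done /str(arg0) == "lock_manager" && @start[tid]/ {
  @wait_us = hist((nsecs - @start[tid]) / 1000);
  delete(@start[tid]);
}
interval:s:10 { exit(); }'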

> Given that most of your lock manager traffic comes from query planning - have
> you evaluated using prepared statements more heavily?

Yes, we have, but there are unrelated obstacles to doing so -- that's a separate can of worms, unfortunately.  Even so, in this pathology, if we used prepared statements the backend would still need to reacquire the same locks during each executing transaction.  So in terms of lock acquisition rate, whether it's the planner or the executor doing the locking, the same relations have to be locked.
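
If it helps make that concrete, a rough sketch like this (again illustrative only; as I understand it, both the fast-path and shared lock-table cases go through LockAcquireExtended) could compare the per-backend heavyweight lock acquisition rate with and without prepared statements:

sudo ./bpftrace -e '
// hypothetical sketch: count heavyweight lock acquisitions per backend pid, per second
uprobe:/usr/lib/postgresql/12/bin/postgres:LockAcquireExtended { @acquires[pid] = count(); }
interval:s:1 { print(@acquires); clear(@acquires); }
interval:s:10 { exit(); }'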
