Re: Configurable FP_LOCK_SLOTS_PER_BACKEND - Mailing list pgsql-hackers
From: Andres Freund
Subject: Re: Configurable FP_LOCK_SLOTS_PER_BACKEND
Msg-id: 20230807211625.llkgxonwarffewfq@awork3.anarazel.de
In response to: Re: Configurable FP_LOCK_SLOTS_PER_BACKEND (Matt Smiley <msmiley@gitlab.com>)
List: pgsql-hackers
Hi,

On 2023-08-07 13:59:26 -0700, Matt Smiley wrote:
> I have not yet written a reproducer since we see this daily in production.
> I have a sketch of a few ways that I think will reproduce the behavior
> we're observing, but haven't had time to implement it.
>
> I'm not sure if we're seeing this behavior in production

It might be worth it for you to backpatch

commit 92daeca45df
Author: Andres Freund <andres@anarazel.de>
Date:   2022-11-21 20:34:17 -0800

    Add wait event for pg_usleep() in perform_spin_delay()

into 12. That should be low risk and have only trivially resolvable
conflicts.

Alternatively, you could use bpftrace et al to set a userspace probe on
perform_spin_delay().

> , but it's definitely an interesting find. Currently we are running
> postgres 12.11, with an upcoming upgrade to 15 planned. Good to know
> there's a potential improvement waiting in 16. I noticed that in
> LWLockAcquire the call to LWLockDequeueSelf occurs
> (https://github.com/postgres/postgres/blob/REL_12_11/src/backend/storage/lmgr/lwlock.c#L1218)
> directly between the unsuccessful attempt to immediately acquire the lock
> and reporting the backend's wait event.

That's normal.

> > I'm also wondering if it's possible that the reason for the throughput
> > drops are possibly correlated with heavyweight contention or higher
> > frequency access to the pg_locks view. Deadlock checking and the locks
> > view acquire locks on all lock manager partitions... So if there's a
> > bout of real lock contention (for longer than deadlock_timeout)...
>
> Great questions, but we ruled that out. The deadlock_timeout is 5 seconds,
> so frequently hitting that would massively violate SLO and would alert the
> on-call engineers. The pg_locks view is scraped a couple times per minute
> for metrics collection, but the lock_manager lwlock contention can be
> observed thousands of times every second, typically with very short
> durations.
> The following example (captured just now) shows the number of times per
> second over a 10-second window that any 1 of the 16 "lock_manager"
> lwlocks was contended:

Some short-lived contention is fine and expected - the question is how long
the waits are... Unfortunately, in my experience the overhead of bpftrace
means that analyzing things like this with it is very hard... :(.

> > Given that most of your lock manager traffic comes from query planning -
> > have you evaluated using prepared statements more heavily?
>
> Yes, there are unrelated obstacles to doing so -- that's a separate can of
> worms, unfortunately. But in this pathology, even if we used prepared
> statements, the backend would still need to reacquire the same locks during
> each executing transaction. So in terms of lock acquisition rate, whether
> it's via the planner or executor doing it, the same relations have to be
> locked.

Planning will often lock more database objects than query execution, so
avoiding replanning can keep you using fastpath locks for longer.

Greetings,

Andres Freund
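[Editor's note: the userspace-probe approach suggested above could be sketched with bpftrace roughly as follows. This is an illustrative sketch, not from the original email; the postgres binary path and the one-second reporting interval are assumptions to adjust for your installation. It counts entries into perform_spin_delay(), i.e. how often spinlock acquisition had to back off, and prints the count each second.]

```
# Hypothetical sketch: count perform_spin_delay() calls per second.
# The binary path is an assumption; point it at your postgres binary,
# e.g. the one shown by: ps -o cmd= -p <backend pid>
sudo bpftrace -e '
uprobe:/usr/lib/postgresql/12/bin/postgres:perform_spin_delay
{
    /* one counter per invocation of the probed function */
    @delays = count();
}
interval:s:1
{
    /* emit and reset the counter once per second */
    print(@delays);
    clear(@delays);
}'
```

Attaching a uprobe only at function entry keeps overhead lower than tracing both entry and return, though as noted above, bpftrace overhead can still distort measurements under heavy contention.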