Hi,
On 2025-07-01 09:57:18 -0400, Andres Freund wrote:
> On 2025-06-26 13:07:49 +0800, Zhou, Zhiguo wrote:
> > This patch addresses severe LWLock contention observed on high-core systems
> > where hundreds of processors concurrently access frequently-shared locks.
> > Specifically for ProcArrayLock (exhibiting 93.5% shared-mode acquires), we
> > implement a new ReadBiasedLWLock mechanism to eliminate the atomic operation
> > bottleneck.
> >
> > Key aspects:
> > 1. Problem: Previous optimizations[1] left LWLockAttemptLock/Release
> > consuming ~25% of total CPU cycles on 384-vCPU systems due to contention
> > on a single lock-state cache line. Shared lock attempts showed 37x higher
> > cumulative latency than exclusive mode for ProcArrayLock.
> >
> > 2. Solution: ReadBiasedLWLock partitions lock state across 16 cache lines
> > (READ_BIASED_LOCK_STATE_COUNT):
> > - Readers acquire/release only their designated LWLock (indexed by
> > pid % 16) using a single atomic operation
> > - Writers pay higher cost by acquiring all 16 sub-locks exclusively
> > - Maintains LWLock's "acquiring process must release" semantics
> >
> > 3. Performance: HammerDB/TPCC shows 35.3% NOPM improvement over baseline
> > - Lock acquisition CPU cycles reduced from 16.7% to 7.4%
> > - Lock release cycles reduced from 7.9% to 2.2%
> >
> > 4. Implementation:
> > - Core infrastructure for ReadBiasedLWLock
> > - ProcArrayLock converted as proof-of-concept
> > - Maintains full LWLock API compatibility
> >
> > Known considerations:
> > - Increased writer acquisition cost (acceptable given rarity of exclusive
> > acquisitions for biased locks like ProcArrayLock)
>
> Unfortunately I have a very hard time believing that that's unacceptable -
> there are plenty of workloads (many write-intensive ones) where exclusive
> locks on ProcArrayLock are the bottleneck.
Ooops, s/unacceptable/acceptable/
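
For anyone skimming the thread, here is a rough standalone sketch of the
read-biased scheme quoted above - readers touch only the slot selected by
pid % 16, writers take every slot. This is illustrative only; the names and
the use of plain C11 atomics are mine, not the patch's:

/*
 * Illustrative sketch, not the actual ReadBiasedLWLock patch: lock state is
 * split across READ_BIASED_LOCK_STATE_COUNT cache-line-padded slots, readers
 * touch the single slot chosen by pid % 16, writers must take all of them.
 * All names and the C11-atomics details are hypothetical.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <unistd.h>

#define READ_BIASED_LOCK_STATE_COUNT 16

#define RB_EXCLUSIVE   1u       /* low bit: a writer holds the lock */
#define RB_SHARED_ONE  2u       /* higher bits: count of shared holders */

typedef struct
{
	_Alignas(64) atomic_uint state;     /* one slot per cache line */
} ReadBiasedSlot;

typedef struct
{
	ReadBiasedSlot slots[READ_BIASED_LOCK_STATE_COUNT];
} ReadBiasedLock;

/* Reader fast path: a single atomic RMW on this process's own slot. */
static bool
rb_acquire_shared(ReadBiasedLock *lock)
{
	ReadBiasedSlot *slot = &lock->slots[getpid() % READ_BIASED_LOCK_STATE_COUNT];
	unsigned	old = atomic_fetch_add(&slot->state, RB_SHARED_ONE);

	if (old & RB_EXCLUSIVE)
	{
		/* writer present: back out (a real lock would queue and sleep) */
		atomic_fetch_sub(&slot->state, RB_SHARED_ONE);
		return false;
	}
	return true;
}

static void
rb_release_shared(ReadBiasedLock *lock)
{
	ReadBiasedSlot *slot = &lock->slots[getpid() % READ_BIASED_LOCK_STATE_COUNT];

	atomic_fetch_sub(&slot->state, RB_SHARED_ONE);
}

/* Writer slow path: claim every slot, waiting out the readers on each. */
static void
rb_acquire_exclusive(ReadBiasedLock *lock)
{
	for (int i = 0; i < READ_BIASED_LOCK_STATE_COUNT; i++)
	{
		unsigned	expected = 0;

		while (!atomic_compare_exchange_weak(&lock->slots[i].state,
											 &expected, RB_EXCLUSIVE))
			expected = 0;       /* CAS failed: slot busy, retry */
	}
}

static void
rb_release_exclusive(ReadBiasedLock *lock)
{
	for (int i = 0; i < READ_BIASED_LOCK_STATE_COUNT; i++)
		atomic_fetch_and(&lock->slots[i].state, ~RB_EXCLUSIVE);
}

The tradeoff is visible directly in the sketch: a shared acquire is one
fetch-add on a cache line no other reader group shares, while an exclusive
acquire has to CAS all 16 slots and wait for every in-flight reader - which
is exactly why exclusive-heavy workloads are the concern above.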
Greetings,
Andres Freund