Re: Lockless queue of waiters in LWLock - Mailing list pgsql-hackers

From	Andres Freund
Subject	Re: Lockless queue of waiters in LWLock
Date	November 4, 2022 19:07:41
Msg-id	20221104190741.gsuiybes3hula74m@awork3.anarazel.de
In response to	Re: Lockless queue of waiters in LWLock (Pavel Borisov <pashkin.elfe@gmail.com>)
Responses	Re: Lockless queue of waiters in LWLock
List	pgsql-hackers

Hi,

On 2022-11-03 14:50:11 +0400, Pavel Borisov wrote:
> Or maybe there is another explanation for now small performance
> difference around 20 connections described in [0]?
> Thoughts?

Using xadd is quite a bit cheaper than cmpxchg, and now every lock release
uses a compare-exchange, I think.

In the past I had a more complicated version of LWLockAcquire which tried to
use an xadd to acquire locks. IIRC (and this is long enough ago that I might
not) that proved to be a benefit, but I was worried about the complexity. And
just getting in the version that didn't always use a spinlock was the higher
priority.

The use of cmpxchg vs lock inc/lock add/xadd is one of the major reasons why
lwlocks are slower than a spinlock (but obviously are better under contention
nonetheless).
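
To make the distinction concrete, here is a minimal sketch using C11 atomics. It is not PostgreSQL's LWLock code: the LOCK_VAL_SHARED constant and both function names are hypothetical, and the state word is reduced to a plain count of shared holders. On x86-64, compilers typically emit lock xadd (or lock sub when the result is unused) for atomic_fetch_sub, while the compare-exchange loop becomes lock cmpxchg and has to retry whenever another thread changed the word in between.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical amount added to the state word per shared holder. */
#define LOCK_VAL_SHARED 1u

/* Release with a single unconditional read-modify-write (xadd-style). */
static uint32_t
release_shared_xadd(_Atomic uint32_t *state)
{
    /* atomic_fetch_sub returns the old value, so derive the new one. */
    return atomic_fetch_sub(state, LOCK_VAL_SHARED) - LOCK_VAL_SHARED;
}

/* Release with a compare-exchange loop (cmpxchg-style). */
static uint32_t
release_shared_cmpxchg(_Atomic uint32_t *state)
{
    uint32_t old = atomic_load(state);

    /* On failure, 'old' is refreshed with the current value and we retry. */
    while (!atomic_compare_exchange_weak(state, &old, old - LOCK_VAL_SHARED))
        ;
    return old - LOCK_VAL_SHARED;
}
```

Both functions leave the state word with the same value; the difference is that the second one can loop under contention, whereas the first always completes in one atomic instruction.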

I have a benchmark program that starts a thread for each physical core and
just increments an atomic counter.

On my dual Xeon Gold 5215 workstation:

cmpxchg:
32: throughput per thread: 0.55M/s, total: 11.02M/s
64: throughput per thread: 0.63M/s, total: 12.68M/s

lock add:
32: throughput per thread: 2.10M/s, total: 41.98M/s
64: throughput per thread: 2.12M/s, total: 42.40M/s

xadd:
32: throughput per thread: 2.10M/s, total: 41.91M/s
64: throughput per thread: 2.04M/s, total: 40.71M/s

And even when there's no contention, with every thread just updating its own
cacheline:

cmpxchg:
32: throughput per thread: 88.83M/s, total: 1776.51M/s
64: throughput per thread: 96.46M/s, total: 1929.11M/s

lock add:
32: throughput per thread: 166.07M/s, total: 3321.31M/s
64: throughput per thread: 165.86M/s, total: 3317.22M/s

add (no lock):
32: throughput per thread: 530.78M/s, total: 10615.62M/s
64: throughput per thread: 531.22M/s, total: 10624.35M/s

xadd:
32: throughput per thread: 165.88M/s, total: 3317.51M/s
64: throughput per thread: 165.93M/s, total: 3318.53M/s
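
A benchmark of the kind described above could be sketched roughly as follows. This is not the program that produced these numbers; it is an illustration built on C11 atomics and pthreads, and every name in it (run_bench, bench_op, padded_counter, MAX_THREADS) is made up for the example. It assumes 64-byte cache lines, covers only the cmpxchg and fetch-add variants (the non-atomic add is omitted), and leaves timing to the caller.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAX_THREADS 64

typedef enum { OP_CMPXCHG, OP_FETCH_ADD } bench_op;

/* One counter per cache line (64 bytes assumed) to avoid false sharing. */
typedef struct
{
    _Alignas(64) _Atomic uint64_t value;
} padded_counter;

typedef struct
{
    _Atomic uint64_t *target;
    bench_op    op;
    uint64_t    iterations;
} worker_arg;

static void *
worker(void *p)
{
    worker_arg *arg = p;

    for (uint64_t i = 0; i < arg->iterations; i++)
    {
        if (arg->op == OP_FETCH_ADD)
            atomic_fetch_add(arg->target, 1);
        else
        {
            /* Retry until no other thread modified the counter in between. */
            uint64_t old = atomic_load(arg->target);

            while (!atomic_compare_exchange_weak(arg->target, &old, old + 1))
                ;
        }
    }
    return NULL;
}

/*
 * Run nthreads workers. If 'shared' is nonzero they all increment
 * counters[0] (contended case); otherwise each thread gets its own cache
 * line. Returns the sum of all counters, or 0 on error.
 */
static uint64_t
run_bench(int nthreads, bench_op op, int shared, uint64_t iterations)
{
    static padded_counter counters[MAX_THREADS];
    pthread_t   threads[MAX_THREADS];
    worker_arg  args[MAX_THREADS];
    uint64_t    total = 0;
    int         started = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return 0;

    for (int i = 0; i < nthreads; i++)
        atomic_store(&counters[i].value, 0);

    for (int i = 0; i < nthreads; i++)
    {
        args[i].target = &counters[shared ? 0 : i].value;
        args[i].op = op;
        args[i].iterations = iterations;
        if (pthread_create(&threads[i], NULL, worker, &args[i]) != 0)
            break;
        started++;
    }

    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    if (started != nthreads)
        return 0;

    for (int i = 0; i < nthreads; i++)
        total += atomic_load(&counters[i].value);
    return total;
}
```

Wrapping a call such as run_bench(20, OP_FETCH_ADD, 1, iterations) in two clock_gettime(CLOCK_MONOTONIC, ...) readings and dividing the returned total by the elapsed time gives a throughput figure. The branch on the operation inside the loop adds a small overhead of its own, so a more careful version would use a separate loop per operation.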

Greetings,

Andres Freund
