Re: heavily contended lwlocks with long wait queues scale badly - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: heavily contended lwlocks with long wait queues scale badly
Date
Msg-id CALj2ACWGHuuk0OmpEW8Kd93W1kYejia0PgfYj7wP700VhUqV8Q@mail.gmail.com
In response to heavily contended lwlocks with long wait queues scale badly  (Andres Freund <andres@anarazel.de>)
Responses Re: heavily contended lwlocks with long wait queues scale badly  (Pavel Borisov <pashkin.elfe@gmail.com>)
Re: heavily contended lwlocks with long wait queues scale badly  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Thu, Oct 27, 2022 at 10:29 PM Andres Freund <andres@anarazel.de> wrote:
>
> But I think we can solve that fairly reasonably nonetheless. We can change
> PGPROC->lwWaiting to not just be a boolean, but have three states:
> 0: not waiting
> 1: waiting in waitlist
> 2: waiting to be woken up
>
> which we then can use in LWLockDequeueSelf() to only remove ourselves from the
> list if we're on it. As removal from that list is protected by the wait list
> lock, there's no race to worry about.
>
> client  patched   HEAD
> 1        60109    60174
> 2       112694   116169
> 4       214287   208119
> 8       377459   373685
> 16      524132   515247
> 32      565772   554726
> 64      587716   497508
> 128     581297   415097
> 256     550296   334923
> 512     486207   243679
> 768     449673   192959
> 1024    410836   157734
> 2048    326224    82904
> 4096    250252    32007
>
> Not perfect with the patch, but not awful either.
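
For readers following along, here is a minimal sketch of the three-state idea quoted above, as I understand it. This is not code from the patch: the names WaitState, Waiter, WaitQueue, queue_self, wake_first and dequeue_self are hypothetical, the wait list is a plain doubly linked list, and the wait list lock that real code would hold around every function is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for the three states; the proposal uses 0, 1 and 2. */
typedef enum
{
    WS_NOT_WAITING = 0,     /* not waiting */
    WS_WAITING = 1,         /* waiting in the wait list */
    WS_PENDING_WAKEUP = 2   /* off the list, waiting to be woken up */
} WaitState;

typedef struct Waiter
{
    WaitState state;
    struct Waiter *prev;
    struct Waiter *next;
} Waiter;

/* Assumption: callers hold the wait list lock around every function below. */
typedef struct WaitQueue
{
    Waiter *head;
    Waiter *tail;
} WaitQueue;

/* Unlink w from q; w must currently be linked into q. */
void unlink_waiter(WaitQueue *q, Waiter *w)
{
    if (w->prev) w->prev->next = w->next; else q->head = w->next;
    if (w->next) w->next->prev = w->prev; else q->tail = w->prev;
    w->prev = w->next = NULL;
}

/* Append w to the tail of the wait list and mark it as waiting. */
void queue_self(WaitQueue *q, Waiter *w)
{
    assert(w->state == WS_NOT_WAITING);
    w->prev = q->tail;
    w->next = NULL;
    if (q->tail) q->tail->next = w; else q->head = w;
    q->tail = w;
    w->state = WS_WAITING;
}

/* Waker side: pop the first waiter and mark it as pending wakeup. */
Waiter *wake_first(WaitQueue *q)
{
    Waiter *w = q->head;

    if (w == NULL)
        return NULL;
    unlink_waiter(q, w);
    w->state = WS_PENDING_WAKEUP;
    return w;
}

/*
 * Waiter side: remove ourselves only if we are still on the list.
 * Returns false if a waker already took us off, in which case a
 * wakeup is on its way and the caller has to absorb it.
 */
bool dequeue_self(WaitQueue *q, Waiter *w)
{
    if (w->state != WS_WAITING)
        return false;
    unlink_waiter(q, w);
    w->state = WS_NOT_WAITING;
    return true;
}
```

The point of the sketch is that dequeue_self() decides from the state alone whether the waiter is still on the list, and since both removal paths run under the same lock, the state and the list membership cannot disagree.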

Here are the results from my testing [1]. The patch looks impressive at
higher client counts: for instance, with 1024 clients, TPS is 103587 on
HEAD versus 248702 with the patch (first run of each).

TPS by number of clients, two runs each on HEAD and with the patch:

clients  HEAD run 1  HEAD run 2  PATCHED run 1  PATCHED run 2
1             34534       34110          34412          34087
2             72088       72403          71733          73567
4            135249      134421         139141         135624
8            213045      211263         211526         211901
16           243507      241606         241692         242819
32           304108      295198         308198         310534
64           375148      353580         406198         352663
128          390658      385147         385643         381780
256          345503      341672         338464         342483
512          284510      295001         295559         301968
768          146417      142341         272639         272596
1024         103587       97721         248702         251014
2048          34702       30229         191402         184939
4096          12450       13179         112074         108186
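
To put a number on the improvement, here is a small sketch that computes the patched/HEAD TPS ratio from the run 1 figures above. The function name speedup_run1 and the array names are my own for illustration; the data is copied from the results as posted.

```c
#include <assert.h>
#include <stddef.h>

#define NCLIENTS 14

/* Client counts and run 1 TPS figures, copied from the results above. */
const int clients[NCLIENTS] = {
    1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 768, 1024, 2048, 4096
};
const double head_run1[NCLIENTS] = {
    34534, 72088, 135249, 213045, 243507, 304108, 375148,
    390658, 345503, 284510, 146417, 103587, 34702, 12450
};
const double patched_run1[NCLIENTS] = {
    34412, 71733, 139141, 211526, 241692, 308198, 406198,
    385643, 338464, 295559, 272639, 248702, 191402, 112074
};

/*
 * Ratio of patched TPS to HEAD TPS for the given client count,
 * or -1.0 if that client count was not part of the benchmark.
 */
double speedup_run1(int nclients)
{
    for (size_t i = 0; i < NCLIENTS; i++)
    {
        if (clients[i] == nclients)
            return patched_run1[i] / head_run1[i];
    }
    return -1.0;
}
```

For example, speedup_run1(1024) is 248702 / 103587, roughly 2.4x, and speedup_run1(4096) is 112074 / 12450, roughly 9x. These are single 5-second runs, so the ratios are indicative only.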

> I've attached my quick-and-dirty patch. Obviously it'd need a few defines etc,
> but I wanted to get this out to discuss before spending further time.

Just for the record, here are some review comments posted in the other
thread:
https://www.postgresql.org/message-id/CALj2ACXktNbG%3DK8Xi7PSqbofTZozavhaxjatVc14iYaLu4Maag%40mail.gmail.com

BTW, I've seen a sporadic crash (SEGV) in the bgwriter with the patch,
using the same setup [1]. I'm not sure whether the patch is really the
cause. I'm unable to reproduce it now, and unfortunately I didn't
capture further details when it occurred.

[1] ./configure --prefix=$PWD/inst/ --enable-tap-tests CFLAGS="-O3" > install.log &&
make -j 8 install > install.log 2>&1 &
shared_buffers = 8GB
max_wal_size = 32GB
max_connections = 4096
checkpoint_timeout = 10min

ubuntu: cat << EOF >> txid.sql
SELECT txid_current();
EOF
ubuntu: for c in 1 2 4 8 16 32 64 128 256 512 768 1024 2048 4096; do
echo -n "$c "; ./pgbench -n -M prepared -U ubuntu postgres -f txid.sql \
-c$c -j$c -T5 2>&1 | grep '^tps' | awk '{print $3}'; done

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


