Thread: heavily contended lwlocks with long wait queues scale badly

heavily contended lwlocks with long wait queues scale badly

From: Andres Freund
Hi,

I am working on posting a patch series making relation extension more
scalable. As part of that I was running some benchmarks for workloads that I
thought should not be affected, or only positively impacted - but I was wrong,
there was some very significant degradation at very high client counts. After
pulling my hair out for quite a while to try to understand that behaviour, I
figured out that it's just a side-effect of *removing* some other contention.
This morning (turns out sleeping helps) I managed to reproduce it in an
unmodified postgres.

$ cat ~/tmp/txid.sql
SELECT txid_current();
$ for c in 1 2 4 8 16 32 64 128 256 512 768 1024 2048 4096; do echo -n "$c ";pgbench -n -M prepared -f ~/tmp/txid.sql -c$c -j$c -T5 2>&1|grep '^tps'|awk '{print $3}';done
 
1        60174
2       116169
4       208119
8       373685
16      515247
32      554726
64      497508
128     415097
256     334923
512     243679
768     192959
1024    157734
2048     82904
4096     32007

(I didn't properly round TPS, but that doesn't matter here)


Performance completely falls off a cliff starting at ~256 clients. There's
actually plenty of CPU available here, so this isn't a case of running out of
CPU time.

Rather, the problem is very bad contention on the "spinlock" for the lwlock
wait list. I realized that something in that direction was off when trying to
investigate why I was seeing spin delays of substantial duration (>100ms).

The problem isn't a fundamental issue with lwlocks; it's that
LWLockDequeueSelf() does this:

        LWLockWaitListLock(lock);

        /*
         * Can't just remove ourselves from the list, but we need to iterate over
         * all entries as somebody else could have dequeued us.
         */
        proclist_foreach_modify(iter, &lock->waiters, lwWaitLink)
        {
                if (iter.cur == MyProc->pgprocno)
                {
                        found = true;
                        proclist_delete(&lock->waiters, iter.cur, lwWaitLink);
                        break;
                }
        }

I.e. it iterates over the whole waitlist to "find itself". The longer the
waitlist gets, the longer this takes. And the longer it takes for
LWLockWakeup() to actually wake up all waiters, the more likely it becomes
that LWLockDequeueSelf() needs to be called.


We can't make the trivial optimization and use proclist_contains(), because
PGPROC->lwWaitLink is also used for the list of processes to wake up in
LWLockWakeup().

But I think we can solve that fairly reasonably nonetheless. We can change
PGPROC->lwWaiting to not just be a boolean, but have three states:
0: not waiting
1: waiting in waitlist
2: waiting to be woken up

which we then can use in LWLockDequeueSelf() to only remove ourselves from the
list if we're on it. As removal from that list is protected by the wait list
lock, there's no race to worry about.
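
For concreteness, a minimal sketch of the idea (the enum values and the
exact check may differ in the final patch):

typedef enum LWLockWaitState
{
    LW_WS_NOT_WAITING = 0,      /* not waiting on any lwlock */
    LW_WS_WAITING,              /* on a lock's wait list */
    LW_WS_PENDING_WAKEUP        /* dequeued by LWLockWakeup(), wakeup not
                                 * yet processed */
} LWLockWaitState;

LWLockDequeueSelf()'s list scan then collapses to an O(1) check:

        LWLockWaitListLock(lock);

        /*
         * Transitions out of LW_WS_WAITING happen only while the wait
         * list lock is held, so this check is race-free: if we are still
         * LW_WS_WAITING, we are on lock->waiters and can unlink ourselves.
         */
        found = (MyProc->lwWaiting == LW_WS_WAITING);
        if (found)
                proclist_delete(&lock->waiters, MyProc->pgprocno, lwWaitLink);

        LWLockWaitListUnlock(lock);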

client  patched   HEAD
1        60109    60174
2       112694   116169
4       214287   208119
8       377459   373685
16      524132   515247
32      565772   554726
64      587716   497508
128     581297   415097
256     550296   334923
512     486207   243679
768     449673   192959
1024    410836   157734
2048    326224    82904
4096    250252    32007

Not perfect with the patch, but not awful either.


I suspect this issue might actually explain quite a few odd performance
behaviours we've seen at the larger end in the past. I think it has gotten a
bit worse with the conversion of lwlock.c to proclists (I see lots of
expensive multiplications to deal with sizeof(PGPROC)), but otherwise likely
exists at least as far back as ab5194e6f61, in 9.5.

I guess there's an argument for considering this a bug that we should
backpatch a fix for? But given the vintage, probably not?  The only thing that
gives me pause is that this is quite hard to pinpoint when it's happening.


I've attached my quick-and-dirty patch. Obviously it'd need a few defines etc,
but I wanted to get this out to discuss before spending further time.

Greetings,

Andres Freund


Re: heavily contended lwlocks with long wait queues scale badly

From: Kyotaro Horiguchi
At Thu, 27 Oct 2022 09:59:14 -0700, Andres Freund <andres@anarazel.de> wrote in 
> But I think we can solve that fairly reasonably nonetheless. We can change
> PGPROC->lwWaiting to not just be a boolean, but have three states:
> 0: not waiting
> 1: waiting in waitlist
> 2: waiting to be woken up
> 
> which we then can use in LWLockDequeueSelf() to only remove ourselves from the
> list if we're on it. As removal from that list is protected by the wait list
> lock, there's no race to worry about.

Since LWLockDequeueSelf() is only called in restricted situations,
there's no chance that the proc to remove is in another lock's waiters
list at the time the function is called. So it seems to work well. It is
simple and requires no additional memory or cycles...

No. It enlarges PGPROC by 8 bytes, but changing lwWaiting to int8/uint8
keeps the size as it is. (Rocky8/x86-64)

It just shaves off looping cycles. So +1 for what the patch does.


> client  patched   HEAD
> 1        60109    60174
> 2       112694   116169
> 4       214287   208119
> 8       377459   373685
> 16      524132   515247
> 32      565772   554726
> 64      587716   497508
> 128     581297   415097
> 256     550296   334923
> 512     486207   243679
> 768     449673   192959
> 1024    410836   157734
> 2048    326224    82904
> 4096    250252    32007
> 
> Not perfect with the patch, but not awful either.

Fairly good? Agreed.  The performance peak is improved by 6% and
shifted to a larger number of clients (32->64).

> I suspect this issue might actually explain quite a few odd performance
> behaviours we've seen at the larger end in the past. I think it has gotten a
> bit worse with the conversion of lwlock.c to proclists (I see lots of
> expensive multiplications to deal with sizeof(PGPROC)), but otherwise likely
> exists at least as far back as ab5194e6f61, in 9.5.
>
> I guess there's an argument for considering this a bug that we should
> backpatch a fix for? But given the vintage, probably not?  The only thing that
> gives me pause is that this is quite hard to pinpoint as happening.

I don't think this is a bug, but it might be back-patchable since it
doesn't change the memory footprint (if adjusted) and causes no
additional cost or interface breakage. Also, since this "bug" has a
positive-feedback nature, reducing the coefficient is more beneficial
than the direct effect of the change alone would suggest.

> I've attached my quick-and-dirty patch. Obviously it'd need a few defines etc,
> but I wanted to get this out to discuss before spending further time.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: heavily contended lwlocks with long wait queues scale badly

From: Dilip Kumar
On Mon, Oct 31, 2022 at 11:03 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Thu, 27 Oct 2022 09:59:14 -0700, Andres Freund <andres@anarazel.de> wrote in
> > But I think we can solve that fairly reasonably nonetheless. We can change
> > PGPROC->lwWaiting to not just be a boolean, but have three states:
> > 0: not waiting
> > 1: waiting in waitlist
> > 2: waiting to be woken up
> >
> > which we then can use in LWLockDequeueSelf() to only remove ourselves from the
> > list if we're on it. As removal from that list is protected by the wait list
> > lock, there's no race to worry about.

This looks like a good idea.


> No. It enlarges PGPROC by 8 bytes, but changing lwWaiting to int8/uint8
> keeps the size as it is. (Rocky8/x86-64)

I agree

> It just shaves off looping cycles. So +1 for what the patch does.
>
>
> > client  patched   HEAD
> > 1        60109    60174
> > 2       112694   116169
> > 4       214287   208119
> > 8       377459   373685
> > 16      524132   515247
> > 32      565772   554726
> > 64      587716   497508
> > 128     581297   415097
> > 256     550296   334923
> > 512     486207   243679
> > 768     449673   192959
> > 1024    410836   157734
> > 2048    326224    82904
> > 4096    250252    32007
> >
> > Not perfect with the patch, but not awful either.
>
> Fairly good? Agreed.  The performance peak is improved by 6% and
> shifted to a larger number of clients (32->64).
>

The performance result is promising, so +1.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: heavily contended lwlocks with long wait queues scale badly

From: Bharath Rupireddy
On Thu, Oct 27, 2022 at 10:29 PM Andres Freund <andres@anarazel.de> wrote:
>
> But I think we can solve that fairly reasonably nonetheless. We can change
> PGPROC->lwWaiting to not just be a boolean, but have three states:
> 0: not waiting
> 1: waiting in waitlist
> 2: waiting to be woken up
>
> which we then can use in LWLockDequeueSelf() to only remove ourselves from the
> list if we're on it. As removal from that list is protected by the wait list
> lock, there's no race to worry about.
>
> client  patched   HEAD
> 1        60109    60174
> 2       112694   116169
> 4       214287   208119
> 8       377459   373685
> 16      524132   515247
> 32      565772   554726
> 64      587716   497508
> 128     581297   415097
> 256     550296   334923
> 512     486207   243679
> 768     449673   192959
> 1024    410836   157734
> 2048    326224    82904
> 4096    250252    32007
>
> Not perfect with the patch, but not awful either.

Here are results from my testing [1]. Results look impressive with the
patch at higher client counts; for instance, on HEAD, TPS with
1024 clients is 103587, whereas it is 248702 with the patch.

HEAD, run 1:
1 34534
2 72088
4 135249
8 213045
16 243507
32 304108
64 375148
128 390658
256 345503
512 284510
768 146417
1024 103587
2048 34702
4096 12450

HEAD, run 2:
1 34110
2 72403
4 134421
8 211263
16 241606
32 295198
64 353580
128 385147
256 341672
512 295001
768 142341
1024 97721
2048 30229
4096 13179

PATCHED, run 1:
1 34412
2 71733
4 139141
8 211526
16 241692
32 308198
64 406198
128 385643
256 338464
512 295559
768 272639
1024 248702
2048 191402
4096 112074

PATCHED, run 2:
1 34087
2 73567
4 135624
8 211901
16 242819
32 310534
64 352663
128 381780
256 342483
512 301968
768 272596
1024 251014
2048 184939
4096 108186

> I've attached my quick-and-dirty patch. Obviously it'd need a few defines etc,
> but I wanted to get this out to discuss before spending further time.

Just for the record, here are some review comments posted in the other
thread -
https://www.postgresql.org/message-id/CALj2ACXktNbG%3DK8Xi7PSqbofTZozavhaxjatVc14iYaLu4Maag%40mail.gmail.com

BTW, I've seen a sporadic crash (SEGV) with the patch in the bgwriter
with the same setup [1]. I'm not sure if it's really because of the
patch. I'm unable to reproduce it now, and unfortunately I didn't
capture further details when it occurred.

[1] ./configure --prefix=$PWD/inst/ --enable-tap-tests CFLAGS="-O3" >
install.log && make -j 8 install > install.log 2>&1 &
shared_buffers = 8GB
max_wal_size = 32GB
max_connections = 4096
checkpoint_timeout = 10min

ubuntu: cat << EOF >> txid.sql
SELECT txid_current();
EOF
ubuntu: for c in 1 2 4 8 16 32 64 128 256 512 768 1024 2048 4096; do
echo -n "$c ";./pgbench -n -M prepared -U ubuntu postgres -f txid.sql
-c$c -j$c -T5 2>&1|grep '^tps'|awk '{print $3}';done

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: heavily contended lwlocks with long wait queues scale badly

From: Pavel Borisov
I was working on optimizing the LWLock queue in a slightly different way,
and I also benchmarked Andres' original patch from this
thread. [1]
The results are quite impressive, indeed. Please feel free to see the
results and join the discussion in [1] if you want.

Best regards,
Pavel

[1]
https://www.postgresql.org/message-id/flat/CALT9ZEEz%2B%3DNepc5eti6x531q64Z6%2BDxtP3h-h_8O5HDdtkJcPw%40mail.gmail.com



Re: heavily contended lwlocks with long wait queues scale badly

From: Alexander Korotkov
Hi Andres,

Thank you for your patch. The results are impressive.

On Mon, Oct 31, 2022 at 2:10 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
>
> I was working on optimizing the LWLock queue in a little different way
> and I also did a benchmarking of Andres' original patch from this
> thread. [1]
> The results are quite impressive, indeed. Please feel free to see the
> results and join the discussion in [1] if you want.
>
> Best regards,
> Pavel
>
> [1]
https://www.postgresql.org/message-id/flat/CALT9ZEEz%2B%3DNepc5eti6x531q64Z6%2BDxtP3h-h_8O5HDdtkJcPw%40mail.gmail.com

Pavel posted a patch implementing a lock-less queue for LWLock. The
results are interesting indeed, but slightly lower than your current
patch's. Pavel's current patch probably doesn't utilize the full
potential of the lock-less idea. I wonder what you think about
this direction? We would be grateful for your guidance. Thank you.

------
Regards,
Alexander Korotkov



Re: heavily contended lwlocks with long wait queues scale badly

From: Robert Haas
On Thu, Oct 27, 2022 at 12:59 PM Andres Freund <andres@anarazel.de> wrote:
> After pulling my
> hair out for quite a while to try to understand that behaviour, I figured out
> that it's just a side-effect of *removing* some other contention.

I've seen this kind of pattern on multiple occasions. I don't know if
they were all caused by this, or what, but I certainly like the idea
of making it better.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: heavily contended lwlocks with long wait queues scale badly

From: Andres Freund
Hi,

On 2022-10-31 16:21:06 +0530, Bharath Rupireddy wrote:
> BTW, I've seen a sporadic crash (SEGV) with the patch in the bgwriter
> with the same setup [1]. I'm not sure if it's really because of the
> patch. I'm unable to reproduce it now, and unfortunately I didn't
> capture further details when it occurred.

That's likely because the prototype patch I submitted in this thread missed
updating LWLockUpdateVar().

Updated patch attached.
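
For the record, the missed spot is the wakeup loop in LWLockUpdateVar(); the
fix amounts to something like this (sketch):

        /* Awaken any waiters I removed from the queue. */
        proclist_foreach_modify(iter, &wakeup, lwWaitLink)
        {
                PGPROC     *waiter = GetPGProcByNumber(iter.cur);

                proclist_delete(&wakeup, iter.cur, lwWaitLink);
                /* check comment in LWLockWakeup() about this barrier */
                pg_write_barrier();
                waiter->lwWaiting = LW_WS_NOT_WAITING;  /* previously: = false */
                PGSemaphoreUnlock(waiter->sem);
        }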

Greetings,

Andres Freund


Re: heavily contended lwlocks with long wait queues scale badly

From: Zhihong Yu


On Mon, Oct 31, 2022 at 4:51 PM Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2022-10-31 16:21:06 +0530, Bharath Rupireddy wrote:
> > BTW, I've seen a sporadic crash (SEGV) with the patch in the bgwriter
> > with the same setup [1]. I'm not sure if it's really because of the
> > patch. I'm unable to reproduce it now, and unfortunately I didn't
> > capture further details when it occurred.
>
> That's likely because the prototype patch I submitted in this thread missed
> updating LWLockUpdateVar().
>
> Updated patch attached.
>
> Greetings,
>
> Andres Freund

Hi,
Minor comment:

+   uint8       lwWaiting;      /* see LWLockWaitState */

Why not declare `lwWaiting` of type LWLockWaitState?

Cheers 

Re: heavily contended lwlocks with long wait queues scale badly

From: Andres Freund
Hi,

On 2022-10-31 17:17:03 -0700, Zhihong Yu wrote:
> On Mon, Oct 31, 2022 at 4:51 PM Andres Freund <andres@anarazel.de> wrote:
> 
> > Hi,
> >
> > On 2022-10-31 16:21:06 +0530, Bharath Rupireddy wrote:
> > > BTW, I've seen a sporadic crash (SEGV) with the patch in the bgwriter
> > > with the same setup [1]. I'm not sure if it's really because of the
> > > patch. I'm unable to reproduce it now, and unfortunately I didn't
> > > capture further details when it occurred.
> >
> > That's likely because the prototype patch I submitted in this thread missed
> > updating LWLockUpdateVar().
> >
> > Updated patch attached.
> >
> > Greetings,
> >
> > Andres Freund
> >
> 
> Hi,
> Minor comment:
> 
> +   uint8       lwWaiting;      /* see LWLockWaitState */
> 
> Why not declare `lwWaiting` of type LWLockWaitState?

Unfortunately C99 (*) doesn't allow specifying the width of an enum
field. With most compilers we'd end up using 4 bytes.

Greetings,

Andres Freund

(*) C++ has allowed specifying this for quite a few years now and I think C23
will support it too, but that doesn't help us at this point.
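
As a standalone illustration of the sizing issue (not PostgreSQL code; the
struct names are made up for the example):

#include <stdint.h>
#include <stdio.h>

typedef enum LWLockWaitState
{
    LW_WS_NOT_WAITING = 0,
    LW_WS_WAITING,
    LW_WS_PENDING_WAKEUP
} LWLockWaitState;

/* C99 gives no way to pick the width of an enum-typed field; most
 * compilers lay it out as a full int, i.e. 4 bytes. */
struct proc_with_enum  { LWLockWaitState lwWaiting; };

/* A uint8 carrying the same values stays at 1 byte. */
struct proc_with_uint8 { uint8_t lwWaiting; /* see LWLockWaitState */ };

int
main(void)
{
    printf("enum field:  %zu bytes\n", sizeof(struct proc_with_enum));
    printf("uint8 field: %zu bytes\n", sizeof(struct proc_with_uint8));
    return 0;
}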



Re: heavily contended lwlocks with long wait queues scale badly

From: Zhihong Yu


On Mon, Oct 31, 2022 at 5:19 PM Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2022-10-31 17:17:03 -0700, Zhihong Yu wrote:
> > On Mon, Oct 31, 2022 at 4:51 PM Andres Freund <andres@anarazel.de> wrote:
> >
> > > Hi,
> > >
> > > On 2022-10-31 16:21:06 +0530, Bharath Rupireddy wrote:
> > > > BTW, I've seen a sporadic crash (SEGV) with the patch in the bgwriter
> > > > with the same setup [1]. I'm not sure if it's really because of the
> > > > patch. I'm unable to reproduce it now, and unfortunately I didn't
> > > > capture further details when it occurred.
> > >
> > > That's likely because the prototype patch I submitted in this thread missed
> > > updating LWLockUpdateVar().
> > >
> > > Updated patch attached.
> > >
> > > Greetings,
> > >
> > > Andres Freund
> > >
> >
> > Hi,
> > Minor comment:
> >
> > +   uint8       lwWaiting;      /* see LWLockWaitState */
> >
> > Why not declare `lwWaiting` of type LWLockWaitState?
>
> Unfortunately C99 (*) doesn't allow specifying the width of an enum
> field. With most compilers we'd end up using 4 bytes.
>
> Greetings,
>
> Andres Freund
>
> (*) C++ has allowed specifying this for quite a few years now and I think C23
> will support it too, but that doesn't help us at this point.

Hi,
Thanks for the response.

If possible, it would be better to put your explanation in a code comment (so that other people know the reasoning).

Re: heavily contended lwlocks with long wait queues scale badly

From: Bharath Rupireddy
On Tue, Nov 1, 2022 at 5:21 AM Andres Freund <andres@anarazel.de> wrote:
>
> On 2022-10-31 16:21:06 +0530, Bharath Rupireddy wrote:
> > BTW, I've seen a sporadic crash (SEGV) with the patch in the bgwriter
> > with the same setup [1]. I'm not sure if it's really because of the
> > patch. I'm unable to reproduce it now, and unfortunately I didn't
> > capture further details when it occurred.
>
> That's likely because the prototype patch I submitted in this thread missed
> updating LWLockUpdateVar().
>
> Updated patch attached.

Thanks. It looks good to me. However, some minor comments on the v3 patch:

1.
-    if (MyProc->lwWaiting)
+    if (MyProc->lwWaiting != LW_WS_NOT_WAITING)
         elog(PANIC, "queueing for lock while waiting on another one");

Can the above condition be MyProc->lwWaiting == LW_WS_WAITING ||
MyProc->lwWaiting == LW_WS_PENDING_WAKEUP for better readability?

Or add an assertion Assert(MyProc->lwWaiting != LW_WS_WAITING &&
MyProc->lwWaiting != LW_WS_PENDING_WAKEUP); before setting
LW_WS_WAITING?

2.
    /* Awaken any waiters I removed from the queue. */
    proclist_foreach_modify(iter, &wakeup, lwWaitLink)
    {

@@ -1044,7 +1052,7 @@ LWLockWakeup(LWLock *lock)
          * another lock.
          */
         pg_write_barrier();
-        waiter->lwWaiting = false;
+        waiter->lwWaiting = LW_WS_NOT_WAITING;
         PGSemaphoreUnlock(waiter->sem);
     }

    /*
     * Awaken any waiters I removed from the queue.
     */
    proclist_foreach_modify(iter, &wakeup, lwWaitLink)
    {
        PGPROC       *waiter = GetPGProcByNumber(iter.cur);

        proclist_delete(&wakeup, iter.cur, lwWaitLink);
        /* check comment in LWLockWakeup() about this barrier */
        pg_write_barrier();
        waiter->lwWaiting = LW_WS_NOT_WAITING;

Can we add an assertion Assert(waiter->lwWaiting ==
LW_WS_PENDING_WAKEUP) in the above two places? We prepare the wakeup
list and set the LW_WS_NOT_WAITING flag in the above loops; having an
assertion is better here IMO.
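
Concretely, the suggestion amounts to something like this in LWLockWakeup()'s
loop (a sketch; exact placement may differ):

        /* Awaken any waiters I removed from the queue. */
        proclist_foreach_modify(iter, &wakeup, lwWaitLink)
        {
                PGPROC     *waiter = GetPGProcByNumber(iter.cur);

                proclist_delete(&wakeup, iter.cur, lwWaitLink);

                /* everything on the local wakeup list must be pending wakeup */
                Assert(waiter->lwWaiting == LW_WS_PENDING_WAKEUP);

                pg_write_barrier();
                waiter->lwWaiting = LW_WS_NOT_WAITING;
                PGSemaphoreUnlock(waiter->sem);
        }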

Below are test results with v3 patch. +1 for back-patching it.

Clients    HEAD    PATCHED
1    34142    34289
2    72760    69720
4    136300    131848
8    210809    210192
16    240718    242744
32    297587    297354
64    341939    343036
128    383615    383801
256    342094    337680
512    263194    288629
768    145526    261553
1024    107267    241811
2048    35716    188389
4096    12415    120300

Clients    PG15    PATCHED
1    34503    34078
2    73708    72054
4    139415    133321
8    212396    211390
16    242227    242584
32    303441    309288
64    362680    339211
128    378645    344291
256    340016    344291
512    290044    293337
768    140277    264618
1024    96191    247636
2048    35158    181488
4096    12164    118610

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: heavily contended lwlocks with long wait queues scale badly

From: Robert Haas
On Tue, Nov 1, 2022 at 3:17 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> Below are test results with v3 patch. +1 for back-patching it.

The problem with back-patching stuff like this is that it can have
unanticipated consequences. I think that the chances of something like
this backfiring are less than for a patch that changes plans, but I
don't think that they're nil, either. It could turn out that this
patch, which has really promising results on the workloads we've
tested, harms some other workload due to some other contention pattern
we can't foresee. It could also turn out that improving performance at
the database level actually has negative consequences for some
application using the database, because the application could be
unknowingly relying on the database to throttle its activity.

It's hard for me to estimate exactly what the risk of a patch like
this is. I think that if we back-patched this, and only this, perhaps
the chances of something bad happening aren't incredibly high. But if
we get into the habit of back-patching seemingly-innocuous performance
improvements, it's only a matter of time before one of them turns out
not to be so innocuous as we thought. I would guess that the number of
times we have to back-patch something like this before somebody starts
complaining about a regression is likely to be somewhere between 3 and
5.

It's possible that I'm too pessimistic, though.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: heavily contended lwlocks with long wait queues scale badly

From: "Jonathan S. Katz"
On 11/1/22 8:37 AM, Robert Haas wrote:
> On Tue, Nov 1, 2022 at 3:17 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>> Below are test results with v3 patch. +1 for back-patching it.

First, awesome find and proposed solution!

> The problem with back-patching stuff like this is that it can have
> unanticipated consequences. I think that the chances of something like
> this backfiring are less than for a patch that changes plans, but I
> don't think that they're nil, either. It could turn out that this
> patch, which has really promising results on the workloads we've
> tested, harms some other workload due to some other contention pattern
> we can't foresee. It could also turn out that improving performance at
> the database level actually has negative consequences for some
> application using the database, because the application could be
> unknowingly relying on the database to throttle its activity.

If someone is using the database to throttle activity for their app, I 
have a bunch of follow up questions to understand why.

> It's hard for me to estimate exactly what the risk of a patch like
> this is. I think that if we back-patched this, and only this, perhaps
> the chances of something bad happening aren't incredibly high. But if
> we get into the habit of back-patching seemingly-innocuous performance
> improvements, it's only a matter of time before one of them turns out
> not to be so innocuous as we thought. I would guess that the number of
> times we have to back-patch something like this before somebody starts
> complaining about a regression is likely to be somewhere between 3 and
> 5.

Having the privilege of reading through the release notes for every
update release, I see on average 1-2 "performance improvements" in each
release. I believe they tend to be on the negligible side, though.

I do understand the concerns. Say you discover your workload does have a
regression with this patch and then there's a CVE fix that you need to
take -- what do you do? Reading the thread / patch, it seems as if
this is a lower-risk "performance fix", but the risk is still nonzero.

While this does affect all supported versions, we could also consider 
backpatching only for PG15. That at least 1/ limits impact on users 
running older versions (opting into a major version upgrade) and 2/ 
we're still very early in the major upgrade cycle for PG15 that it's 
lower risk if there are issues.

Users are generally happy when they can perform a simple upgrade and get 
a performance boost, particularly the set of users that this patch 
affects most (high throughput, high connection count). This is the type 
of fix that would make headlines in a major release announcement (10x 
TPS improvement w/4096 connections?!). Part of the tradeoff of
backpatching this is that we may lose some of the higher-visibility
marketing opportunities to discuss it (though I'm sure there will be
plenty of blog posts, etc.)

Andres: when you suggested backpatching, were you thinking of the Nov 
2022 release or the Feb 2023 release?

Thanks,

Jonathan


Re: heavily contended lwlocks with long wait queues scale badly

From: Andres Freund
Hi,

On 2022-11-01 08:37:39 -0400, Robert Haas wrote:
> On Tue, Nov 1, 2022 at 3:17 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> > Below are test results with v3 patch. +1 for back-patching it.
> 
> The problem with back-patching stuff like this is that it can have
> unanticipated consequences. I think that the chances of something like
> this backfiring are less than for a patch that changes plans, but I
> don't think that they're nil, either. It could turn out that this
> patch, which has really promising results on the workloads we've
> tested, harms some other workload due to some other contention pattern
> we can't foresee. It could also turn out that improving performance at
> the database level actually has negative consequences for some
> application using the database, because the application could be
> unknowingly relying on the database to throttle its activity.
> 
> It's hard for me to estimate exactly what the risk of a patch like
> this is. I think that if we back-patched this, and only this, perhaps
> the chances of something bad happening aren't incredibly high. But if
> we get into the habit of back-patching seemingly-innocuous performance
> improvements, it's only a matter of time before one of them turns out
> not to be so innocuous as we thought. I would guess that the number of
> times we have to back-patch something like this before somebody starts
> complaining about a regression is likely to be somewhere between 3 and
> 5.

In general I agree, we shouldn't default to backpatching performance
fixes. The reason I am even considering it in this case is that it's a
readily reproducible issue, leading to a quadratic behaviour that's extremely
hard to pinpoint. There's no increase in CPU usage, no wait event for
spinlocks, the system doesn't even get stuck (because the wait list lock is
held after the lwlock lock release). I don't think users have a decent chance
at figuring out that this is the issue.

I'm not at all convinced we should backpatch either, just to be clear.

Greetings,

Andres Freund



Re: heavily contended lwlocks with long wait queues scale badly

From: Andres Freund
Hi,

On 2022-11-01 11:19:02 -0400, Jonathan S. Katz wrote:
> This is the type of fix that would make headlines in a major release
> announcement (10x TPS improvement w/4096 connections?!). Part of the tradeoff
> of backpatching this is that we may lose some of the higher-visibility
> marketing opportunities to discuss it (though I'm sure there
> will be plenty of blog posts, etc.)

(read the next paragraph with the caveat that results below prove it somewhat
wrong)

I don't think the fix is as big a deal as the above makes it sound - you need
to do somewhat extreme things to hit the problem. Yes, it drastically improves
the scalability of e.g. doing SELECT txid_current() across as many sessions as
possible - but that's not something you normally do (it was a good candidate
to show the problem because it's a single lock but doesn't trigger WAL flushes
at commit).

You can probably hit the problem with many concurrent single-tx INSERTs, but
you'd need to have synchronous_commit=off or fsync=off (or a very expensive
server class SSD with battery backup) and the effect is likely smaller.
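
For example, something like this should exhibit it (a sketch, analogous to
the txid.sql setup upthread; table and file names are made up):

$ psql -c 'CREATE TABLE ins_test(id int)'
$ cat ~/tmp/insert.sql
INSERT INTO ins_test VALUES (1);
$ PGOPTIONS='-c synchronous_commit=off' pgbench -n -M prepared \
    -f ~/tmp/insert.sql -c4096 -j4096 -T5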


> Andres: when you suggested backpatching, were you thinking of the Nov 2022
> release or the Feb 2023 release?

I wasn't thinking that concretely. Even if we decide to backpatch, I'd be very
hesitant to do it in a few days.


<goes and runs test while in meeting>


I tested with browser etc running, so this is plenty noisy. I used the best of
the two pgbench -T21 -P5 tps, after ignoring the first two periods (they're
too noisy). I used an ok-ish NVMe SSD, rather than the expensive one that
has "free" fsync.

synchronous_commit=on:

clients   master         fix
16          6196            6202
64         25716           25545
256        90131           90240
1024      128556          151487
2048       59417          157050
4096       32252          178823


synchronous_commit=off:

clients   master         fix
16        409828          409016
64        454257          455804
256       304175          452160
1024      135081          334979
2048       66124          291582
4096       27019          245701


Hm. That's a bigger effect than I anticipated. I guess sc=off isn't actually
required, due to the level of concurrency making group commit very
effective.

This is without an index, serial column or anything. But a quick comparison
for just 4096 clients shows that to still be a big difference if I create a
serial primary key:
master: 26172
fix: 155813


Greetings,

Andres Freund



Re: heavily contended lwlocks with long wait queues scale badly

From: "Jonathan S. Katz"
On 11/1/22 1:41 PM, Andres Freund wrote:

>> Andres: when you suggested backpatching, were you thinking of the Nov 2022
>> release or the Feb 2023 release?
> 
> I wasn't thinking that concretely. Even if we decide to backpatch, I'd be very
> hesitant to do it in a few days.

Yeah this was my thinking (and also why I took a few days to reply given 
the lack of urgency for this release). It would at least give some more 
time for others to test it to feel confident that we're not introducing 
noticeable regressions.

> <goes and runs test while in meeting>
> 
> 
> I tested with browser etc running, so this is plenty noisy. I used the best of
> the two pgbench -T21 -P5 tps, after ignoring the first two periods (they're
> too noisy). I used an ok-ish NVMe SSD, rather than the expensive one that
> has "free" fsync.
> 
> synchronous_commit=on:
> 
> clients   master         fix
> 16          6196            6202
> 64         25716           25545
> 256        90131           90240
> 1024      128556          151487
> 2048       59417          157050
> 4096       32252          178823
> 
> 
> synchronous_commit=off:
> 
> clients   master         fix
> 16        409828          409016
> 64        454257          455804
> 256       304175          452160
> 1024      135081          334979
> 2048       66124          291582
> 4096       27019          245701
> 
> 
> Hm. That's a bigger effect than I anticipated. I guess sc=off isn't actually
> required, due to the level of concurrency making group commit very
> effective.
> 
> This is without an index, serial column or anything. But a quick comparison
> for just 4096 clients shows that to still be a big difference if I create a
> serial primary key:
> master: 26172
> fix: 155813

🤯 (seeing if my exploding head makes it into the archives).

Given the lack of ABI changes (hesitant to say low-risk until after more
testing, but seemingly low-risk), I can get behind backpatching, especially if
we're targeting Feb 2023 so we can test some more.

With my advocacy hat on, it bums me that we may not get as much buzz 
about this change given it's not in a major release, but 1/ it'll fix an 
issue that will help users with high-concurrency and 2/ users would be 
able to perform a simpler update to get the change.

Thanks,

Jonathan


Re: heavily contended lwlocks with long wait queues scale badly

From: Bharath Rupireddy
On Tue, Nov 1, 2022 at 12:46 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> > Updated patch attached.
>
> Thanks. It looks good to me. However, some minor comments on the v3 patch:
>
> 1.
> -    if (MyProc->lwWaiting)
> +    if (MyProc->lwWaiting != LW_WS_NOT_WAITING)
>          elog(PANIC, "queueing for lock while waiting on another one");
>
> Can the above condition be MyProc->lwWaiting == LW_WS_WAITING ||
> MyProc->lwWaiting == LW_WS_PENDING_WAKEUP for better readability?
>
> Or add an assertion Assert(MyProc->lwWaiting != LW_WS_WAITING &&
> MyProc->lwWaiting != LW_WS_PENDING_WAKEUP); before setting
> LW_WS_WAITING?
>
> 2.
>     /* Awaken any waiters I removed from the queue. */
>     proclist_foreach_modify(iter, &wakeup, lwWaitLink)
>     {
>
> @@ -1044,7 +1052,7 @@ LWLockWakeup(LWLock *lock)
>           * another lock.
>           */
>          pg_write_barrier();
> -        waiter->lwWaiting = false;
> +        waiter->lwWaiting = LW_WS_NOT_WAITING;
>          PGSemaphoreUnlock(waiter->sem);
>      }
>
>     /*
>      * Awaken any waiters I removed from the queue.
>      */
>     proclist_foreach_modify(iter, &wakeup, lwWaitLink)
>     {
>         PGPROC       *waiter = GetPGProcByNumber(iter.cur);
>
>         proclist_delete(&wakeup, iter.cur, lwWaitLink);
>         /* check comment in LWLockWakeup() about this barrier */
>         pg_write_barrier();
>         waiter->lwWaiting = LW_WS_NOT_WAITING;
>
> Can we add an assertion Assert(waiter->lwWaiting ==
> LW_WS_PENDING_WAKEUP) in the above two places? We prepare the wakeup
> list and set the LW_WS_NOT_WAITING flag in the above loops; having an
> assertion is better here IMO.
>
> Below are test results with v3 patch. +1 for back-patching it.
>
> Clients    HEAD    PATCHED
> 1    34142    34289
> 2    72760    69720
> 4    136300    131848
> 8    210809    210192
> 16    240718    242744
> 32    297587    297354
> 64    341939    343036
> 128    383615    383801
> 256    342094    337680
> 512    263194    288629
> 768    145526    261553
> 1024    107267    241811
> 2048    35716    188389
> 4096    12415    120300
>
> Clients    PG15    PATCHED
> 1    34503    34078
> 2    73708    72054
> 4    139415    133321
> 8    212396    211390
> 16    242227    242584
> 32    303441    309288
> 64    362680    339211
> 128    378645    344291
> 256    340016    344291
> 512    290044    293337
> 768    140277    264618
> 1024    96191    247636
> 2048    35158    181488
> 4096    12164    118610

I looked at the v3 patch again today and ran some performance tests.
The results look as impressive as they did earlier. Andres, any plans to
get this in?

pgbench with SELECT txid_current();:
Clients    HEAD    PATCHED
1    34613    33611
2    72634    70546
4    137885    136911
8    216470    216076
16    242535    245392
32    299952    304740
64    329788    347401
128    378296    386873
256    344939    343832
512    292196    295839
768    144212    260102
1024    101525    250263
2048    35594    185878
4096    11842    104227

pgbench with insert into pgbench_accounts table:
Clients    HEAD    PATCHED
1    1660    1600
2    1848    1746
4    3547    3395
8    7330    6754
16    13103    13613
32    26011    26372
64    52331    52594
128    93313    95526
256    127373    126182
512    126712    127857
768    116765    119227
1024    111464    112499
2048    58838    92756
4096    26066    60543

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: heavily contended lwlocks with long wait queues scale badly

From: Andres Freund
Hi,

On 2022-11-09 15:54:16 +0530, Bharath Rupireddy wrote:
> On Tue, Nov 1, 2022 at 12:46 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > > Updated patch attached.
> >
> > Thanks. It looks good to me. However, some minor comments on the v3 patch:
> >
> > 1.
> > -    if (MyProc->lwWaiting)
> > +    if (MyProc->lwWaiting != LW_WS_NOT_WAITING)
> >          elog(PANIC, "queueing for lock while waiting on another one");
> >
> > Can the above condition be MyProc->lwWaiting == LW_WS_WAITING ||
> > MyProc->lwWaiting == LW_WS_PENDING_WAKEUP for better readability?
> >
> > Or add an assertion Assert(MyProc->lwWaiting != LW_WS_WAITING &&
> > MyProc->lwWaiting != LW_WS_PENDING_WAKEUP); before setting
> > LW_WS_WAITING?

I don't think that's a good idea - it'll just mean we have to modify more
places if we add another state, without making anything more robust.


> > 2.
> >     /* Awaken any waiters I removed from the queue. */
> >     proclist_foreach_modify(iter, &wakeup, lwWaitLink)
> >     {
> >
> > @@ -1044,7 +1052,7 @@ LWLockWakeup(LWLock *lock)
> >           * another lock.
> >           */
> >          pg_write_barrier();
> > -        waiter->lwWaiting = false;
> > +        waiter->lwWaiting = LW_WS_NOT_WAITING;
> >          PGSemaphoreUnlock(waiter->sem);
> >      }
> >
> >     /*
> >      * Awaken any waiters I removed from the queue.
> >      */
> >     proclist_foreach_modify(iter, &wakeup, lwWaitLink)
> >     {
> >         PGPROC       *waiter = GetPGProcByNumber(iter.cur);
> >
> >         proclist_delete(&wakeup, iter.cur, lwWaitLink);
> >         /* check comment in LWLockWakeup() about this barrier */
> >         pg_write_barrier();
> >         waiter->lwWaiting = LW_WS_NOT_WAITING;
> >
> > Can we add an assertion Assert(waiter->lwWaiting ==
> > LW_WS_PENDING_WAKEUP) in the above two places? We prepare the wakeup
> > list and set the LW_WS_NOT_WAITING flag in the above loops; having an
> > assertion is better here IMO.

I guess it can't hurt - but it's not really related to the changes in the
patch, no?


> I looked at the v3 patch again today and ran some performance tests.
> The results look as impressive as they did earlier. Andres, any plans to
> get this in?

I definitely didn't want to backpatch before this point release. But it seems
we haven't quite got to an agreement on what to do about backpatching. It's
probably best to just commit it to HEAD and let the backpatch discussion
happen concurrently.

I'm on a hike, without any connectivity, Thu afternoon - Sun. I think it's OK
to push it to HEAD if I get it done in the next few hours. Bigger issues,
which I do not expect, should show up before tomorrow afternoon. Smaller
things could wait till Sunday if necessary.

Greetings,

Andres Freund



Re: heavily contended lwlocks with long wait queues scale badly

From: Andres Freund
On 2022-11-09 09:38:08 -0800, Andres Freund wrote:
> I'm on a hike, without any connectivity, Thu afternoon - Sun. I think it's OK
> to push it to HEAD if I get it done in the next few hours. Bigger issues,
> which I do not expect, should show up before tomorrow afternoon. Smaller
> things could wait till Sunday if necessary.

I didn't get to it in time, so I'll leave it for when I'm back.



Re: heavily contended lwlocks with long wait queues scale badly

From: Andres Freund
Hi,

On 2022-11-09 17:03:13 -0800, Andres Freund wrote:
> On 2022-11-09 09:38:08 -0800, Andres Freund wrote:
> > I'm on a hike, without any connectivity, Thu afternoon - Sun. I think it's OK
> > to push it to HEAD if I get it done in the next few hours. Bigger issues,
> > which I do not expect, should show up before tomorrow afternoon. Smaller
> > things could wait till Sunday if necessary.
> 
> I didn't get to it in time, so I'll leave it for when I'm back.

Took a few days longer, partially because I encountered an independent issue
(see 8c954168cff) while testing.

I pushed it to HEAD now.

I still think it might be worth backpatching in a bit, but so far the votes
weren't clear enough on that for me to feel comfortable.

Regards,

Andres



Re: heavily contended lwlocks with long wait queues scale badly

From: "Jonathan S. Katz"
On 11/20/22 2:56 PM, Andres Freund wrote:
> Hi,
> 
> On 2022-11-09 17:03:13 -0800, Andres Freund wrote:
>> On 2022-11-09 09:38:08 -0800, Andres Freund wrote:
>>> I'm on a hike, without any connectivity, Thu afternoon - Sun. I think it's OK
>>> to push it to HEAD if I get it done in the next few hours. Bigger issues,
>>> which I do not expect, should show up before tomorrow afternoon. Smaller
>>> things could wait till Sunday if necessary.
>>
>> I didn't get to it in time, so I'll leave it for when I'm back.
> 
> Took a few days longer, partially because I encountered an independent issue
> (see 8c954168cff) while testing.
> 
> I pushed it to HEAD now.

Thanks!

> I still think it might be worth backpatching in a bit, but so far the votes
> weren't clear enough on that for me to feel comfortable.

My general feeling is "yes" on backpatching, particularly if this is a 
bug and it's fixable without ABI breaks.

My comments were around performing additional workload benchmarking just 
to ensure people feel comfortable that we're not introducing any 
performance regressions, and to consider the Feb 2023 release as the 
time to introduce this (vs. Nov 2022). That gives us ample time to 
determine if there are any performance regressions introduced.

Thanks,

Jonathan


Re: heavily contended lwlocks with long wait queues scale badly

From: Nathan Bossart
On Mon, Nov 21, 2022 at 10:31:14AM -0500, Jonathan S. Katz wrote:
> On 11/20/22 2:56 PM, Andres Freund wrote:
>> I still think it might be worth backpatching in a bit, but so far the votes
>> weren't clear enough on that for me to feel comfortable.
> 
> My general feeling is "yes" on backpatching, particularly if this is a bug
> and it's fixable without ABI breaks.

Now that commit a4adc31 has had some time to bake and concerns about
unintended consequences may have abated, I wanted to revive this
back-patching discussion.  I see a few possibly-related reports [0] [1]
[2], and I'm now seeing this in the field, too.  While it is debatable
whether this is a bug, it's a quite nasty issue for users, and it's both
difficult to detect and difficult to work around.

Thoughts?

[0] https://postgr.es/m/CAM527d-uDn5osa6QPKxHAC6srOfBH3M8iXUM%3DewqHV6n%3Dw1u8Q%40mail.gmail.com
[1] https://postgr.es/m/VI1PR05MB620666631A41186ACC3FC91ACFC70%40VI1PR05MB6206.eurprd05.prod.outlook.com
[2] https://postgr.es/m/dd0e070809430a31f7ddd8483fbcce59%40mail.gmail.com

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: heavily contended lwlocks with long wait queues scale badly

From: Michael Paquier
On Wed, Jan 10, 2024 at 09:17:47PM -0600, Nathan Bossart wrote:
> Now that commit a4adc31 has had some time to bake and concerns about
> unintended consequences may have abated, I wanted to revive this
> back-patching discussion.  I see a few possibly-related reports [0] [1]
> [2], and I'm now seeing this in the field, too.  While it is debatable
> whether this is a bug, it's a quite nasty issue for users, and it's both
> difficult to detect and difficult to work around.

+1, I've seen this becoming a PITA for a few things.  Knowing that the
size of PGPROC does not change at all, I would be in favor of a
backpatch, especially since it's been in the tree for more than 1
year, and even more knowing that we have 16 released with this stuff
in.
--
Michael


Re: heavily contended lwlocks with long wait queues scale badly

From: "Jonathan S. Katz"
On 1/10/24 10:45 PM, Michael Paquier wrote:
> On Wed, Jan 10, 2024 at 09:17:47PM -0600, Nathan Bossart wrote:
>> Now that commit a4adc31 has had some time to bake and concerns about
>> unintended consequences may have abated, I wanted to revive this
>> back-patching discussion.  I see a few possibly-related reports [0] [1]
>> [2], and I'm now seeing this in the field, too.  While it is debatable
>> whether this is a bug, it's a quite nasty issue for users, and it's both
>> difficult to detect and difficult to work around.
> 
> +1, I've seen this becoming a PITA for a few things.  Knowing that the
> size of PGPROC does not change at all, I would be in favor of a
> backpatch, especially since it's been in the tree for more than 1
> year, and even more knowing that we have 16 released with this stuff
> in.

I have similar data sources to Nathan/Michael and I'm trying to avoid 
piling on, but one case that's interesting occurred after a major 
version upgrade from PG10 to PG14 on a database supporting a very 
active/highly concurrent workload. On inspection, it seems like 
backpatching would help this particular case.

With 10/11 EOL, I do wonder if we'll see more of these reports on 
upgrade to < PG16.

(I was in favor of backpatching prior; opinion is unchanged).

Thanks,

Jonathan



Re: heavily contended lwlocks with long wait queues scale badly

From: Michael Paquier
On Thu, Jan 11, 2024 at 09:47:33AM -0500, Jonathan S. Katz wrote:
> I have similar data sources to Nathan/Michael and I'm trying to avoid piling
> on, but one case that's interesting occurred after a major version upgrade
> from PG10 to PG14 on a database supporting a very active/highly concurrent
> workload. On inspection, it seems like backpatching would help this
> particular case.
>
> With 10/11 EOL, I do wonder if we'll see more of these reports on upgrade to
> < PG16.
>
> (I was in favor of backpatching prior; opinion is unchanged).

Hearing nothing, I have prepared a set of patches for v12~v15,
checking all the lwlock paths for all the branches.  At the end the
set of changes looks rather sane to me regarding the queue handling.

I have also run some numbers on all the branches, and the test case
posted upthread falls off dramatically after 512 concurrent
connections at the top of all the stable branches :(

For example on REL_12_STABLE with and without the patch attached:
num  v12           v12+patch
1    29717.151665  29096.707588
2    63257.709301  61889.476318
4    127921.873393 124575.901330
8    231400.571662 230562.725174
16   343911.185351 312432.897015
32   291748.985280 281011.787701
64   268998.728648 269975.605115
128  297332.597018 286449.176950
256  243902.817657 240559.122309
512  190069.602270 194510.718508
768  58915.650225  165714.707198
1024 39920.950552  149433.836901
2048 16922.391688  108164.301054
4096 6229.063321   69032.338708

I'd like to apply that, just let me know if you have any comments
and/or objections.
--
Michael


Re: heavily contended lwlocks with long wait queues scale badly

From: "Jonathan S. Katz"
On 1/16/24 1:11 AM, Michael Paquier wrote:
> On Thu, Jan 11, 2024 at 09:47:33AM -0500, Jonathan S. Katz wrote:
>> I have similar data sources to Nathan/Michael and I'm trying to avoid piling
>> on, but one case that's interesting occurred after a major version upgrade
>> from PG10 to PG14 on a database supporting a very active/highly concurrent
>> workload. On inspection, it seems like backpatching would help this
>> particular case.
>>
>> With 10/11 EOL, I do wonder if we'll see more of these reports on upgrade to
>> < PG16.
>>
>> (I was in favor of backpatching prior; opinion is unchanged).
> 
> Hearing nothing, I have prepared a set of patches for v12~v15,
> checking all the lwlock paths for all the branches.  At the end the
> set of changes looks rather sane to me regarding the queue handling.
> 
> I have also run some numbers on all the branches, and the test case
> posted upthread falls off dramatically after 512 concurrent
> connections at the top of all the stable branches :(
> 
> For example on REL_12_STABLE with and without the patch attached:
> num  v12           v12+patch
> 1    29717.151665  29096.707588
> 2    63257.709301  61889.476318
> 4    127921.873393 124575.901330
> 8    231400.571662 230562.725174
> 16   343911.185351 312432.897015
> 32   291748.985280 281011.787701
> 64   268998.728648 269975.605115
> 128  297332.597018 286449.176950
> 256  243902.817657 240559.122309
> 512  190069.602270 194510.718508
> 768  58915.650225  165714.707198
> 1024 39920.950552  149433.836901
> 2048 16922.391688  108164.301054
> 4096 6229.063321   69032.338708
> 
> I'd like to apply that, just let me know if you have any comments
> and/or objections.

Wow. All I can say is that my opinion remains unchanged on going forward 
with backpatching.

Looking at the code, I understand an argument for not backpatching given 
we modify the struct, but this does seem low-risk/high-reward and should 
help PostgreSQL run better on these higher-throughput workloads.

Thanks,

Jonathan


Re: heavily contended lwlocks with long wait queues scale badly

From: Michael Paquier
On Tue, Jan 16, 2024 at 11:24:49PM -0500, Jonathan S. Katz wrote:
> On 1/16/24 1:11 AM, Michael Paquier wrote:
>> I'd like to apply that, just let me know if you have any comments
>> and/or objections.
>
> Looking at the code, I understand an argument for not backpatching given we
> modify the struct, but this does seem low-risk/high-reward and should help
> PostgreSQL run better on these higher-throughput workloads.

Just to be clear here.  I repeated the tests on all the stable
branches yesterday, and the TPS falls off drastically around 256
concurrent sessions for all of them with patterns similar to what I've
posted for 12, getting back a lot of performance for the cases with
more than 1k connections.
--
Michael


Re: heavily contended lwlocks with long wait queues scale badly

From: Michael Paquier
On Tue, Jan 16, 2024 at 03:11:48PM +0900, Michael Paquier wrote:
> I'd like to apply that, just let me know if you have any comments
> and/or objections.

And done on 12~15.

While on it, I have also looked at source code references on github
and debian that involve lwWaiting, and all of them rely on lwWaiting
being false when not waiting, for which LW_WS_NOT_WAITING is an equivalent.
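
For illustration, the pattern in question is a boolean-style test like the
sketch below, which keeps giving the right answer since LW_WS_NOT_WAITING is
defined as 0 (the helper name is made up):

#include <stdbool.h>
#include <stdint.h>

typedef enum
{
    LW_WS_NOT_WAITING = 0,
    LW_WS_WAITING,
    LW_WS_PENDING_WAKEUP
} LWLockWaitState;

/* stand-in for external code reading PGPROC->lwWaiting */
static bool
proc_is_waiting(uint8_t lwWaiting)
{
    /* the old "if (proc->lwWaiting)" test keeps the same truth value,
     * because LW_WS_NOT_WAITING plays the role of the old 'false' */
    return lwWaiting != LW_WS_NOT_WAITING;
}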
--
Michael


Re: heavily contended lwlocks with long wait queues scale badly

From: Jakub Wartak
On Thu, Jan 18, 2024 at 7:17 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Jan 16, 2024 at 03:11:48PM +0900, Michael Paquier wrote:
> > I'd like to apply that, just let me know if you have any comments
> > and/or objections.
>
> And done on 12~15.

Hi Michael, just to reassure you that it is a good thing. We have a
customer who reported much better performance on 16.x than on 13~15 in
very heavy duty LWLock/lockmanager scenarios (ofc, before that was
committed/released), so I gave it a try here today to see how much can
be attributed to that single commit.

Given:
# $s=10, $p=10,100, DURATION=10s, m=prepared,simple, no reruns , just
single $DURATION run to save time
pgbench -i -s $s --partitions $p $DBNAME
ALTER TABLE pgbench_accounts ADD COLUMN aid_parent INT;
UPDATE pgbench_accounts SET aid_parent = aid;
CREATE INDEX ON pgbench_accounts(aid_parent);

pgbench -n -M $m -T $DURATION -c $c -j $c -f join.sql $DBNAME

join.sql was:
\set aid random(1, 100000 * :scale)
select * from pgbench_accounts pa join pgbench_branches pb on pa.bid =
pb.bid where pa.aid_parent = :aid;

see attached results. The benefits are observable (at least when active
working sessions >= VCPUs [threads, not cores]) and give up to a ~2.65x
boost in certain cases, at least for this testcase. Hopefully others
will find it useful.

-J.


Re: heavily contended lwlocks with long wait queues scale badly

From: Michael Paquier
On Fri, Jan 19, 2024 at 01:49:59PM +0100, Jakub Wartak wrote:
> Hi Michael, just to reassure you that it is a good thing. We have a
> customer who reported much better performance on 16.x than on 13~15 in
> very heavy duty LWLock/lockmanager scenarios (ofc, before that was
> committed/released), so I gave it a try here today to see how much can
> be attributed to that single commit.

Ahh.  Thanks for the feedback.
--
Michael
