Re: Improving connection scalability (src/backend/storage/ipc/procarray.c) - Mailing list pgsql-hackers

From Ranier Vilela
Subject Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)
Date
Msg-id CAEudQArDKoDAD8i4=5Hq=Y7aD8+xMMxkhe07ocLJHFZGVQKKkw@mail.gmail.com
Whole thread Raw
In response to Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)
List pgsql-hackers
Em sáb., 28 de mai. de 2022 às 09:35, Tomas Vondra <tomas.vondra@enterprisedb.com> escreveu:
On 5/28/22 02:36, Ranier Vilela wrote:
> Em sex., 27 de mai. de 2022 às 18:22, Andres Freund <andres@anarazel.de
> <mailto:andres@anarazel.de>> escreveu:
>
>     Hi,
>
>     On 2022-05-27 10:35:08 -0300, Ranier Vilela wrote:
>     > Em qui., 26 de mai. de 2022 às 22:30, Tomas Vondra <
>     > tomas.vondra@enterprisedb.com
>     <mailto:tomas.vondra@enterprisedb.com>> escreveu:
>     >
>     > > On 5/27/22 02:11, Ranier Vilela wrote:
>     > > >
>     > > > ...
>     > > >
>     > > > Here the results with -T 60:
>     > >
>     > > Might be a good idea to share your analysis / interpretation of the
>     > > results, not just the raw data. After all, the change is being
>     proposed
>     > > by you, so do you think this shows the change is beneficial?
>     > >
>     > I think so, but the expectation has diminished.
>     > I expected that the more connections, the better the performance.
>     > And for both patch and head, this doesn't happen in tests.
>     > Performance degrades with a greater number of connections.
>
>     Your system has four CPUs. Once they're all busy, adding more
>     connections
>     won't improve performance. It'll just add more and more context
>     switching,
>     cache misses, and make the OS scheduler do more work.
>
> conns       tps head
> 10      82365.634750
> 50      74593.714180
> 80      69219.756038
> 90      67419.574189
> 100     66613.771701
> Yes it is quite disappointing that with 100 connections, tps loses to 10
> connections.
>  

IMO that's entirely expected on a system with only 4 cores. Increasing
the number of connections inevitably means more overhead (you have to
track/manage more stuff). And at some point the backends start competing
for L2/L3 caches, context switches are not free either, etc. So once you
cross ~2-3x the number of cores, you should expect this.

This behavior is natural/inherent, it's unlikely to go away, and it's
one of the reasons why we recommend not to use too many connections. If
you try to maximize throughput, just don't do that. Or just use machine
with more cores.

>
>     > GetSnapShowData() isn't a bottleneck?
>
>     I'd be surprised if it showed up in a profile on your machine with that
>     workload in any sort of meaningful way. The snapshot reuse logic
>     will always
>     work - because there are no writes - and thus the only work that
>     needs to be
>     done is to acquire the ProcArrayLock briefly.  And because there is
>     only a
>     small number of cores, contention on the cacheline for that isn't a
>     problem.
>
> Thanks for sharing this.
>  
>
>
>
>     > > These results look much saner, but IMHO it also does not show
>     any clear
>     > > benefit of the patch. Or are you still claiming there is a benefit?
>     > >
>     > We agree that they are micro-optimizations.  However, I think they
>     should be
>     > considered micro-optimizations in inner loops, because all in
>     procarray.c is
>     > a hotpath.
>
>     As explained earlier, I don't agree that they optimize anything - you're
>     making some of the scalability behaviour *worse*, if it's changed at
>     all.
>
>
>     > The first objective, I believe, was achieved, with no performance
>     > regression.
>     > I agree, the gains are small, by the tests done.
>
>     There are no gains.
>
> IMHO, I must disagree.
>  

You don't have to, really. What you should do is showing results
demonstrating the claimed gains, and so far you have not done that.

I don't want to be rude, but so far you've shown results from a
benchmark testing fork(), due to only running 10 transactions per
client, and then results from a single run for each client count (which
doesn't really show any gains either, and is so noisy).

As mentioned GetSnapshotData() is not even in perf profile, so why would
the patch even make a difference?

You've also claimed it helps generating better code on older compilers,
but you've never supported that with any evidence.


Maybe there is an improvement - show us. Do a benchmark with more runs,
to average-out the noise. Calculate VAR/STDEV to show how variable the
results are. Use that to compare results and decide if there is an
improvement. Also, keep in mind binary layout matters [1].
I redid the benchmark with a better machine:
Intel i7-10510U
RAM 8GB
SSD 512GB
Linux Ubuntu 64 bits

All files are attached, including the raw data of the results.
I did the calculations as requested.
But a quick average of the 10 benchmarks, done resulted in 10,000 tps more.
Not bad, for a simple patch, made entirely of micro-optimizations.

Results attached.
 
regards,
Ranier Vilela
Attachment

pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)
Next
From: Andres Freund
Date:
Subject: Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)