Re: Improving connection scalability: GetSnapshotData() - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: Improving connection scalability: GetSnapshotData()
Date
Msg-id 128c7844-92c0-b7fa-caff-ebd1499f30c6@postgrespro.ru
In response to Re: Improving connection scalability: GetSnapshotData()  (Andres Freund <andres@anarazel.de>)
Responses Re: Improving connection scalability: GetSnapshotData()
List pgsql-hackers

On 04.09.2020 21:53, Andres Freund wrote:
>
> I also used huge_pages=on / configured them on the OS level. Otherwise
> TLB misses will be a significant factor.

As far as I understand, there should not be any TLB misses, because the 
size of shared buffers (8MB) is several orders of magnitude smaller 
than the available physical memory.
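
For reference, the huge-page setup mentioned above is roughly the 
following (a sketch; the page count is illustrative and has to be sized 
to the configured shared_buffers):

    # Linux: reserve explicit 2MB huge pages before starting the server
    sysctl -w vm.nr_hugepages=4500

    # postgresql.conf
    huge_pages = on    # fail at startup instead of silently falling back
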
>
> Does it change if you initialize the test database using
> PGOPTIONS='-c vacuum_freeze_min_age=0' pgbench -i -s 100
> or run a manual VACUUM FREEZE; after initialization?
I tried it, but didn't see any improvement.
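
For reference, the two variants suggested above amount to something like 
this (database name and scale as in the quoted command):

    # freeze rows while loading
    PGOPTIONS='-c vacuum_freeze_min_age=0' pgbench -i -s 100

    # or initialize normally, then freeze everything afterwards
    pgbench -i -s 100
    psql -c 'VACUUM FREEZE;'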

>
> Hm, it'd probably be good to compare commits closer to the changes, to
> avoid other changes showing up.
>
> Hm - did you verify if all the connections were actually established?
> Particularly without the patch applied? With an unmodified pgbench, I
> sometimes saw better numbers, but only because only half the connections
> were able to be established, due to ProcArrayLock contention.
Yes, that really happened quite often on the IBM Power2 server (a 
peculiarity of its atomics implementation).
I even had to patch pgbench to add a one-second delay after a 
connection is established, to make it possible for all clients to 
connect.
But on the Intel server I didn't see unconnected clients, and in any 
case it happened only with a large number of connections (> 1000).
The best performance was achieved at about 100 connections, and still 
I can not reach 2k TPS as in your case.

> Did you connect via tcp or unix socket? Was pgbench running on the same
> machine? It was locally via unix socket for me (but it's also observable
> via two machines, just with lower overall throughput).

pgbench was launched on the same machine and connected through Unix sockets.

> Did you run a profile to see where the bottleneck is?
Sorry, I do not have root privileges on this server, so I can not use perf.
>
> There's a separate benchmark that I found to be quite revealing that's
> far less dependent on scheduler behaviour. Run two pgbench instances:
>
> 1) With a very simple script '\sleep 1s' or such, and many connections
>     (e.g. 100,1000,5000). That's to simulate connections that are
>     currently idle.
> 2) With a normal pgbench read only script, and low client counts.
>
> Before the changes 2) shows a very sharp decline in performance when the
> count in 1) increases. Afterwards it's pretty much linear.
>
> I think this benchmark actually is much more real world oriented - due
> to latency and client side overheads it's very normal to have a large
> fraction of connections idle in read mostly OLTP workloads.
>
> Here's the result on my workstation (2x Xeon Gold 5215 CPUs), testing
> 1f42d35a1d6144a23602b2c0bc7f97f3046cf890 against
> 07f32fcd23ac81898ed47f88beb569c631a2f223 which are the commits pre/post
> connection scalability changes.
>
> I used fairly short pgbench runs (15s), and the numbers are the best of
> three runs. I also had emacs and mutt open - some noise to be
> expected. But I also gotta work ;)
>
> | Idle Connections | Active Connections | TPS pre | TPS post |
> |-----------------:|-------------------:|--------:|---------:|
> |                0 |                  1 |   33599 |    33406 |
> |              100 |                  1 |   31088 |    33279 |
> |             1000 |                  1 |   29377 |    33434 |
> |             2500 |                  1 |   27050 |    33149 |
> |             5000 |                  1 |   21895 |    33903 |
> |            10000 |                  1 |   16034 |    33140 |
> |                0 |                 48 | 1042005 |  1125104 |
> |              100 |                 48 |  986731 |  1103584 |
> |             1000 |                 48 |  854230 |  1119043 |
> |             2500 |                 48 |  716624 |  1119353 |
> |             5000 |                 48 |  553657 |  1119476 |
> |            10000 |                 48 |  369845 |  1115740 |

Yes, there is a noticeable difference in my case too:

| Idle Connections | Active Connections | TPS pre | TPS post |
|-----------------:|-------------------:|--------:|---------:|
|             5000 |                 48 |  758914 |  1184085 |
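
For anyone reproducing the two-instance benchmark described above, the 
setup is roughly the following (file name, durations and client counts 
are illustrative):

    # instance 1: many idle connections (requires max_connections high enough)
    echo '\sleep 1s' > idle.sql
    pgbench -n -f idle.sql -c 5000 -j 8 -T 60 &

    # instance 2: the measured read-only workload with a low client count
    pgbench -n -S -c 48 -j 48 -T 15 -P 5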

> Think we'll need profiles to know...

I will try to obtain sudo permissions and do profiling.
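
For reference, once permissions are available, a profile for this kind 
of test can be collected with something like (a sketch):

    # may also require: sysctl kernel.perf_event_paranoid=1
    perf record -g -a -- sleep 15    # sample all CPUs while pgbench runs
    perf report --sort symbol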


