Re: Improving connection scalability: GetSnapshotData() - Mailing list pgsql-hackers
From | Konstantin Knizhnik
---|---
Subject | Re: Improving connection scalability: GetSnapshotData()
Date | 
Msg-id | 128c7844-92c0-b7fa-caff-ebd1499f30c6@postgrespro.ru
In response to | Re: Improving connection scalability: GetSnapshotData() (Andres Freund <andres@anarazel.de>)
Responses | Re: Improving connection scalability: GetSnapshotData()
List | pgsql-hackers
On 04.09.2020 21:53, Andres Freund wrote:

> I also used huge_pages=on / configured them on the OS level. Otherwise
> TLB misses will be a significant factor.

As far as I understand, there should not be any TLB misses, because the size of shared buffers (8MB) is several orders of magnitude smaller than the available physical memory.

> Does it change if you initialize the test database using
> PGOPTIONS='-c vacuum_freeze_min_age=0' pgbench -i -s 100
> or run a manual VACUUM FREEZE; after initialization?

I tried it, but didn't see any improvement.

> Hm, it'd probably be good to compare commits closer to the changes, to
> avoid other changes showing up.
>
> Hm - did you verify if all the connections were actually established?
> Particularly without the patch applied? With an unmodified pgbench, I
> sometimes saw better numbers, but only because only half the connections
> were able to be established, due to ProcArrayLock contention.

Yes, that really happens quite often on the IBM Power2 server (a peculiarity of its atomics implementation). I even had to patch pgbench, adding a one-second delay after a connection is established, to make it possible for all clients to connect. But on the Intel server I didn't see unconnected clients. And in any case, it happened only with a large number of connections (> 1000). The best performance was achieved at about 100 connections, and I still cannot reach the 2k TPS performance as in your case.

> Did you connect via tcp or unix socket? Was pgbench running on the same
> machine? It was locally via unix socket for me (but it's also observable
> via two machines, just with lower overall throughput).

pgbench was launched on the same machine and connected through unix sockets.

> Did you run a profile to see where the bottleneck is?

Sorry, I do not have root privileges on this server and so cannot use perf.

> There's a separate benchmark that I found to be quite revealing that's
> far less dependent on scheduler behaviour. Run two pgbench instances:
>
> 1) With a very simple script '\sleep 1s' or such, and many connections
> (e.g. 100, 1000, 5000). That's to simulate connections that are
> currently idle.
> 2) With a normal pgbench read only script, and low client counts.
>
> Before the changes 2) shows a very sharp decline in performance when the
> count in 1) increases. Afterwards it's pretty much linear.
>
> I think this benchmark actually is much more real world oriented - due
> to latency and client side overheads it's very normal to have a large
> fraction of connections idle in read mostly OLTP workloads.
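A minimal sketch of that two-instance setup (the script name idle.sql, the client counts, the database name and the run durations here are illustrative, not taken from the thread):

    # idle.sql: each "idle" client just sleeps, simulating an idle session
    cat > idle.sql <<'EOF'
    \sleep 1s
    EOF

    # terminal 1: hold many mostly-idle connections open
    # (max_connections must be raised accordingly)
    pgbench -n -f idle.sql -c 5000 -j 100 -T 60 postgres

    # terminal 2: the actual read-only measurement, with few active clients
    pgbench -n -S -c 48 -j 48 -T 15 postgres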
> Here's the result on my workstation (2x Xeon Gold 5215 CPUs), testing
> 1f42d35a1d6144a23602b2c0bc7f97f3046cf890 against
> 07f32fcd23ac81898ed47f88beb569c631a2f223 which are the commits pre/post
> connection scalability changes.
>
> I used fairly short pgbench runs (15s), and the numbers are the best of
> three runs. I also had emacs and mutt open - some noise to be expected.
> But I also gotta work ;)
>
> | Idle Connections | Active Connections | TPS pre | TPS post |
> |-----------------:|-------------------:|--------:|---------:|
> |                0 |                  1 |   33599 |    33406 |
> |              100 |                  1 |   31088 |    33279 |
> |             1000 |                  1 |   29377 |    33434 |
> |             2500 |                  1 |   27050 |    33149 |
> |             5000 |                  1 |   21895 |    33903 |
> |            10000 |                  1 |   16034 |    33140 |
> |                0 |                 48 | 1042005 |  1125104 |
> |              100 |                 48 |  986731 |  1103584 |
> |             1000 |                 48 |  854230 |  1119043 |
> |             2500 |                 48 |  716624 |  1119353 |
> |             5000 |                 48 |  553657 |  1119476 |
> |            10000 |                 48 |  369845 |  1115740 |

Yes, there is also a noticeable difference in my case:

| Idle Connections | Active Connections | TPS pre | TPS post |
|-----------------:|-------------------:|--------:|---------:|
|             5000 |                 48 |  758914 |  1184085 |

> Think we'll need profiles to know...

I will try to obtain sudo permissions and do profiling.
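For reference, a typical perf workflow for this kind of investigation (this is stock perf usage, not a recipe from the thread; it generally needs root or a relaxed kernel.perf_event_paranoid setting):

    # sample all postgres processes with call graphs for ~30s
    # while the benchmark is running
    perf record -g -p $(pgrep -d, -x postgres) -- sleep 30

    # then inspect where the time goes
    perf report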