Re: Improving connection scalability: GetSnapshotData() - Mailing list pgsql-hackers
From:            Andres Freund
Subject:         Re: Improving connection scalability: GetSnapshotData()
Msg-id:          20200904185304.bs27ufejpujp5azx@alap3.anarazel.de
In response to:  Re: Improving connection scalability: GetSnapshotData()  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
List:            pgsql-hackers
Hi,

On 2020-09-04 18:24:12 +0300, Konstantin Knizhnik wrote:
> Reported results looks very impressive.
> But I tried to reproduce them and didn't observed similar behavior.
> So I am wondering what can be the difference and what I am doing wrong.

That is odd - I did reproduce it on quite a few systems by now.

> Configuration file has the following differences with default postgres config:
>
> max_connections = 10000        # (change requires restart)
> shared_buffers = 8GB           # min 128kB

I also used huge_pages=on / configured them on the OS level. Otherwise
TLB misses will be a significant factor.

Does it change if you initialize the test database using
PGOPTIONS='-c vacuum_freeze_min_age=0' pgbench -i -s 100
or run a manual VACUUM FREEZE; after initialization?

> I have tried two different systems.
> First one is IBM Power2 server with 384 cores and 8Tb of RAM.
> I run the same read-only pgbench test as you. I do not think that size
> of the database is matter, so I used scale 100 -
> it seems to be enough to avoid frequent buffer conflicts.
> Then I run the same scripts as you:
>
> for ((n=100; n < 1000; n+=100)); do echo $n; pgbench -M prepared -c $n -T 100 -j $n -M prepared -S -n postgres ; done
> for ((n=1000; n <= 5000; n+=1000)); do echo $n; pgbench -M prepared -c $n -T 100 -j $n -M prepared -S -n postgres ; done
>
> I have compared current master with version of Postgres prior to your
> commits with scalability improvements: a9a4a7ad56

Hm, it'd probably be good to compare commits closer to the changes, to
avoid other changes showing up.

Hm - did you verify if all the connections were actually established?
Particularly without the patch applied? With an unmodified pgbench, I
sometimes saw better numbers, but only because only half the connections
were able to be established, due to ProcArrayLock contention. See
https://www.postgresql.org/message-id/20200227180100.zyvjwzcpiokfsqm2%40alap3.anarazel.de

There also is the issue that pgbench numbers for inclusive/exclusive are
just about meaningless right now:
https://www.postgresql.org/message-id/20200227202636.qaf7o6qcajsudoor%40alap3.anarazel.de
(reminds me, need to get that fixed)

One more thing worth investigating is whether your results change
significantly when you start the server using
numactl --interleave=all <start_server_cmdline>.
Especially on larger systems the results otherwise can vary a lot from
run-to-run, because the placement of shared buffers matters a lot.
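Putting those suggestions together, a minimal sketch of the setup could
look like the following. The data directory path, log file name and use
of pg_ctl are illustrative assumptions, not the exact commands behind
the numbers in this thread; huge pages also have to be reserved at the
OS level, e.g. via vm.nr_hugepages.

  # postgresql.conf, in addition to the defaults
  max_connections = 10000
  shared_buffers = 8GB
  huge_pages = on

  # start the server with shared memory interleaved across NUMA nodes
  numactl --interleave=all pg_ctl -D /path/to/data -l logfile start

  # initialize pgbench with the table data frozen up front ...
  PGOPTIONS='-c vacuum_freeze_min_age=0' pgbench -i -s 100 postgres
  # ... or, equivalently, freeze after a normal initialization
  psql postgres -c 'VACUUM FREEZE'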
> So I have repeated experiments at Intel server.
> It has 160 cores Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz and 256Gb of RAM.
>
> The same database, the same script, results are the following:
>
> Clients    old/inc    old/exl    new/inc    new/exl
> 1000       1105750    1163292    1206105    1212701
> 2000       1050933    1124688    1149706    1164942
> 3000       1063667    1195158    1118087    1144216
> 4000       1040065    1290432    1107348    1163906
> 5000        943813    1258643    1103790    1160251
>
> I have separately show results including/excluding connection
> establishing, because in new version there are almost no differences
> between them, but for old version gap between them is noticeable.
>
> Configuration file has the following differences with default postgres config:
>
> max_connections = 10000        # (change requires restart)
> shared_buffers = 8GB           # min 128kB
>
> This results contradict with yours and makes me ask the following questions:
> 1. Why in your case performance is almost two times larger (2 millions vs 1)?
> The hardware in my case seems to be at least not worser than yours...
> May be there are some other improvements in the version you have tested
> which are not yet committed to master?

No, no uncommitted changes, except for the pgbench stuff mentioned
above. However I found that the kernel version matters a fair bit; it's
pretty easy to run into kernel scalability issues in a workload that is
this heavily scheduler-dependent.

Did you connect via tcp or unix socket? Was pgbench running on the same
machine? It was locally via unix socket for me (but it's also observable
via two machines, just with lower overall throughput).

Did you run a profile to see where the bottleneck is?

There's a separate benchmark that I found to be quite revealing, and
that's far less dependent on scheduler behaviour. Run two pgbench
instances:

1) With a very simple script '\sleep 1s' or such, and many connections
   (e.g. 100, 1000, 5000). That's to simulate connections that are
   currently idle.
2) With a normal pgbench read-only script, and low client counts.

Before the changes 2) shows a very sharp decline in performance when the
count in 1) increases. Afterwards it's pretty much linear.

I think this benchmark actually is much more real world oriented - due
to latency and client side overheads it's very normal to have a large
fraction of connections idle in read mostly OLTP workloads.
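A minimal sketch of such a pair of runs, with illustrative connection
counts, durations and file name (the results below come from 15s runs
at varying counts, not from exactly these invocations):

  # 1) many otherwise-idle connections, each just sleeping in a loop;
  #    keep this running at least as long as the measured run below
  printf '%s\n' '\sleep 1s' > idle.sql
  pgbench -n -f idle.sql -c 5000 -j 100 -T 90 postgres &

  # 2) a small number of active read-only clients - this is the part
  #    whose throughput is actually measured
  pgbench -n -M prepared -S -c 48 -j 48 -T 60 -P 5 postgres
  wait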
Here's the result on my workstation (2x Xeon Gold 5215 CPUs), testing
1f42d35a1d6144a23602b2c0bc7f97f3046cf890 against
07f32fcd23ac81898ed47f88beb569c631a2f223, which are the commits pre/post
connection scalability changes.

I used fairly short pgbench runs (15s), and the numbers are the best of
three runs. I also had emacs and mutt open - some noise to be expected.
But I also gotta work ;)

| Idle Connections | Active Connections | TPS pre | TPS post |
|-----------------:|-------------------:|--------:|---------:|
|                0 |                  1 |   33599 |    33406 |
|              100 |                  1 |   31088 |    33279 |
|             1000 |                  1 |   29377 |    33434 |
|             2500 |                  1 |   27050 |    33149 |
|             5000 |                  1 |   21895 |    33903 |
|            10000 |                  1 |   16034 |    33140 |
|                0 |                 48 | 1042005 |  1125104 |
|              100 |                 48 |  986731 |  1103584 |
|             1000 |                 48 |  854230 |  1119043 |
|             2500 |                 48 |  716624 |  1119353 |
|             5000 |                 48 |  553657 |  1119476 |
|            10000 |                 48 |  369845 |  1115740 |

And a second version of this, where the idle connections are just less
busy, using the following script:

\sleep 100ms
SELECT 1;

| Mostly Idle Connections | Active Connections | TPS pre |       TPS post |
|------------------------:|-------------------:|--------:|---------------:|
|                       0 |                  1 |   33837 |   34095.891429 |
|                     100 |                  1 |   30622 |   31166.767491 |
|                    1000 |                  1 |   25523 |   28829.313249 |
|                    2500 |                  1 |   19260 |   24978.878822 |
|                    5000 |                  1 |   11171 |   24208.146408 |
|                   10000 |                  1 |    6702 |   29577.517084 |
|                       0 |                 48 | 1022721 | 1133153.772338 |
|                     100 |                 48 |  980705 | 1034235.255883 |
|                    1000 |                 48 |  824668 | 1115965.638395 |
|                    2500 |                 48 |  698510 | 1073280.930789 |
|                    5000 |                 48 |  478535 | 1041931.158287 |
|                   10000 |                 48 |  276042 |  953567.038634 |

It's probably worth calling out that in the second test run the
run-to-run variability is huge - presumably because it's very scheduler
dependent how much CPU time the "active" backends and the "active"
pgbench get at higher "mostly idle" connection counts.

> 2. You wrote: This is on a machine with 2
> Intel(R) Xeon(R) Platinum 8168, but virtualized (2 sockets of 18 cores/36 threads)
>
> According to Intel specification Intel® Xeon® Platinum 8168 Processor has 24 cores:
> https://ark.intel.com/content/www/us/en/ark/products/120504/intel-xeon-platinum-8168-processor-33m-cache-2-70-ghz.html
>
> And at your graph we can see almost linear increase of speed up to 40 connections.
>
> But most suspicious word for me is "virtualized". What is the actual
> hardware and how it is virtualized?

That was on an Azure Fs72v2. I think that's Hyper-V virtualized, with
all the "lost" cores dedicated to the hypervisor. But I did reproduce
the speedups on my unvirtualized workstation (2x Xeon Gold 5215 CPUs) -
the ceiling is lower, obviously.

> May be it is because of more complex architecture of my server?

Think we'll need profiles to know...
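As a starting point for such a profile, something along these lines
should show where the CPU time goes (a sketch assuming Linux perf is
installed and permitted to do system-wide sampling; the 30 second
window is arbitrary):

  # sample the whole system while the benchmark is running
  perf record -a -g -- sleep 30
  # break the samples down by process and symbol
  perf report --sort comm,symbol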
Greetings,

Andres Freund