Re: Improving connection scalability: GetSnapshotData() - Mailing list pgsql-hackers
From | Konstantin Knizhnik |
---|---|
Subject | Re: Improving connection scalability: GetSnapshotData() |
Date | |
Msg-id | 4f245382-2f04-3b2e-ae94-d075d2eb7868@postgrespro.ru |
In response to | Re: Improving connection scalability: GetSnapshotData() (Michael Paquier <michael@paquier.xyz>) |
Responses |
Re: Improving connection scalability: GetSnapshotData()
|
List | pgsql-hackers |
On 03.09.2020 11:18, Michael Paquier wrote:
> On Sun, Aug 16, 2020 at 02:26:57PM -0700, Andres Freund wrote:
>> So we get some buildfarm results while thinking about this.
>
> Andres, there is an entry in the CF for this thread:
> https://commitfest.postgresql.org/29/2500/
>
> A lot of work has been committed with 623a9ba, 73487a6, 5788e25, etc.
> Now that PGXACT is done, how much work is remaining here?
> --
> Michael
Andres,

First of all, many thanks for this work. Improving Postgres connection scalability is very important.

The reported results look very impressive. But I tried to reproduce them and did not observe similar behavior, so I am wondering what the difference can be and what I am doing wrong.

I have tried two different systems. The first one is an IBM Power2 server with 384 cores and 8 TB of RAM. I ran the same read-only pgbench test as you. I do not think that the size of the database matters, so I used scale 100, which seems to be enough to avoid frequent buffer conflicts. Then I ran the same scripts as you:

    for ((n=100; n < 1000; n+=100)); do echo $n; pgbench -M prepared -c $n -T 100 -j $n -M prepared -S -n postgres ; done
    for ((n=1000; n <= 5000; n+=1000)); do echo $n; pgbench -M prepared -c $n -T 100 -j $n -M prepared -S -n postgres ; done

I have compared current master with the version of Postgres prior to your commits with scalability improvements: a9a4a7ad56

For every number of connections the older version shows slightly better results; for example, for 500 clients: 475k TPS vs. 450k TPS for current master.

This is quite an exotic server and I do not currently have access to it. So I have repeated the experiments on an Intel server. It has 160 cores (Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz) and 256 GB of RAM.

The same database, the same script; the results (in TPS) are the following:
Clients | old/incl | old/excl | new/incl | new/excl |
---|---|---|---|---|
1000 | 1105750 | 1163292 | 1206105 | 1212701 |
2000 | 1050933 | 1124688 | 1149706 | 1164942 |
3000 | 1063667 | 1195158 | 1118087 | 1144216 |
4000 | 1040065 | 1290432 | 1107348 | 1163906 |
5000 | 943813 | 1258643 | 1103790 | 1160251 |
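The gap between the two measurements in the table above can be quantified with a few lines of shell. This is only a minimal sketch: it assumes the rows are pasted as whitespace-separated columns in the same order as the table, and the function name `gap_pct` is hypothetical, not something from pgbench or from the scripts used for the test.

```shell
#!/bin/sh
# Hypothetical helper: for each row, print how much higher the "excluding
# connection establishing" TPS is than the "including" TPS, as a percentage
# of the "including" value.
# Input columns: clients old_incl old_excl new_incl new_excl
gap_pct() {
    awk '{
        old_gap = ($3 - $2) / $2 * 100
        new_gap = ($5 - $4) / $4 * 100
        # Output columns: clients, old gap (%), new gap (%)
        printf "%d %.1f %.1f\n", $1, old_gap, new_gap
    }'
}

# Rows copied from the table above.
gap_pct <<'EOF'
1000 1105750 1163292 1206105 1212701
2000 1050933 1124688 1149706 1164942
3000 1063667 1195158 1118087 1144216
4000 1040065 1290432 1107348 1163906
5000 943813 1258643 1103790 1160251
EOF
```

Each output line holds the client count followed by the gap for the old and the new version, which makes it easy to see at which client counts connection establishing dominates the difference.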
I have shown the results including and excluding connection establishing separately, because in the new version there is almost no difference between them, but for the old version the gap between them is noticeable.

The configuration file has the following differences from the default postgres config:

    max_connections = 10000 # (change requires restart)
    shared_buffers = 8GB # min 128kB

These results contradict yours and make me ask the following questions:

1. Why is performance in your case almost two times higher (2 million vs. 1 million TPS)? The hardware in my case seems to be at least no worse than yours... Maybe there are some other improvements in the version you have tested which are not yet committed to master?

2. You wrote:

> This is on a machine with 2 Intel(R) Xeon(R) Platinum 8168, but virtualized (2 sockets of 18 cores/36 threads)

According to the Intel specification, the Intel® Xeon® Platinum 8168 Processor has 24 cores:
https://ark.intel.com/content/www/us/en/ark/products/120504/intel-xeon-platinum-8168-processor-33m-cache-2-70-ghz.html

And on your graph we can see an almost linear increase of speed up to 40 connections.

But the most suspicious word for me is "virtualized". What is the actual hardware and how is it virtualized?

Do you have any idea why in my case the master version (with your commits) behaves almost the same as the non-patched version?

Below is yet another table showing scalability from 10 to 100 connections and combining your results (first two columns) and my results (last two columns):
Clients | old master (Andres) | pgxact-split-cache (Andres) | current master (Konstantin) | revision 9a4a7ad56 (Konstantin) |
---|---|---|---|---|
10 | 367883 | 375682 | 358984 | 347067 |
20 | 748000 | 810964 | 668631 | 630304 |
30 | 999231 | 1288276 | 920255 | 848244 |
40 | 991672 | 1573310 | 1100745 | 970717 |
50 | 1017561 | 1715762 | 1193928 | 1008755 |
60 | 993943 | 1789698 | 1255629 | 917788 |
70 | 971379 | 1819477 | 1277634 | 873022 |
80 | 966276 | 1842248 | 1266523 | 830197 |
90 | 901175 | 1847823 | 1255260 | 736550 |
100 | 803175 | 1865795 | 1241143 | 736756 |
Maybe it is because of the more complex architecture of my server?
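To compare where each build stops scaling in the 10 to 100 connections table above, one can look for the peak TPS in every column. The following is a rough sketch under the assumption that the table rows are pasted as whitespace-separated numbers; the function name `peak_tps` and the `colN` labels are hypothetical and only number the TPS columns from left to right.

```shell
#!/bin/sh
# Hypothetical helper: for every TPS column, report the highest value and the
# client count at which it was measured.
# Input columns: clients, then one TPS column per build.
peak_tps() {
    awk '{
        for (i = 2; i <= NF; i++)
            if ($i + 0 > best[i]) { best[i] = $i + 0; at[i] = $1 }
        if (NF > cols) cols = NF
    }
    END {
        for (i = 2; i <= cols; i++)
            printf "col%d peak=%d at %d clients\n", i - 1, best[i], at[i]
    }'
}

# Rows copied from the table above; column order:
# old master, pgxact-split-cache, current master, revision 9a4a7ad56
peak_tps <<'EOF'
10 367883 375682 358984 347067
20 748000 810964 668631 630304
30 999231 1288276 920255 848244
40 991672 1573310 1100745 970717
50 1017561 1715762 1193928 1008755
60 993943 1789698 1255629 917788
70 971379 1819477 1277634 873022
80 966276 1842248 1266523 830197
90 901175 1847823 1255260 736550
100 803175 1865795 1241143 736756
EOF
```

A column whose peak sits at the last measured client count has not visibly saturated within this range, while an early peak followed by a decline points to contention as the number of connections grows.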
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company