Re: Improving connection scalability: GetSnapshotData() - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: Improving connection scalability: GetSnapshotData()
Date
Msg-id 4f245382-2f04-3b2e-ae94-d075d2eb7868@postgrespro.ru
In response to Re: Improving connection scalability: GetSnapshotData()  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Improving connection scalability: GetSnapshotData()  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers


On 03.09.2020 11:18, Michael Paquier wrote:
> On Sun, Aug 16, 2020 at 02:26:57PM -0700, Andres Freund wrote:
>> So we get some buildfarm results while thinking about this.
>
> Andres, there is an entry in the CF for this thread:
> https://commitfest.postgresql.org/29/2500/
>
> A lot of work has been committed with 623a9ba, 73487a6, 5788e25, etc.
> Now that PGXACT is done, how much work is remaining here?
> --
> Michael

Andres,
First of all, a lot of thanks for this work.
Improving Postgres connection scalability is very important.

The reported results look very impressive.
But I tried to reproduce them and did not observe similar behavior.
So I am wondering what the difference can be and what I am doing wrong.

I have tried two different systems.
The first one is an IBM Power2 server with 384 cores and 8TB of RAM.
I ran the same read-only pgbench test as you. I do not think that the size of the database matters, so I used scale 100 -
it seems to be enough to avoid frequent buffer conflicts.
Then I ran the same scripts as you:

 for ((n=100; n < 1000; n+=100)); do echo $n; pgbench -M prepared -c $n -T 100 -j $n -S -n postgres; done
 for ((n=1000; n <= 5000; n+=1000)); do echo $n; pgbench -M prepared -c $n -T 100 -j $n -S -n postgres; done
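
The scale-100 database itself was initialized in the usual way, something like:

 pgbench -i -s 100 postgres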


I have compared the current master with the version of Postgres prior to your scalability commits: a9a4a7ad56.
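
In case it is useful, switching between the two versions just means rebuilding the server from the corresponding revisions, roughly like this (the install prefixes below are only an illustration):

 git checkout a9a4a7ad56
 ./configure --prefix=$HOME/pgsql-old && make -j 16 && make install
 git checkout master
 ./configure --prefix=$HOME/pgsql-new && make -j 16 && make install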

For all numbers of connections the older version shows slightly better results; for example, for 500 clients: 475k TPS vs. 450k TPS for the current master.

This is a quite exotic server and I do not currently have access to it.
So I have repeated the experiments on an Intel server.
It has 160 cores (Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz) and 256GB of RAM.

The same database, the same script; the results are the following:

Clients    old/incl    old/excl    new/incl    new/excl
   1000     1105750     1163292     1206105     1212701
   2000     1050933     1124688     1149706     1164942
   3000     1063667     1195158     1118087     1144216
   4000     1040065     1290432     1107348     1163906
   5000      943813     1258643     1103790     1160251
I have separately shown the results including/excluding connection establishment,
because in the new version there is almost no difference between them,
but for the old version the gap between them is noticeable.
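These two numbers are simply the two TPS lines that pgbench prints at the end of each run; for example, for the 1000-client run on the old version the output ended roughly with:

 tps = 1105750 (including connections establishing)
 tps = 1163292 (excluding connections establishing)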

The configuration file has the following differences from the default postgres config:

max_connections = 10000			# (change requires restart)
shared_buffers = 8GB			# min 128kB


These results contradict yours and make me ask the following questions:

1. Why in your case is performance almost two times higher (2 million vs. 1 million TPS)?
The hardware in my case seems to be at least no worse than yours...
Maybe there are some other improvements in the version you tested which are not yet committed to master?

2. You wrote: "This is on a machine with 2
Intel(R) Xeon(R) Platinum 8168, but virtualized (2 sockets of 18 cores/36 threads)"

According to Intel's specification, the Intel® Xeon® Platinum 8168 Processor has 24 cores:
https://ark.intel.com/content/www/us/en/ark/products/120504/intel-xeon-platinum-8168-processor-33m-cache-2-70-ghz.html

And in your graph we can see an almost linear increase in speed up to 40 connections.

But the most suspicious word for me is "virtualized". What is the actual hardware and how is it virtualized?

Do you have any idea why in my case the master version (with your commits) behaves almost the same as the non-patched version?
Below is yet another table showing scalability from 10 to 100 connections, combining your results (first two columns) and my results (last two columns):


Clients    old master    pgxact-split-cache    current master    revision a9a4a7ad56
     10        367883                375682            358984                 347067
     20        748000                810964            668631                 630304
     30        999231               1288276            920255                 848244
     40        991672               1573310           1100745                 970717
     50       1017561               1715762           1193928                1008755
     60        993943               1789698           1255629                 917788
     70        971379               1819477           1277634                 873022
     80        966276               1842248           1266523                 830197
     90        901175               1847823           1255260                 736550
    100        803175               1865795           1241143                 736756

Maybe it is because of the more complex architecture of my server?
-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
