Re: Improving spin-lock implementation on ARM. - Mailing list pgsql-hackers

From Amit Khandekar
Subject Re: Improving spin-lock implementation on ARM.
Date
Msg-id CAJ3gD9eO3P-W+skRQHbyUQDytiz7gkS8b8PknwkDi0uvRw7fGA@mail.gmail.com
In response to Re: Improving spin-lock implementation on ARM.  (Krunal Bauskar <krunalbauskar@gmail.com>)
List pgsql-hackers
On Thu, 26 Nov 2020 at 10:55, Krunal Bauskar <krunalbauskar@gmail.com> wrote:
> Hardware: ARM Kunpeng 920 BareMetal Server 2.6 GHz. 64 cores (56 cores for server and 8 for client) [2 numa nodes]
> Storage: 3.2 TB NVMe SSD
> OS: CentOS Linux release 7.6
> PGSQL: baseline = Release Tag 13.1
> Invocation suite: https://github.com/mysqlonarm/benchmark-suites/tree/master/pgsql-pbench (Uses pgbench)

Using the same hardware, I have attached my improvement figures, which
are largely in line with yours, except that I did not run with more
than 400 clients. I am also seeing some improvement even for
select-only workloads with 200-400 clients. For the read-write load, I
had seen that the s_lock() contention arose where XLogFlush() uses the
spinlock. For the read-only case, I have not analyzed where the
improvement comes from.

The .png files in the attached tar have the graphs for head versus patch.

The GUCs that I changed:

work_mem = 64MB
shared_buffers = 128GB
maintenance_work_mem = 1GB
min_wal_size = 20GB
max_wal_size = 100GB
checkpoint_timeout = 60min
checkpoint_completion_target = 0.9
full_page_writes = on
synchronous_commit = on
effective_io_concurrency = 200
log_checkpoints = on
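
For anyone reproducing the runs, the settings listed above could be turned
into ALTER SYSTEM statements instead of editing postgresql.conf by hand.
This is only a minimal sketch: the helper name alter_system_statements is
hypothetical, and it is not part of how the figures were produced. Note
that shared_buffers takes effect only after a server restart.

```python
# Settings copied from the list above. The helper below is a hypothetical
# convenience, not part of the original benchmark setup.
GUCS = {
    "work_mem": "64MB",
    "shared_buffers": "128GB",
    "maintenance_work_mem": "1GB",
    "min_wal_size": "20GB",
    "max_wal_size": "100GB",
    "checkpoint_timeout": "60min",
    "checkpoint_completion_target": "0.9",
    "full_page_writes": "on",
    "synchronous_commit": "on",
    "effective_io_concurrency": "200",
    "log_checkpoints": "on",
}


def alter_system_statements(gucs):
    """Return one ALTER SYSTEM SET statement per setting, in dict order."""
    return [
        f"ALTER SYSTEM SET {name} = '{value}';"
        for name, value in gucs.items()
    ]


if __name__ == "__main__":
    # Print the statements so they can be piped into psql.
    for stmt in alter_system_statements(GUCS):
        print(stmt)
```

The output can be fed to psql, followed by a restart so that
shared_buffers is picked up.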

For the backends, 64 CPUs were allotted (covering 2 NUMA nodes), and for
the pgbench clients a separate set of 28 CPUs was allotted on a different
socket. The server was pre-warmed.
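
One way to get a CPU split like this is to launch the server and pgbench
under taskset. The sketch below only builds the command lines. The CPU
ranges (0-63 for backends, 64-91 for clients), the use of taskset, the
thread count and the helper names are all assumptions for illustration;
the actual CPU numbering depends on the machine topology.

```python
# Assumed CPU numbering, for illustration only: 64 CPUs for the backends
# and 28 CPUs on another socket for pgbench.
SERVER_CPUS = "0-63"
CLIENT_CPUS = "64-91"


def pinned(cpus, argv):
    """Prefix a command with 'taskset -c <cpus>' to restrict its CPU set."""
    return ["taskset", "-c", cpus] + list(argv)


def server_cmd(datadir):
    """Command line that starts the server on the backend CPU set."""
    return pinned(SERVER_CPUS, ["pg_ctl", "-D", datadir, "start"])


def pgbench_cmd(clients, threads, seconds, dbname, select_only=False):
    """Command line that runs pgbench on the client CPU set."""
    argv = [
        "pgbench",
        "-c", str(clients),   # number of concurrent clients
        "-j", str(threads),   # number of pgbench worker threads
        "-T", str(seconds),   # run duration in seconds
    ]
    if select_only:
        argv.append("-S")     # built-in select-only script
    argv.append(dbname)
    return pinned(CLIENT_CPUS, argv)


if __name__ == "__main__":
    # Example: a 400-client select-only run; the values are placeholders.
    print(" ".join(server_cmd("/path/to/datadir")))
    print(" ".join(pgbench_cmd(400, 28, 300, "postgres", select_only=True)))
```

Each list can be passed to subprocess.run() as is; processes started by
the pinned postmaster inherit its CPU affinity.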
