Re: Wait free LW_SHARED acquisition - v0.2 - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Wait free LW_SHARED acquisition - v0.2
Date
Msg-id CAM3SWZRgomgpKwrFE_FAB27qHsk=cahEqVbwyHTua6EVShu3Cg@mail.gmail.com
In response to Re: Wait free LW_SHARED acquisition - v0.2  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
I thought I'd try out what I was in an immediate position to do
without having access to dedicated multi-socket hardware: a benchmark
on AWS. This was a "c3.8xlarge" instance, a type that is reportedly
backed by Intel Xeon E5-2680 processors. Since the Intel ARK website
reports that these CPUs have 16 "threads" (8 cores + hyperthreading),
and AWS's marketing material indicates that this instance type has 32
"vCPUs", I inferred that the underlying hardware had 2 sockets.
However, that is not what lscpu reported on the instance, no doubt
due to Xen Hypervisor voodoo:

ubuntu@ip-10-67-128-2:~$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    32
Core(s) per socket:    1
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 62
Stepping:              4
CPU MHz:               2800.074
BogoMIPS:              5600.14
Hypervisor vendor:     Xen
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              25600K
NUMA node0 CPU(s):     0-31
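
The arithmetic behind the 2 socket inference, and how it disagrees with
what the guest sees, can be sketched roughly as follows. This is only an
illustration: parse_lscpu and expected_sockets are hypothetical helper
names, and the figure of 16 threads per package is the ARK number quoted
above, not something read from the machine.

```python
def parse_lscpu(text):
    """Parse 'Key: value' lines of lscpu-style output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def expected_sockets(vcpus, threads_per_package):
    """Sockets implied by the vCPU count and the threads per physical CPU."""
    return vcpus // threads_per_package


# Relevant subset of the lscpu output shown above.
SAMPLE = """\
CPU(s):                32
Thread(s) per core:    32
Core(s) per socket:    1
Socket(s):             1
"""

info = parse_lscpu(SAMPLE)
reported = int(info["Socket(s)"])
# Assumption: 16 hardware threads per package, per the ARK listing.
inferred = expected_sockets(int(info["CPU(s)"]), 16)

if reported != inferred:
    print(f"guest reports {reported} socket(s), inferred {inferred}")
```

The mismatch only says that the topology exposed to the guest differs
from the inferred physical layout; it doesn't say which one is right.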

I ran the benchmark on Ubuntu 13.10 server, because that seemed to be
the only prominent "enterprise" x86_64 AMI (operating system image)
that came with GCC 4.8 as part of its standard toolchain. This exact
setup is benchmarked here:

http://www.phoronix.com/scan.php?page=article&item=amazon_ec2_c3&num=1

(Incidentally, some of the other benchmarks on that site use pgbench
to benchmark the Linux kernel, filesystems, disks and so on. e.g.:
http://www.phoronix.com/scan.php?page=news_item&px=NzI0NQ).

I was hesitant to benchmark using a virtualized system. There is a lot
of contradictory information about the overhead and/or noise added,
which may vary from one workload or hypervisor to the next. But, needs
must when the devil drives, and all that. Besides, this kind of setup
is very commercially relevant these days, so it doesn't seem
unreasonable to see how things work out on an AWS instance that
generally performs well for this workload. Of course, I still want to
replicate the big improvement you reported for multi-socket systems,
but you might have to wait a while for that, unless some kindly
benefactor who has a 4 socket server lying around would like to help
me out.

Results:

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/c38xlarge-rwlocks/

You can drill down and find the postgresql.conf settings from the
report. There appears to be a modest improvement in transaction
throughput. It's not as large as the improvement you reported for your
2 socket workstation, but it's there, just about.

-- 
Peter Geoghegan


