Re: 60 core performance with 9.3 - Mailing list pgsql-performance

From Mark Kirkwood
Subject Re: 60 core performance with 9.3
Date
Msg-id 53D84E16.90609@catalyst.net.nz
In response to Re: 60 core performance with 9.3  (Mark Kirkwood <mark.kirkwood@catalyst.net.nz>)
Responses Re: 60 core performance with 9.3  ("Tomas Vondra" <tv@fuzzy.cz>)
Re: 60 core performance with 9.3  (Matt Clarkson <mattc@catalyst.net.nz>)
List pgsql-performance
On 17/07/14 11:58, Mark Kirkwood wrote:

>
> Trying out with numa_balancing=0 seemed to get essentially the same
> performance. Similarly wrapping postgres startup with --interleave.
>
> All this made me want to try with numa *really* disabled. So rebooted
> the box with "numa=off" appended to the kernel cmdline. Somewhat
> surprisingly (to me anyway), the numbers were essentially identical. The
> profile, however is quite different:
>

A little more tweaking got some further improvement:

rwlocks patch as before

wal_buffers = 256MB
checkpoint_segments = 1920
wal_sync_method = open_datasync

LSI RAID adaptor: read ahead and write cache disabled (SSD fast path mode)
numa_balancing = 0


Pgbench scale 2000 again:

clients  | tps (prev) | tps (tweaked config)
---------+------------+---------------------
6        |   8175     |   8281
12       |  14409     |  15896
24       |  17191     |  19522
48       |  23122     |  29776
96       |  22308     |  32352
192      |  23109     |  28804
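
To put the table above in relative terms, here is a minimal sketch that computes the percentage gain of the tweaked config over the previous run at each client count. The tps figures are copied from the table; the helper name percent_gain is just an illustrative choice, not part of any tool used in the test.

```python
# tps figures copied from the pgbench scale 2000 table above.
clients = [6, 12, 24, 48, 96, 192]
tps_prev = [8175, 14409, 17191, 23122, 22308, 23109]
tps_tweaked = [8281, 15896, 19522, 29776, 32352, 28804]


def percent_gain(before, after):
    """Relative change from 'before' to 'after', in percent."""
    return 100.0 * (after - before) / before


# Map each client count to its gain (hypothetical helper structure).
gains = {
    c: percent_gain(prev, tweaked)
    for c, prev, tweaked in zip(clients, tps_prev, tps_tweaked)
}

for c in clients:
    print(f"{c:>7} | {gains[c]:6.1f}%")
```

On these numbers the gain is smallest at 6 clients and largest at 96 clients.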


Recall that we saw no actual tps change between numa_balancing=0 and 1
(so the improvement above comes from the other changes). Still, we
figured it might be informative to track down what the non-NUMA
bottlenecks looked like. Profiling the entire 10 minute run showed the
stats collector as a possible source of contention:


      3.86%        postgres  [kernel.kallsyms]        [k] _raw_spin_lock_bh
                   |
                   --- _raw_spin_lock_bh
                      |
                      |--95.78%-- lock_sock_nested
                      |          udpv6_sendmsg
                      |          inet_sendmsg
                      |          sock_sendmsg
                      |          SYSC_sendto
                      |          sys_sendto
                      |          tracesys
                      |          __libc_send
                      |          |
                      |          |--99.17%-- pgstat_report_stat
                      |          |          PostgresMain
                      |          |          ServerLoop
                      |          |          PostmasterMain
                      |          |          main
                      |          |          __libc_start_main
                      |          |
                      |          |--0.77%-- pgstat_send_bgwriter
                      |          |          BackgroundWriterMain
                      |          |          AuxiliaryProcessMain
                      |          |          0x7f08efe8d453
                      |          |          reaper
                      |          |          __restore_rt
                      |          |          PostmasterMain
                      |          |          main
                      |          |          __libc_start_main
                      |           --0.07%-- [...]
                      |
                      |--2.54%-- __lock_sock
                      |          |
                      |          |--91.95%-- lock_sock_nested
                      |          |          udpv6_sendmsg
                      |          |          inet_sendmsg
                      |          |          sock_sendmsg
                      |          |          SYSC_sendto
                      |          |          sys_sendto
                      |          |          tracesys
                      |          |          __libc_send
                      |          |          |
                      |          |          |--99.73%-- pgstat_report_stat
                      |          |          |          PostgresMain
                      |          |          |          ServerLoop



Disabling track_counts and rerunning pgbench:

clients  | tps (no counts)
---------+----------------
6        |    9806
12       |   18000
24       |   29281
48       |   43703
96       |   54539
192      |   36114


While these numbers look great in the middle range (12-96 clients), the
benefit looks to be tailing off as client numbers increase. Also, running
with no stats (and hence no autovacuum or analyze) is way too scary!
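
As a rough way to quantify that, the sketch below compares the track_counts-off run with the tweaked-config run at each client count, and checks how much throughput is lost going from 96 to 192 clients. All figures are copied from the two tables in this mail; the variable names are only illustrative.

```python
# tps figures copied from the tables in this mail (pgbench scale 2000).
clients = [6, 12, 24, 48, 96, 192]
tps_tweaked = [8281, 15896, 19522, 29776, 32352, 28804]
tps_no_counts = [9806, 18000, 29281, 43703, 54539, 36114]

# Speedup of the track_counts-off run relative to the tweaked config.
speedup = {
    c: off / on
    for c, on, off in zip(clients, tps_tweaked, tps_no_counts)
}

for c in clients:
    print(f"{c:>7} | {speedup[c]:.2f}x")

# Fraction of the 96-client throughput still delivered at 192 clients
# with track_counts off.
tail_ratio = tps_no_counts[-1] / tps_no_counts[-2]
print(f"192 vs 96 clients (no counts): {tail_ratio:.2f}")
```

The speedup peaks at 96 clients and drops back noticeably at 192, which matches the tailing off described above.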

Trying out less write-heavy workloads shows that the stats overhead does
not appear to be significant for *read*-heavy cases, so the result
above is perhaps more of a curiosity than anything (given that read-heavy
is more typical... and our real workload is closer to read-heavy).

The profile for counts off looks like:

      4.79%         swapper  [kernel.kallsyms]        [k] read_hpet
                    |
                    --- read_hpet
                       |
                       |--97.10%-- ktime_get
                       |          |
                       |          |--35.24%-- clockevents_program_event
                       |          |          tick_program_event
                       |          |          |
                       |          |          |--56.59%-- __hrtimer_start_range_ns
                       |          |          |          |
                       |          |          |          |--78.12%-- hrtimer_start_range_ns
                       |          |          |          |          tick_nohz_restart
                       |          |          |          |          tick_nohz_idle_exit
                       |          |          |          |          cpu_startup_entry
                       |          |          |          |          |
                       |          |          |          |          |--98.84%-- start_secondary
                       |          |          |          |          |
                       |          |          |          |           --1.16%-- rest_init
                       |          |          |          |                     start_kernel
                       |          |          |          |                     x86_64_start_reservations
                       |          |          |          |                     x86_64_start_kernel
                       |          |          |          |
                       |          |          |           --21.88%-- hrtimer_start
                       |          |          |                     tick_nohz_stop_sched_tick
                       |          |          |                     __tick_nohz_idle_enter
                       |          |          |                     |
                       |          |          |                     |--99.89%-- tick_nohz_idle_enter
                       |          |          |                     |          cpu_startup_entry
                       |          |          |                     |          |
                       |          |          |                     |          |--98.30%-- start_secondary
                       |          |          |                     |          |
                       |          |          |                     |           --1.70%-- rest_init
                       |          |          |                     |                     start_kernel
                       |          |          |                     |                     x86_64_start_reservations
                       |          |          |                     |                     x86_64_start_kernel
                       |          |          |                      --0.11%-- [...]
                       |          |          |
                       |          |          |--40.25%-- hrtimer_force_reprogram
                       |          |          |          __remove_hrtimer
                       |          |          |          |
                       |          |          |          |--89.68%-- __hrtimer_start_range_ns
                       |          |          |          |          hrtimer_start
                       |          |          |          |          tick_nohz_stop_sched_tick
                       |          |          |          |          __tick_nohz_idle_enter
                       |          |          |          |          |
                       |          |          |          |          |--99.90%-- tick_nohz_idle_enter
                       |          |          |          |          |          cpu_startup_entry
                       |          |          |          |          |          |
                       |          |          |          |          |          |--99.04%-- start_secondary
                       |          |          |          |          |          |
                       |          |          |          |          |           --0.96%-- rest_init
                       |          |          |          |          |                     start_kernel
                       |          |          |          |          |                     x86_64_start_reservations
                       |          |          |          |          |                     x86_64_start_kernel
                       |          |          |          |           --0.10%-- [...]
                       |          |          |          |



Any thoughts on how to proceed further appreciated!

Cheers,

Mark

