Re: Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL - Mailing list pgsql-performance

From Pietro Pugni
Subject Re: Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
Msg-id 76EF504C-06AF-471F-8BCD-AFD2B6F85178@gmail.com
In response to Re: Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL  (Aidan Van Dyk <aidan@highrise.ca>)
Responses Re: Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL  (Josh Krupka <jkrupka@gmail.com>)
List pgsql-performance
Hi Aidan,
thank you again for your support.
I found an interesting article showing better performance from an Intel i5 than from an Intel Xeon across different Postgres versions: http://blog.pgaddict.com/posts/performance-since-postgresql-7-4-to-9-4-pgbench

NUMA stands for "Non-Uniform Memory Access". It's basically the "label" for systems which have memory attached to different CPU sockets, such that accessing memory from a particular CPU thread has different costs depending on where the memory is physically located (i.e. on the local socket or on some other socket).
Thanks, good to know.

QPI is the Intel "QuickPath Interconnect". It's the communication path between CPU sockets. Memory read by one CPU thread that has to come from another CPU socket's memory controller goes through QPI.
Google has lots of info on these, and how they impact performance, etc.
When I get access to the BIOS (probably next week or later), I’ll try to disable QPI (if possible). Meanwhile, I’ll read up online on how QPI affects performance.

If you want to see how bad the NUMA/QPI is, play with stream to benchmark memory performance.

By stream, do you mean this: https://sites.utexas.edu/jdm4372/tag/stream-benchmark/ ? Can you suggest a way to run this kind of test?

Ya, that's the one.  I don't have specific tests in mind.
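For a quick feel of what a stream-style test measures, here is a minimal sketch of a copy-bandwidth loop in Python. It is not the real STREAM benchmark: it only times a bulk buffer copy in a single thread, and the function name, the 64 MB buffer size and the repeat count are my own assumptions. Interpreter overhead is small for buffers this large, but the absolute figures should be treated as rough.

```python
import time


def copy_bandwidth_mb_s(size_mb=64, repeats=5):
    """Time a bulk copy between two buffers and return the best rate in MB/s.

    Hypothetical helper for illustration; not part of STREAM or sysbench.
    """
    n = size_mb * 1024 * 1024
    src = bytearray(b"\x01") * n   # source buffer, filled so its pages are touched
    dst = bytearray(n)             # destination buffer of the same length
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src               # same-length slice assignment: one bulk copy
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)  # keep the fastest run to reduce noise
    # Count bytes read plus bytes written, as STREAM does for its Copy kernel.
    return (2 * size_mb) / best


if __name__ == "__main__":
    print(f"copy bandwidth: {copy_bandwidth_mb_s():.0f} MB/s")
```

On the T420, running the script (saved under a hypothetical name such as copytest.py) once with "numactl --cpunodebind=0 --membind=0 python3 copytest.py" and once with "--membind=1" should show whether memory on the remote socket is noticeably slower than local memory.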
I ran some tests with sysbench on the Dell T420 (installed via apt-get) and on the MacMini (compiled from the latest sources available at https://github.com/akopytov/sysbench ).
Here are some results for reading and writing 16GB of RAM with a 1MB block size (I don’t know whether these values make sense, but I’m happy to change them).

T420 - RAM READ - 16GB / 1MB
sh-4.3# sysbench --test=memory --memory-oper=read --memory-block-size=1MB --memory-total-size=16GB run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing memory operations speed test
Memory block size: 1024K

Memory transfer size: 16384M

Memory operations type: read
Memory scope type: global
Threads started!
Done.

Operations performed: 16384 (3643025.32 ops/sec)

16384.00 MB transferred (3643025.32 MB/sec)


Test execution summary:
    total time:                          0.0045s
    total number of events:              16384
    total time taken by event execution: 0.0031
    per-request statistics:
         min:                                  0.00ms
         avg:                                  0.00ms
         max:                                  0.02ms
         approx.  95 percentile:               0.00ms

Threads fairness:
    events (avg/stddev):           16384.0000/0.00
    execution time (avg/stddev):   0.0031/0.00

MacMini - RAM READ - 16GB / 1MB
server:sysbench Pietro$ ./sysbench --test=memory --memory-oper=read --memory-block-size=1MB --memory-total-size=16GB run
sysbench 0.5:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1
Random number generator seed is 0 and will be ignored


Threads started!

Operations performed: 16384 ( 5484.50 ops/sec)

16384.00 MB transferred (5484.50 MB/sec)


General statistics:
    total time:                          2.9873s
    total number of events:              16384
    total time taken by event execution: 2.9836s
    response time:
         min:                                  0.18ms
         avg:                                  0.18ms
         max:                                  0.24ms
         approx.  95 percentile:               0.19ms

Threads fairness:
    events (avg/stddev):           16384.0000/0.00
    execution time (avg/stddev):   2.9836/0.00

T420 - RAM WRITE - 16GB / 1MB
sh-4.3# sysbench --test=memory --memory-oper=write --memory-block-size=1MB --memory-total-size=16GB run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing memory operations speed test
Memory block size: 1024K

Memory transfer size: 16384M

Memory operations type: write
Memory scope type: global
Threads started!
Done.

Operations performed: 16384 ( 8298.97 ops/sec)

16384.00 MB transferred (8298.97 MB/sec)


Test execution summary:
    total time:                          1.9742s
    total number of events:              16384
    total time taken by event execution: 1.9723
    per-request statistics:
         min:                                  0.12ms
         avg:                                  0.12ms
         max:                                  0.25ms
         approx.  95 percentile:               0.12ms

Threads fairness:
    events (avg/stddev):           16384.0000/0.00
    execution time (avg/stddev):   1.9723/0.00



MacMini - RAM WRITE - 16GB / 1MB
server:sysbench Pietro$ ./sysbench --test=memory --memory-oper=write --memory-block-size=1MB --memory-total-size=16GB run
sysbench 0.5:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1
Random number generator seed is 0 and will be ignored


Threads started!

Operations performed: 16384 ( 5472.90 ops/sec)

16384.00 MB transferred (5472.90 MB/sec)


General statistics:
    total time:                          2.9937s
    total number of events:              16384
    total time taken by event execution: 2.9890s
    response time:
         min:                                  0.18ms
         avg:                                  0.18ms
         max:                                  0.32ms
         approx.  95 percentile:               0.19ms

Threads fairness:
    events (avg/stddev):           16384.0000/0.00
    execution time (avg/stddev):   2.9890/0.00
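
As a rough cross-check of the four memory runs above, the sketch below recomputes throughput as MB transferred divided by total time and flags any figure above a plausibility ceiling. The 100,000 MB/s ceiling is an assumption of mine, chosen to be generous for a single thread, and is not a measured limit of either machine.

```python
# Figures copied from the sysbench output above:
# (label, MB transferred, total time in seconds)
runs = [
    ("T420 read", 16384.0, 0.0045),
    ("MacMini read", 16384.0, 2.9873),
    ("T420 write", 16384.0, 1.9742),
    ("MacMini write", 16384.0, 2.9937),
]

# Assumed ceiling for single-threaded memory throughput (hypothetical value).
PLAUSIBLE_MAX_MB_S = 100_000.0


def report(runs, ceiling=PLAUSIBLE_MAX_MB_S):
    """Return (label, MB/s, suspect) for each run; suspect means above the ceiling."""
    rows = []
    for label, megabytes, seconds in runs:
        rate = megabytes / seconds
        rows.append((label, rate, rate > ceiling))
    return rows


if __name__ == "__main__":
    for label, rate, suspect in report(runs):
        note = "  <-- implausibly high, check the test" if suspect else ""
        print(f"{label:14s} {rate:14.1f} MB/s{note}")
```

Only the T420 read run is flagged: 16384 MB in 0.0045s works out to several TB/s, which no single thread reading from RAM can reach. One possible explanation is that the read loop in that sysbench 0.4.12 build does no real work, so this figure probably should not be compared with the MacMini read result.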


T420 - CPU
sh-4.3# sysbench --test=cpu run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 10000


Test execution summary:
    total time:                          13.0683s
    total number of events:              10000
    total time taken by event execution: 13.0674
    per-request statistics:
         min:                                  1.30ms
         avg:                                  1.31ms
         max:                                  1.44ms
         approx.  95 percentile:               1.35ms

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   13.0674/0.00


MacMini - CPU
server:sysbench Pietro$ ./sysbench --test=cpu run
sysbench 0.5:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1
Random number generator seed is 0 and will be ignored


Primer numbers limit: 10000

Threads started!


General statistics:
    total time:                          11.5728s
    total number of events:              10000
    total time taken by event execution: 11.5703s
    response time:
         min:                                  1.15ms
         avg:                                  1.16ms
         max:                                  2.17ms
         approx.  95 percentile:               1.17ms

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   11.5703/0.00
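
To put the two CPU runs side by side, a minimal sketch that turns the totals above into events per second and a time ratio. The variable names are mine, and since the two machines ran different sysbench versions (0.4.12 and 0.5) the comparison is only approximate.

```python
# Totals copied from the sysbench CPU runs above (single thread, prime limit 10000).
EVENTS = 10000
T420_TOTAL_S = 13.0683
MACMINI_TOTAL_S = 11.5728


def events_per_second(events, total_seconds):
    """Average number of completed events per second over the whole run."""
    return events / total_seconds


# How many times longer the T420 needed for the same number of events.
time_ratio = T420_TOTAL_S / MACMINI_TOTAL_S

if __name__ == "__main__":
    print(f"T420:    {events_per_second(EVENTS, T420_TOTAL_S):.1f} events/s")
    print(f"MacMini: {events_per_second(EVENTS, MACMINI_TOTAL_S):.1f} events/s")
    print(f"T420 time / MacMini time: {time_ratio:.2f}")
```

The ratio comes out at about 1.13, so on this single-threaded test the T420 needed roughly 13% more time than the MacMini.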



A simpler "overview" might be "numactl --hardware"
It returns the following output:

sh-4.3# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30
node 0 size: 64385 MB
node 0 free: 56487 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31
node 1 size: 64508 MB
node 1 free: 62201 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10 
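
To read the distance table above programmatically, here is a minimal sketch that parses the "node distances" block and computes the remote-to-local ratio. The parsing function is my own illustration, not part of numactl. As far as I understand, the distances are relative values reported by the firmware, with 10 meaning local access, so the ratio is a hint rather than a measurement.

```python
# Relevant part of the "numactl --hardware" output shown above.
NUMACTL_OUTPUT = """\
available: 2 nodes (0-1)
node distances:
node   0   1
  0:  10  20
  1:  20  10
"""


def parse_distances(text):
    """Return {node: [distance to node 0, distance to node 1, ...]}."""
    lines = [line.strip() for line in text.splitlines()]
    start = lines.index("node distances:")
    matrix = {}
    # Skip the "node distances:" line and the column header that follows it.
    for line in lines[start + 2:]:
        if ":" not in line:
            break
        node, values = line.split(":", 1)
        matrix[int(node)] = [int(v) for v in values.split()]
    return matrix


distances = parse_distances(NUMACTL_OUTPUT)
# Relative cost of node 0 reaching node 1's memory versus its own.
remote_to_local = distances[0][1] / distances[0][0]

if __name__ == "__main__":
    print(distances)
    print(f"remote/local distance ratio: {remote_to_local}")
```

Here the ratio is 2.0 in both directions, meaning the firmware describes memory on the other socket as about twice as "far" as local memory.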

Thank you so much for your help, really appreciate it.
Best regards,
 Pietro


