Thread: Help with bad performance with synchronous commit

Help with bad performance with synchronous commit

From
"Lazaro Garcia"
Date:

Hello everyone

 

I wite because I have a great problem and I have no idea how resolve this:

 

This is the situation:

 

I have installed a postgresql server 9.6.2 on Ubuntu server 16.04.2 LTS with this specifications:

 

12 cores y 30 GB de RAM.

 

I have created a database with pgbench for testing and the performance are very bad with  synchronous commit. These are the results:

 

scaling factor: 1

query mode: simple

number of clients: 50

number of threads: 12

duration: 30 s

number of transactions actually processed: 2773

latency average = 544.336 ms

latency stddev = 617.941 ms

tps = 91.180927 (including connections establishing)

tps = 91.194080 (excluding connections establishing)

 

By analyzing the postgres log with pgbadger I could see that the updates take a long time for a table with 10 tuples only. Then I ran a explain analyze and the explain results contradict what the test throws:

 

Update on pgbench_tellers  (cost=4.14..8.16 rows=1 width=358) (actual time=0.021..0.021 rows=0 loops=1)

   ->  Bitmap Heap Scan on pgbench_tellers  (cost=4.14..8.16 rows=1 width=358) (actual time=0.010..0.010 rows=1 loops=1)

         Recheck Cond: (tid = 5)

         Heap Blocks: exact=1

         ->  Bitmap Index Scan on pgbench_tellers_pkey  (cost=0.00..4.14 rows=1 width=0) (actual time=0.005..0.005 rows=1 loops=1)

               Index Cond: (tid = 5)

Planning time: 0.052 ms

Execution time: 0.043 ms

 

I have runned the same test in other machines with less RAM and CPU and the results are always better than this:

 

Any help to identify the problem could be very appreciated.

 

Regards

Re: Help with bad performance with synchronous commit

From
"David G. Johnston"
Date:
On Friday, March 31, 2017, Lazaro Garcia <lazaro3487@gmail.com> wrote:

I have runned the same test in other machines with less RAM and CPU and the results are always better than this:

Which leaves questions about the storage device unanswered - and you are using a disk-writing query.

David J.

Re: Help with bad performance with synchronous commit

From
Samed YILDIRIM
Date:
Hi Lazaro,

Please initialize your test database with scale factor 50 by using pgbench than perform test again and share results to us?

Best regards.

17:45, 31 Mart 2017, Lazaro Garcia <lazaro3487@gmail.com>:

Hello everyone

 

I wite because I have a great problem and I have no idea how resolve this:

 

This is the situation:

 

I have installed a postgresql server 9.6.2 on Ubuntu server 16.04.2 LTS with this specifications:

 

12 cores y 30 GB de RAM.

 

I have created a database with pgbench for testing and the performance are very bad with  synchronous commit. These are the results:

 

scaling factor: 1

query mode: simple

number of clients: 50

number of threads: 12

duration: 30 s

number of transactions actually processed: 2773

latency average = 544.336 ms

latency stddev = 617.941 ms

tps = 91.180927 (including connections establishing)

tps = 91.194080 (excluding connections establishing)

 

By analyzing the postgres log with pgbadger I could see that the updates take a long time for a table with 10 tuples only. Then I ran a explain analyze and the explain results contradict what the test throws:

 

Update on pgbench_tellers  (cost=4.14..8.16 rows=1 width=358) (actual time=0.021..0.021 rows=0 loops=1)

   ->  Bitmap Heap Scan on pgbench_tellers  (cost=4.14..8.16 rows=1 width=358) (actual time=0.010..0.010 rows=1 loops=1)

         Recheck Cond: (tid = 5)

         Heap Blocks: exact=1

         ->  Bitmap Index Scan on pgbench_tellers_pkey  (cost=0.00..4.14 rows=1 width=0) (actual time=0.005..0.005 rows=1 loops=1)

               Index Cond: (tid = 5)

Planning time: 0.052 ms

Execution time: 0.043 ms

 

I have runned the same test in other machines with less RAM and CPU and the results are always better than this:

 

Any help to identify the problem could be very appreciated.

 

Regards

Re: Help with bad performance with synchronous commit

From
"Lazaro Garcia"
Date:

After running the test with a scale factor of 100 this is the result:

 

transaction type: <builtin: TPC-B (sort of)>

scaling factor: 100

query mode: simple

number of clients: 80

number of threads: 12

duration: 60 s

number of transactions actually processed: 13607

latency average = 353.630 ms

latency stddev = 169.859 ms

tps = 225.995896 (including connections establishing)

tps = 226.009386 (excluding connections establishing)

 

During test execution iostat shows that:

 

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           1.93    0.00    0.59   65.97    0.00   31.52

 

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util

sda               0.00     0.00  305.00   10.00  2608.00  2112.00    29.97    21.93   73.66   72.83   99.20   3.17 100.00

sda1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

sda2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

sda3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

sda4              0.00     0.00  305.00   10.00  2608.00  2112.00    29.97    21.93   73.66   72.83   99.20   3.17 100.00

sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

sdb1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

 

Here the iowait is very higth.

 

When execute pg_test_fsync:

 

O_DIRECT supported on this platform for open_datasync and open_sync.

 

Compare file sync methods using one 8kB write:

(in wal_sync_method preference order, except fdatasync is Linux's default)

        open_datasync                       101.279 ops/sec    9874 usecs/op

        fdatasync                            71.367 ops/sec   14012 usecs/op

        fsync                                20.590 ops/sec   48567 usecs/op

        fsync_writethrough                            n/a

        open_sync                            26.809 ops/sec   37301 usecs/op

 

Compare file sync methods using two 8kB writes:

(in wal_sync_method preference order, except fdatasync is Linux's default)

        open_datasync                        45.375 ops/sec   22038 usecs/op

        fdatasync                            73.560 ops/sec   13594 usecs/op

        fsync                                22.099 ops/sec   45251 usecs/op

        fsync_writethrough                            n/a

        open_sync                            10.872 ops/sec   91981 usecs/op

 

Compare open_sync with different write sizes:

(This is designed to compare the cost of writing 16kB in different write

open_sync sizes.)

         1 * 16kB open_sync write            18.690 ops/sec   53504 usecs/op

         2 *  8kB open_sync writes            9.397 ops/sec  106418 usecs/op

         4 *  4kB open_sync writes            5.399 ops/sec  185206 usecs/op

         8 *  2kB open_sync writes            2.426 ops/sec  412196 usecs/op

        16 *  1kB open_sync writes            1.173 ops/sec  852505 usecs/op

 

Test if fsync on non-write file descriptor is honored:

(If the times are similar, fsync() can sync data written on a different

descriptor.)

        write, fsync, close                  20.269 ops/sec   49336 usecs/op

        write, close, fsync                  24.950 ops/sec   40079 usecs/op

 

Non-sync'ed 8kB writes:

        write                            498332.902 ops/sec       2 usecs/op

 

Here the ops/sec   are very slow.

 

Any recommendation.

 

Regards.

 

 

De: Samed YILDIRIM [mailto:samed@reddoc.net]
Enviado el: viernes, 31 de marzo de 2017 12:37 p. m.
Para: Lazaro Garcia; pgsql-admin@postgresql.org
Asunto: Re: [ADMIN] Help with bad performance with synchronous commit

 

Hi Lazaro,

 

Please initialize your test database with scale factor 50 by using pgbench than perform test again and share results to us?

 

Best regards.

17:45, 31 Mart 2017, Lazaro Garcia <lazaro3487@gmail.com>:

Hello everyone

 

I wite because I have a great problem and I have no idea how resolve this:

 

This is the situation:

 

I have installed a postgresql server 9.6.2 on Ubuntu server 16.04.2 LTS with this specifications:

 

12 cores y 30 GB de RAM.

 

I have created a database with pgbench for testing and the performance are very bad with  synchronous commit. These are the results:

 

scaling factor: 1

query mode: simple

number of clients: 50

number of threads: 12

duration: 30 s

number of transactions actually processed: 2773

latency average = 544.336 ms

latency stddev = 617.941 ms

tps = 91.180927 (including connections establishing)

tps = 91.194080 (excluding connections establishing)

 

By analyzing the postgres log with pgbadger I could see that the updates take a long time for a table with 10 tuples only. Then I ran a explain analyze and the explain results contradict what the test throws:

 

Update on pgbench_tellers  (cost=4.14..8.16 rows=1 width=358) (actual time=0.021..0.021 rows=0 loops=1)

   ->  Bitmap Heap Scan on pgbench_tellers  (cost=4.14..8.16 rows=1 width=358) (actual time=0.010..0.010 rows=1 loops=1)

         Recheck Cond: (tid = 5)

         Heap Blocks: exact=1

         ->  Bitmap Index Scan on pgbench_tellers_pkey  (cost=0.00..4.14 rows=1 width=0) (actual time=0.005..0.005 rows=1 loops=1)

               Index Cond: (tid = 5)

Planning time: 0.052 ms

Execution time: 0.043 ms

 

I have runned the same test in other machines with less RAM and CPU and the results are always better than this:

 

Any help to identify the problem could be very appreciated.

 

Regards

Re: Help with bad performance with synchronous commit

From
"Lazaro Garcia"
Date:

Thanks for the reply:

 

pgbench -U postgres -p 6432 -j 10 -c 500 -T 60 -h localhost -P 2  pgbench (write test)

 

I have uninstalled 9.6 and installed postgresql 9.5.6 from Ubuntu repos. These was de new results:

 

transaction type: TPC-B (sort of)

scaling factor: 200

query mode: simple

number of clients: 500

number of threads: 10

duration: 60 s

number of transactions actually processed: 236733 latency average: 126.187 ms latency stddev: 152.347 ms tps = 3919.425981 (including connections establishing) tps = 3919.945554 (excluding connections establishing)

 

2 days later the test throws less results but constant.

 

transaction type: TPC-B (sort of)

scaling factor: 200

query mode: simple

number of clients: 500

number of threads: 10

duration: 60 s

number of transactions actually processed: 76788 latency average: 394.308 ms tps = 1268.044488 (including connections establishing) tps = 1268.124718 (excluding connections establishing)

 

The iowait was  between 15 and  25 percet when the test was runing:  

 

Is there any way to identify the cause of this differences?

 

How can I configure commit_delay and commit_siblings properly?

 

These are disk details:

 

----------------------------------------------------------------------------------------------------------------

 

From pg_test_fsync. What do you think about that results?

 

O_DIRECT supported on this platform for open_datasync and open_sync.

 

Compare file sync methods using one 8kB write:

(in wal_sync_method preference order, except fdatasync is Linux's default)

        open_datasync                        93.886 ops/sec   10651 usecs/op

        fdatasync                            79.473 ops/sec   12583 usecs/op

        fsync                                22.773 ops/sec   43912 usecs/op

        fsync_writethrough                            n/a

        open_sync                            24.556 ops/sec   40723 usecs/op

 

Compare file sync methods using two 8kB writes:

(in wal_sync_method preference order, except fdatasync is Linux's default)

        open_datasync                        47.113 ops/sec   21226 usecs/op

        fdatasync                            80.205 ops/sec   12468 usecs/op

        fsync                                13.742 ops/sec   72767 usecs/op

        fsync_writethrough                            n/a

        open_sync                            12.198 ops/sec   81981 usecs/op

 

Compare open_sync with different write sizes:

(This is designed to compare the cost of writing 16kB in different write open_sync sizes.)

         1 * 16kB open_sync write            22.501 ops/sec   44442 usecs/op

         2 *  8kB open_sync writes           10.369 ops/sec   96444 usecs/op

         4 *  4kB open_sync writes            5.960 ops/sec  167783 usecs/op

         8 *  2kB open_sync writes            2.457 ops/sec  406981 usecs/op

        16 *  1kB open_sync writes            1.275 ops/sec  784472 usecs/op

 

Test if fsync on non-write file descriptor is honored:

(If the times are similar, fsync() can sync data written on a different

descriptor.)

        write, fsync, close                  25.072 ops/sec   39886 usecs/op

        write, close, fsync                  23.322 ops/sec   42878 usecs/op

 

Non-sync'ed 8kB writes:

        write                            533368.453 ops/sec       2 usecs/op

 

-------------------------------------------------------------------------------------------------------------

 

hdparm -I /dev/sda

 

ATA device, with non-removable media

        Model Number:       TOSHIBA DT01ACA200

        Serial Number:      Z4NMKKGAS

        Firmware Revision:  MX4OABB0

        Transport:          Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0; Revision: ATA8-AST T13 Project D1697 Revision 0b

Standards:

        Used: unknown (minor revision code 0x0029)

        Supported: 8 7 6 5

        Likely used: 8

Configuration:

        Logical         max     current

        cylinders       16383   16383

        heads           16      16

        sectors/track   63      63

        --

        CHS current addressable sectors:   16514064

        LBA    user addressable sectors:  268435455

        LBA48  user addressable sectors: 3907029168

        Logical  Sector size:                   512 bytes

        Physical Sector size:                  4096 bytes

        Logical Sector-0 offset:                  0 bytes

        device size with M = 1024*1024:     1907729 MBytes

        device size with M = 1000*1000:     2000398 MBytes (2000 GB)

        cache/buffer size  = unknown

        Form Factor: 3.5 inch

        Nominal Media Rotation Rate: 7200

Capabilities:

        LBA, IORDY(can be disabled)

        Queue depth: 32

        Standby timer values: spec'd by Standard, no device specific minimum

        R/W multiple sector transfer: Max = 16  Current = 0

        Advanced power management level: disabled

        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6

             Cycle time: min=120ns recommended=120ns

        PIO: pio0 pio1 pio2 pio3 pio4

             Cycle time: no flow control=120ns  IORDY flow control=120ns

Commands/features:

        Enabled Supported:

           *    SMART feature set

                Security Mode feature set

           *    Power Management feature set

                Write cache

           *    Look-ahead

           *    Host Protected Area feature set

           *    WRITE_BUFFER command

           *    READ_BUFFER command

           *    NOP cmd

           *    DOWNLOAD_MICROCODE

                Advanced Power Management feature set

                Power-Up In Standby feature set

           *    SET_FEATURES required to spinup after power up

                SET_MAX security extension

           *    48-bit Address feature set

           *    Device Configuration Overlay feature set

           *    Mandatory FLUSH_CACHE

           *    FLUSH_CACHE_EXT

           *    SMART error logging

           *    SMART self-test

                Media Card Pass-Through

           *    General Purpose Logging feature set

           *    WRITE_{DMA|MULTIPLE}_FUA_EXT

           *    64-bit World wide name

           *    URG for READ_STREAM[_DMA]_EXT

           *    URG for WRITE_STREAM[_DMA]_EXT

           *    WRITE_UNCORRECTABLE_EXT command

           *    {READ,WRITE}_DMA_EXT_GPL commands

           *    Segmented DOWNLOAD_MICROCODE

                unknown 119[7]

           *    Gen1 signaling speed (1.5Gb/s)

           *    Gen2 signaling speed (3.0Gb/s)

           *    Gen3 signaling speed (6.0Gb/s)

           *    Native Command Queueing (NCQ)

           *    Host-initiated interface power management

           *    Phy event counters

           *    NCQ priority information

                Non-Zero buffer offsets in DMA Setup FIS

           *    DMA Setup Auto-Activate optimization

                Device-initiated interface power management

                In-order data delivery

           *    Software settings preservation

           *    SMART Command Transport (SCT) feature set

           *    SCT Write Same (AC2)

           *    SCT Error Recovery Control (AC3)

           *    SCT Features Control (AC4)

           *    SCT Data Tables (AC5)

Security:

        Master password revision code = 65534

                supported

        not     enabled

        not     locked

                frozen

       not     expired: security count

        not     supported: enhanced erase

        320min for SECURITY ERASE UNIT.

Logical Unit WWN Device Identifier: 5000039ffae513fd

        NAA             : 5

        IEEE OUI        : 000039

        Unique ID       : ffae513fd

Checksum: correct

 

---------------------------------------------------------------------------------------------

parted /dev/sda unit s print

 

Model: ATA TOSHIBA DT01ACA2 (scsi)

Disk /dev/sda: 3907029168s

Sector size (logical/physical): 512B/4096B Partition Table: gpt Disk Flags:

 

Number  Start      End          Size         File system     Name  Flags

1      2048s      4095s        2048s                              bios_grub

2      4096s      1003519s     999424s      ext2

3      1003520s   17004543s    16001024s    linux-swap(v1)

4      17004544s  3907028991s  3890024448s  ext4

 

I think that results could be better. What do you think?

 

Thank you very much for your time.

 

Regards.

 

 

De: David G. Johnston [mailto:david.g.johnston@gmail.com]
Enviado el: viernes, 31 de marzo de 2017 10:57 a. m.
Para: Lazaro Garcia
CC: pgsql-admin@postgresql.org
Asunto: Re: [ADMIN] Help with bad performance with synchronous commit

 

On Friday, March 31, 2017, Lazaro Garcia <lazaro3487@gmail.com> wrote:

I have runned the same test in other machines with less RAM and CPU and the results are always better than this:

Which leaves questions about the storage device unanswered - and you are using a disk-writing query.

 

David J.