Home > mailing lists

Thread: Help with bad performance with synchronous commit

Help with bad performance with synchronous commit

From

"Lazaro Garcia"

Date:

31 March 2017, 20:43:10

Hello everyone

I wite because I have a great problem and I have no idea how resolve this:

This is the situation:

I have installed a postgresql server 9.6.2 on Ubuntu server 16.04.2 LTS with this specifications:

12 cores y 30 GB de RAM.

I have created a database with pgbench for testing and the performance are very bad with synchronous commit. These are the results:

scaling factor: 1

query mode: simple

number of clients: 50

number of threads: 12

duration: 30 s

number of transactions actually processed: 2773

latency average = 544.336 ms

latency stddev = 617.941 ms

tps = 91.180927 (including connections establishing)

tps = 91.194080 (excluding connections establishing)

By analyzing the postgres log with pgbadger I could see that the updates take a long time for a table with 10 tuples only. Then I ran a explain analyze and the explain results contradict what the test throws:

Update on pgbench_tellers (cost=4.14..8.16 rows=1 width=358) (actual time=0.021..0.021 rows=0 loops=1)

-> Bitmap Heap Scan on pgbench_tellers (cost=4.14..8.16 rows=1 width=358) (actual time=0.010..0.010 rows=1 loops=1)

Recheck Cond: (tid = 5)

Heap Blocks: exact=1

-> Bitmap Index Scan on pgbench_tellers_pkey (cost=0.00..4.14 rows=1 width=0) (actual time=0.005..0.005 rows=1 loops=1)

Index Cond: (tid = 5)

Planning time: 0.052 ms

Execution time: 0.043 ms

I have runned the same test in other machines with less RAM and CPU and the results are always better than this:

Any help to identify the problem could be very appreciated.

Regards

Re: Help with bad performance with synchronous commit

From

"David G. Johnston"

Date:

31 March 2017, 20:57:01

On Friday, March 31, 2017, Lazaro Garcia <lazaro3487@gmail.com> wrote:

I have runned the same test in other machines with less RAM and CPU and the results are always better than this:

Which leaves questions about the storage device unanswered - and you are using a disk-writing query.

David J.

Re: Help with bad performance with synchronous commit

From

Samed YILDIRIM

Date:

31 March 2017, 22:36:44

Hi Lazaro,

Please initialize your test database with scale factor 50 by using pgbench than perform test again and share results to us?

Best regards.

17:45, 31 Mart 2017, Lazaro Garcia <lazaro3487@gmail.com>:

Hello everyone

I wite because I have a great problem and I have no idea how resolve this:

This is the situation:

I have installed a postgresql server 9.6.2 on Ubuntu server 16.04.2 LTS with this specifications:

12 cores y 30 GB de RAM.

I have created a database with pgbench for testing and the performance are very bad with synchronous commit. These are the results:

scaling factor: 1
query mode: simple
number of clients: 50
number of threads: 12
duration: 30 s
number of transactions actually processed: 2773
latency average = 544.336 ms
latency stddev = 617.941 ms
tps = 91.180927 (including connections establishing)
tps = 91.194080 (excluding connections establishing)

By analyzing the postgres log with pgbadger I could see that the updates take a long time for a table with 10 tuples only. Then I ran a explain analyze and the explain results contradict what the test throws:

Update on pgbench_tellers (cost=4.14..8.16 rows=1 width=358) (actual time=0.021..0.021 rows=0 loops=1)
   -> Bitmap Heap Scan on pgbench_tellers (cost=4.14..8.16 rows=1 width=358) (actual time=0.010..0.010 rows=1 loops=1)
         Recheck Cond: (tid = 5)
         Heap Blocks: exact=1
         -> Bitmap Index Scan on pgbench_tellers_pkey (cost=0.00..4.14 rows=1 width=0) (actual time=0.005..0.005 rows=1 loops=1)
               Index Cond: (tid = 5)
Planning time: 0.052 ms
Execution time: 0.043 ms

I have runned the same test in other machines with less RAM and CPU and the results are always better than this:

Any help to identify the problem could be very appreciated.

Regards

Re: Help with bad performance with synchronous commit

From

"Lazaro Garcia"

Date:

31 March 2017, 23:52:31

After running the test with a scale factor of 100 this is the result:

transaction type: <builtin: TPC-B (sort of)>

scaling factor: 100

query mode: simple

number of clients: 80

number of threads: 12

duration: 60 s

number of transactions actually processed: 13607

latency average = 353.630 ms

latency stddev = 169.859 ms

tps = 225.995896 (including connections establishing)

tps = 226.009386 (excluding connections establishing)

During test execution iostat shows that:

avg-cpu: %user %nice %system %iowait %steal %idle

1.93 0.00 0.59 65.97 0.00 31.52

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util

sda 0.00 0.00 305.00 10.00 2608.00 2112.00 29.97 21.93 73.66 72.83 99.20 3.17 100.00

sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

sda2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

sda3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

sda4 0.00 0.00 305.00 10.00 2608.00 2112.00 29.97 21.93 73.66 72.83 99.20 3.17 100.00

sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

sdb1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

Here the iowait is very higth.

When execute pg_test_fsync:

O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:

(in wal_sync_method preference order, except fdatasync is Linux's default)

open_datasync 101.279 ops/sec 9874 usecs/op

fdatasync 71.367 ops/sec 14012 usecs/op

fsync 20.590 ops/sec 48567 usecs/op

fsync_writethrough n/a

open_sync 26.809 ops/sec 37301 usecs/op

Compare file sync methods using two 8kB writes:

(in wal_sync_method preference order, except fdatasync is Linux's default)

open_datasync 45.375 ops/sec 22038 usecs/op

fdatasync 73.560 ops/sec 13594 usecs/op

fsync 22.099 ops/sec 45251 usecs/op

fsync_writethrough n/a

open_sync 10.872 ops/sec 91981 usecs/op

Compare open_sync with different write sizes:

(This is designed to compare the cost of writing 16kB in different write

open_sync sizes.)

1 * 16kB open_sync write 18.690 ops/sec 53504 usecs/op

2 * 8kB open_sync writes 9.397 ops/sec 106418 usecs/op

4 * 4kB open_sync writes 5.399 ops/sec 185206 usecs/op

8 * 2kB open_sync writes 2.426 ops/sec 412196 usecs/op

16 * 1kB open_sync writes 1.173 ops/sec 852505 usecs/op

Test if fsync on non-write file descriptor is honored:

(If the times are similar, fsync() can sync data written on a different

descriptor.)

write, fsync, close 20.269 ops/sec 49336 usecs/op

write, close, fsync 24.950 ops/sec 40079 usecs/op

Non-sync'ed 8kB writes:

write 498332.902 ops/sec 2 usecs/op

Here the ops/sec are very slow.

Any recommendation.

Regards.

De: Samed YILDIRIM [mailto:samed@reddoc.net]
Enviado el: viernes, 31 de marzo de 2017 12:37 p. m.
Para: Lazaro Garcia; pgsql-admin@postgresql.org
Asunto: Re: [ADMIN] Help with bad performance with synchronous commit

Hi Lazaro,

Please initialize your test database with scale factor 50 by using pgbench than perform test again and share results to us?

Best regards.

17:45, 31 Mart 2017, Lazaro Garcia <lazaro3487@gmail.com>:

Hello everyone

I wite because I have a great problem and I have no idea how resolve this:

This is the situation:

I have installed a postgresql server 9.6.2 on Ubuntu server 16.04.2 LTS with this specifications:

12 cores y 30 GB de RAM.

I have created a database with pgbench for testing and the performance are very bad with synchronous commit. These are the results:

scaling factor: 1
query mode: simple
number of clients: 50
number of threads: 12
duration: 30 s
number of transactions actually processed: 2773
latency average = 544.336 ms
latency stddev = 617.941 ms
tps = 91.180927 (including connections establishing)
tps = 91.194080 (excluding connections establishing)

By analyzing the postgres log with pgbadger I could see that the updates take a long time for a table with 10 tuples only. Then I ran a explain analyze and the explain results contradict what the test throws:

Update on pgbench_tellers (cost=4.14..8.16 rows=1 width=358) (actual time=0.021..0.021 rows=0 loops=1)
   -> Bitmap Heap Scan on pgbench_tellers (cost=4.14..8.16 rows=1 width=358) (actual time=0.010..0.010 rows=1 loops=1)
         Recheck Cond: (tid = 5)
         Heap Blocks: exact=1
         -> Bitmap Index Scan on pgbench_tellers_pkey (cost=0.00..4.14 rows=1 width=0) (actual time=0.005..0.005 rows=1 loops=1)
               Index Cond: (tid = 5)
Planning time: 0.052 ms
Execution time: 0.043 ms

I have runned the same test in other machines with less RAM and CPU and the results are always better than this:

Any help to identify the problem could be very appreciated.

Regards

Re: Help with bad performance with synchronous commit

From

"Lazaro Garcia"

Date:

03 April 2017, 22:54:11

Thanks for the reply:

pgbench -U postgres -p 6432 -j 10 -c 500 -T 60 -h localhost -P 2 pgbench (write test)

I have uninstalled 9.6 and installed postgresql 9.5.6 from Ubuntu repos. These was de new results:

transaction type: TPC-B (sort of)

scaling factor: 200

query mode: simple

number of clients: 500

number of threads: 10

duration: 60 s

number of transactions actually processed: 236733 latency average: 126.187 ms latency stddev: 152.347 ms tps = 3919.425981 (including connections establishing) tps = 3919.945554 (excluding connections establishing)

2 days later the test throws less results but constant.

transaction type: TPC-B (sort of)

scaling factor: 200

query mode: simple

number of clients: 500

number of threads: 10

duration: 60 s

number of transactions actually processed: 76788 latency average: 394.308 ms tps = 1268.044488 (including connections establishing) tps = 1268.124718 (excluding connections establishing)

The iowait was between 15 and 25 percet when the test was runing:

Is there any way to identify the cause of this differences?

How can I configure commit_delay and commit_siblings properly?

These are disk details:

----------------------------------------------------------------------------------------------------------------

From pg_test_fsync. What do you think about that results?

O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:

(in wal_sync_method preference order, except fdatasync is Linux's default)

open_datasync 93.886 ops/sec 10651 usecs/op

fdatasync 79.473 ops/sec 12583 usecs/op

fsync 22.773 ops/sec 43912 usecs/op

fsync_writethrough n/a

open_sync 24.556 ops/sec 40723 usecs/op

Compare file sync methods using two 8kB writes:

(in wal_sync_method preference order, except fdatasync is Linux's default)

open_datasync 47.113 ops/sec 21226 usecs/op

fdatasync 80.205 ops/sec 12468 usecs/op

fsync 13.742 ops/sec 72767 usecs/op

fsync_writethrough n/a

open_sync 12.198 ops/sec 81981 usecs/op

Compare open_sync with different write sizes:

(This is designed to compare the cost of writing 16kB in different write open_sync sizes.)

1 * 16kB open_sync write 22.501 ops/sec 44442 usecs/op

2 * 8kB open_sync writes 10.369 ops/sec 96444 usecs/op

4 * 4kB open_sync writes 5.960 ops/sec 167783 usecs/op

8 * 2kB open_sync writes 2.457 ops/sec 406981 usecs/op

16 * 1kB open_sync writes 1.275 ops/sec 784472 usecs/op

Test if fsync on non-write file descriptor is honored:

(If the times are similar, fsync() can sync data written on a different

descriptor.)

write, fsync, close 25.072 ops/sec 39886 usecs/op

write, close, fsync 23.322 ops/sec 42878 usecs/op

Non-sync'ed 8kB writes:

write 533368.453 ops/sec 2 usecs/op

-------------------------------------------------------------------------------------------------------------

hdparm -I /dev/sda

ATA device, with non-removable media

Model Number: TOSHIBA DT01ACA200

Serial Number: Z4NMKKGAS

Firmware Revision: MX4OABB0

Transport: Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0; Revision: ATA8-AST T13 Project D1697 Revision 0b

Standards:

Used: unknown (minor revision code 0x0029)

Supported: 8 7 6 5

Likely used: 8

Configuration:

Logical max current

cylinders 16383 16383

heads 16 16

sectors/track 63 63

CHS current addressable sectors: 16514064

LBA user addressable sectors: 268435455

LBA48 user addressable sectors: 3907029168

Logical Sector size: 512 bytes

Physical Sector size: 4096 bytes

Logical Sector-0 offset: 0 bytes

device size with M = 1024*1024: 1907729 MBytes

device size with M = 1000*1000: 2000398 MBytes (2000 GB)

cache/buffer size = unknown

Form Factor: 3.5 inch

Nominal Media Rotation Rate: 7200

Capabilities:

LBA, IORDY(can be disabled)

Queue depth: 32

Standby timer values: spec'd by Standard, no device specific minimum

R/W multiple sector transfer: Max = 16 Current = 0

Advanced power management level: disabled

DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6

Cycle time: min=120ns recommended=120ns

PIO: pio0 pio1 pio2 pio3 pio4

Cycle time: no flow control=120ns IORDY flow control=120ns

Commands/features:

Enabled Supported:

* SMART feature set

Security Mode feature set

* Power Management feature set

Write cache

* Look-ahead

* Host Protected Area feature set

* WRITE_BUFFER command

* READ_BUFFER command

* NOP cmd

* DOWNLOAD_MICROCODE

Advanced Power Management feature set

Power-Up In Standby feature set

* SET_FEATURES required to spinup after power up

SET_MAX security extension

* 48-bit Address feature set

* Device Configuration Overlay feature set

* Mandatory FLUSH_CACHE

* FLUSH_CACHE_EXT

* SMART error logging

* SMART self-test

Media Card Pass-Through

* General Purpose Logging feature set

* WRITE_{DMA|MULTIPLE}_FUA_EXT

* 64-bit World wide name

* URG for READ_STREAM[_DMA]_EXT

* URG for WRITE_STREAM[_DMA]_EXT

* WRITE_UNCORRECTABLE_EXT command

* {READ,WRITE}_DMA_EXT_GPL commands

* Segmented DOWNLOAD_MICROCODE

unknown 119[7]

* Gen1 signaling speed (1.5Gb/s)

* Gen2 signaling speed (3.0Gb/s)

* Gen3 signaling speed (6.0Gb/s)

* Native Command Queueing (NCQ)

* Host-initiated interface power management

* Phy event counters

* NCQ priority information

Non-Zero buffer offsets in DMA Setup FIS

* DMA Setup Auto-Activate optimization

Device-initiated interface power management

In-order data delivery

* Software settings preservation

* SMART Command Transport (SCT) feature set

* SCT Write Same (AC2)

* SCT Error Recovery Control (AC3)

* SCT Features Control (AC4)

* SCT Data Tables (AC5)

Security:

Master password revision code = 65534

supported

not enabled

not locked

frozen

not expired: security count

not supported: enhanced erase

320min for SECURITY ERASE UNIT.

Logical Unit WWN Device Identifier: 5000039ffae513fd

NAA : 5

IEEE OUI : 000039

Unique ID : ffae513fd

Checksum: correct

---------------------------------------------------------------------------------------------

parted /dev/sda unit s print

Model: ATA TOSHIBA DT01ACA2 (scsi)

Disk /dev/sda: 3907029168s

Sector size (logical/physical): 512B/4096B Partition Table: gpt Disk Flags:

Number Start End Size File system Name Flags

1 2048s 4095s 2048s bios_grub

2 4096s 1003519s 999424s ext2

3 1003520s 17004543s 16001024s linux-swap(v1)

4 17004544s 3907028991s 3890024448s ext4

I think that results could be better. What do you think?

Thank you very much for your time.

Regards.

De: David G. Johnston [mailto:david.g.johnston@gmail.com]
Enviado el: viernes, 31 de marzo de 2017 10:57 a. m.
Para: Lazaro Garcia
CC: pgsql-admin@postgresql.org
Asunto: Re: [ADMIN] Help with bad performance with synchronous commit

On Friday, March 31, 2017, Lazaro Garcia <lazaro3487@gmail.com> wrote:

I have runned the same test in other machines with less RAM and CPU and the results are always better than this:

Which leaves questions about the storage device unanswered - and you are using a disk-writing query.

David J.