Re: Performances issues with SSD volume ? - Mailing list pgsql-admin

From Glyn Astill
Subject Re: Performances issues with SSD volume ?
Date
Msg-id 1247264598.4844463.1432140629700.JavaMail.yahoo@mail.yahoo.com
In response to Re: Performances issues with SSD volume ?  (Thomas SIMON <tsimon@neteven.com>)
List pgsql-admin

> From: Thomas SIMON <tsimon@neteven.com>
> To: glynastill@yahoo.co.uk
> Cc: "pgsql-admin@postgresql.org" <pgsql-admin@postgresql.org>
> Sent: Wednesday, 20 May 2015, 16:41
> Subject: Re: [ADMIN] Performances issues with SSD volume ?
>
> Hi Glyn,
>
> I'll try to answer these points.
>
> I've run some benchmarks, and indeed 3.2 is not helping. Not helping at all.
> I changed to 3.14 and the gap is quite big!
> With the pgbench RW test: 3.2 --> 4200 TPS ; 3.14 --> 6900 TPS under the same
> conditions.
> With the pgbench RO test: 3.2 --> 37000 TPS ; 3.14 --> 95000 TPS, same
> conditions too.

>


That's a start then.
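For reference, the sort of runs I'd compare between kernels is something like the below; the scale, client and thread counts are guesses on my part, not necessarily what you used:

    pgbench -i -s 300 bench              # initialise a test database
    pgbench -c 32 -j 8 -T 600 bench      # read/write (TPC-B-like) run
    pgbench -S -c 32 -j 8 -T 600 bench   # read-only (SELECT-only) run

As long as both kernels get identical runs on the same hardware the absolute numbers matter less than the gap, and yours is convincing.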

> So it should be better now, but when the server was in production, even with
> the bad kernel, performance was already quite good before it quickly
> decreased.
> So I also think I have another configuration problem.
>
> You say you're IO bound, so some output from sar / iostat / dstat and
> pg_stat_activity etc before and during the issue would be of use.
>
> -> My server is not in production right now, so it is difficult to
> replay the production load and gather some useful metrics.
> The best way I've found is to replay traffic from the logs with pgreplay.
> I hoped the server would fall over again while replaying this traffic, but it
> never happens ... another thing I can't understand ...
>
> Below is my dstat output when I replay this traffic (and so when the server
> runs normally).
> Unfortunately I no longer have any output from when the server's performance
> decreased.

>

It's a shame we can't get any insight into activity on the server during the issues.
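If it does happen again, even a couple of minutes of something like the following captured while it's slow would tell us a lot (the intervals are just suggestions):

    iostat -xm 5
    sar -d -p 5
    dstat -tcdngy 5
    psql -c "SELECT pid, state, waiting, now() - query_start AS runtime, left(query, 80) AS query FROM pg_stat_activity WHERE state <> 'idle' ORDER BY runtime DESC;"

That gives us per-device utilisation and await alongside what the backends were actually doing at the time.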
>
>
> Other things you asked:
>
>      System memory size : 256 GB
>      SSD Model numbers and how many : 4 SSD disks ; RAID 10 ; model
> INTEL SSDSC2BB480G4
>      RAID controller : MegaRAID SAS 2208
>      Partition alignments and stripe sizes : see fdisk below
>      Kernel options : the config file is here :
> ftp://ftp.ovh.net/made-in-ovh/bzImage/3.14.43/config-3.14.43-xxxx-std-ipv6-64
>      Filesystem used and mount options : ext4, see mtab below
>      IO Scheduler : noop [deadline] cfq for my ssd raid volume
>      Postgresql version and configuration : 9.3.5
>
> max_connections=1800
> shared_buffers=8GB
> temp_buffers=32MB
> work_mem=100MB
> maintenance_work_mem=12GB
> bgwriter_lru_maxpages=200
> effective_io_concurrency=4
> wal_level=hot_standby
> wal_sync_method=fdatasync
> wal_writer_delay=2000ms
> commit_delay=1000
> checkpoint_segments=80
> checkpoint_timeout=15min
> checkpoint_completion_target=0.7
> archive_command='rsync ....'
> max_wal_senders=10
> wal_keep_segments=38600
> vacuum_defer_cleanup_age=100
> hot_standby = on
> max_standby_archive_delay = 5min
> max_standby_streaming_delay = 5min
> hot_standby_feedback = on
> random_page_cost = 1.0
> effective_cache_size = 240GB
> log_min_error_statement = warning
> log_min_duration_statement = 0
> log_checkpoints = on
> log_connections = on
> log_disconnections = on
> log_line_prefix = '%m|%u|%d|%c|'
> log_lock_waits = on
> log_statement = 'all'
> log_timezone = 'localtime'
> track_activities = on
> track_functions = pl
> track_activity_query_size = 8192
> autovacuum_max_workers = 5
> autovacuum_naptime = 30s
> autovacuum_vacuum_threshold = 40
> autovacuum_analyze_threshold = 20
> autovacuum_vacuum_scale_factor = 0.10
> autovacuum_analyze_scale_factor = 0.10
> autovacuum_vacuum_cost_delay = 5ms
> default_transaction_isolation = 'read committed'
> max_locks_per_transaction = 128
>
>
>
>      Connection pool sizing (pgpool2)
> num_init_children = 1790
> max_pool = 1

>


1800 is quite a lot of connections, and with max_pool=1 in pgpool you're effectively just using pgpool as a proxy (as I recall; my memory is a little fuzzy on pgpool now). Unless your app is stateful in some way, or has unique users for each of those 1800 connections, you should lower the number of active connections. A general starting point is usually CPU cores * 2, so you could raise max_pool and divide num_init_children by the same amount.

Hard to say what you need to do without knowing exactly what you're doing though. What's the nature of the app(s)?
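As a very rough sketch, assuming something in the region of 32 cores and that clients aren't tied to their own database users (both assumptions on my part), I'd be starting nearer to:

    # pgpool.conf
    num_init_children = 64    # concurrent client slots, roughly cores * 2
    max_pool = 28             # cached backends per child, one per user/database pair

That keeps num_init_children * max_pool around the old 1800 so your max_connections still covers it, but only 64 clients are ever active against postgres at once.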

> I also add megacli parameters :
>
> Virtual Drive: 2 (Target Id: 2)
> Name                :datassd
> RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
> Size                : 893.25 GB
> Sector Size         : 512
> Is VD emulated      : Yes
> Mirror Data         : 893.25 GB
> State               : Optimal
> Strip Size          : 256 KB
> Number Of Drives per span:2
> Span Depth          : 2
> Default Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write
> Cache if Bad BBU
> Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write
> Cache if Bad BBU
> Default Access Policy: Read/Write
> Current Access Policy: Read/Write
> Disk Cache Policy   : Enabled
> Encryption Type     : None
> Bad Blocks Exist: No
> PI type: No PI
>
> Is VD Cached: No

>


Not using your RAID controller's write cache then? Not sure just how important that is with SSDs these days, but if you've got a BBU set it to "WriteBack". Also change "Cache if Bad BBU" to "No Write Cache if Bad BBU" if you do that.
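If you do enable it, from memory something along these lines should do it; -L2 matches your virtual drive 2, and adapter 0 is an assumption:

    megacli -LDSetProp WB -L2 -a0               # write-back while the BBU is healthy
    megacli -LDSetProp NoCachedBadBBU -L2 -a0   # drop back to write-through if the BBU fails
    megacli -LDGetProp -Cache -L2 -a0           # check the resulting cache policy

Worth re-running pgbench afterwards to see whether it actually buys you anything on those SSDs.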


>
> Other outputs :
>      fdisk -l
>
> Disk /dev/sdc: 959.1 GB, 959119884288 bytes
> 255 heads, 63 sectors/track, 116606 cylinders, total 1873281024 sectors
> Units = sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> Disk identifier: 0x00000000
>
> Disk /dev/mapper/vg_datassd-lv_datassd: 751.6 GB, 751619276800 bytes
> 255 heads, 63 sectors/track, 91379 cylinders, total 1468006400 sectors
> Units = sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> Disk identifier: 0x00000000
>
>
>      cat /etc/mtab
> /dev/mapper/vg_datassd-lv_datassd /datassd ext4
> rw,relatime,discard,nobarrier,data=ordered 0 0
> (I added nobarrier option)
>
>
>      cat /sys/block/sdc/queue/scheduler
> noop [deadline] cfq
>

>


You could swap relatime for noatime,nodiratime.
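i.e. based on your current mtab line, something like this in /etc/fstab followed by a remount:

    /dev/mapper/vg_datassd-lv_datassd  /datassd  ext4  noatime,nodiratime,discard,nobarrier,data=ordered  0  0

    mount -o remount /datassd

It won't be a dramatic win, it just avoids a metadata write on every file access.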

>      sysctl kernel | grep sched
> kernel.sched_child_runs_first = 0
> kernel.sched_rr_timeslice_ms = 25
> kernel.sched_rt_period_us = 1000000
> kernel.sched_rt_runtime_us = 950000
>
> I've read some advices about changing kernel.sched_autogroup_enabled=0
> and kernel.sched_migration_cost_ns=5000000, but this parameters are not
> recognized by my kernel. So I don't know what to do with that...

>

sched_migration_cost_ns was just called sched_migration_cost in your old 3.2 kernel; not sure why sched_autogroup_enabled wouldn't be recognized though.
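If your 3.14 kernel does expose them, the change itself is just the below; the values are the commonly suggested ones, not something I've verified on your hardware:

    sysctl -w kernel.sched_autogroup_enabled=0
    sysctl -w kernel.sched_migration_cost_ns=5000000

Put the same two settings in /etc/sysctl.conf (without the "sysctl -w") if you want them to survive a reboot, and check "ls /proc/sys/kernel/ | grep sched" first to see which names your kernel actually provides.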


>      sysctl vm
> vm.admin_reserve_kbytes = 8192
> vm.block_dump = 0
> vm.dirty_background_bytes = 8388608
> vm.dirty_background_ratio = 0
> vm.dirty_bytes = 67108864
> vm.dirty_expire_centisecs = 3000
> vm.dirty_ratio = 0
> vm.dirty_writeback_centisecs = 500
> vm.drop_caches = 3
> vm.laptop_mode = 0
> vm.legacy_va_layout = 0
> vm.lowmem_reserve_ratio = 256    256    32
> vm.max_map_count = 65530
> vm.memory_failure_early_kill = 0
> vm.memory_failure_recovery = 1
> vm.min_free_kbytes = 65008
> vm.min_slab_ratio = 5
> vm.min_unmapped_ratio = 1
> vm.mmap_min_addr = 4096
> vm.nr_pdflush_threads = 0
> vm.numa_zonelist_order = default
> vm.oom_dump_tasks = 1
> vm.oom_kill_allocating_task = 0
> vm.overcommit_kbytes = 0
> vm.overcommit_memory = 2
> vm.overcommit_ratio = 50
> vm.page-cluster = 3
> vm.panic_on_oom = 0
> vm.percpu_pagelist_fraction = 0
> vm.scan_unevictable_pages = 0
> vm.stat_interval = 1
> vm.swappiness = 60
> vm.user_reserve_kbytes = 131072
> vm.vfs_cache_pressure = 100
> vm.zone_reclaim_mode = 0
>
>
>      select * from pg_stat_activity
> I've got hundred of entries for that when i'm in production, and I
> can't
> paste it here due to confidentiality.
> By day, it is around 50 millions queries usually. (35% selects ; 55%
> updates & 5% inserts)
>
>
>      lspci | grep -E 'RAID|SCSI|IDE|SATA'
> 00:1f.2 SATA controller: Intel Corporation C600/X79 series chipset
> 6-Port SATA AHCI Controller (rev 06)
> 02:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2208
> [Thunderbolt] (rev 05)
> 07:00.0 Serial Attached SCSI controller: Intel Corporation C602 chipset
> 4-Port SATA Storage Control Unit (rev 06)
>
> Thanks
>
> Thomas
>
>
> On 18/05/2015 at 16:29, Glyn Astill wrote:
>>>  From: Koray Eyidoğan <korayey@gmail.com>
>>>  To: Thomas SIMON <tsimon@neteven.com>
>>>  Cc: pgsql-admin@postgresql.org
>>>  Sent: Monday, 18 May 2015, 14:51
>>>  Subject: Re: [ADMIN] Performances issues with SSD volume ?
>>>
>>>
>>>
>>>  Hi Thomas,
>>>
>>>
>>>  The 3.2 kernel may be the #1 cause of your I/O load problem:
>>>
>>>  http://www.databasesoup.com/2014/09/why-you-need-to-avoid-linux-kernel-32.html
>>>
>>>  https://medium.com/postgresql-talk/benchmarking-postgresql-with-different-linux-kernel-versions-on-ubuntu-lts-e61d57b70dd4
>>>
>>>  Have a nice day.
>>>
>>>
>>>  Koray
>>
>>  Likely the 3.2 kernel isn't helping, but I think we need much more
>>  information before jumping to conclusions.
>>
>>  You say you're IO bound, so some output from sar / iostat / dstat and
>>  pg_stat_activity etc before and during the issue would be of use.
>>
>>  Also:
>>       System memory size
>>
>>       SSD Model numbers and how many
>>       Raid controller
>>       Partition alignments and stripe sizes
>>       Kernel options
>>       Filesystem used and mount options
>>       IO Scheduler
>>       Postgresql version and configuration
>>       Connection pool sizing
>>
>>
>>  Perhaps you could throw us the output of some of these:
>>
>>       fdisk -l
>>       cat /etc/mtab
>>       cat /sys/block/<ssd device>/queue/scheduler
>>       sysctl kernel | grep sched
>>       sysctl vm
>>       select * from pg_stat_activity
>>       select name, setting from pg_settings
>>       lspci | grep -E 'RAID|SCSI|IDE|SATA'
>
>
>
>

