Re: Performances issues with SSD volume ? - Mailing list pgsql-admin

From Thomas SIMON
Subject Re: Performances issues with SSD volume ?
Date
Msg-id 555DCF59.3060808@neteven.com
In response to Re: Performances issues with SSD volume ?  (Glyn Astill <glynastill@yahoo.co.uk>)
Responses Re: Performances issues with SSD volume ?
List pgsql-admin
The disk was already in noop mode:

cat /sys/block/sdc/queue/scheduler
  noop [deadline] cfq
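
Note: the bracketed entry in that output is the scheduler currently in use, so the volume above is still on deadline. A minimal sketch for switching it at runtime, assuming sdc is the SSD RAID volume as above (the change does not survive a reboot):

    # the bracketed scheduler is the active one
    cat /sys/block/sdc/queue/scheduler
    # switch to noop until the next reboot
    echo noop > /sys/block/sdc/queue/scheduler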


Thomas

On 20/05/2015 19:03, Glyn Astill wrote:
> ----- Original Message -----
>> From: Glyn Astill <glynastill@yahoo.co.uk>
>> To: Thomas SIMON <tsimon@neteven.com>
>> Cc: "pgsql-admin@postgresql.org" <pgsql-admin@postgresql.org>
>> Sent: Wednesday, 20 May 2015, 17:50
>> Subject: Re: [ADMIN] Performances issues with SSD volume ?
>>
>>
>>
>>>   From: Thomas SIMON <tsimon@neteven.com>
>>>   To: glynastill@yahoo.co.uk
>>>   Cc: "pgsql-admin@postgresql.org"
>> <pgsql-admin@postgresql.org>
>>>   Sent: Wednesday, 20 May 2015, 16:41
>>>   Subject: Re: [ADMIN] Performances issues with SSD volume ?
>>>
>>>   Hi Glyn,
>>>
>>>   I'll try to answer these points.
>>>
>>>   I've run some benchmarks, and indeed kernel 3.2 is not helping at all.
>>>   I changed to 3.14 and the gap is quite big!
>>>   With the pgbench read/write test: 3.2 --> 4200 TPS ; 3.14 --> 6900 TPS, under
>>>   the same conditions.
>>>   With the pgbench read-only test: 3.2 --> 37000 TPS ; 3.14 --> 95000 TPS, same
>>>   conditions too.
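
The exact pgbench invocations are not shown; a plausible sketch of read/write and read-only runs like the ones above, with scale, client count and duration purely assumed:

    pgbench -i -s 100 bench              # initialise a test database (scale assumed)
    pgbench -c 32 -j 8 -T 300 bench      # read/write (TPC-B-like) run
    pgbench -S -c 32 -j 8 -T 300 bench   # read-only (SELECT-only) run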
>>
>> That's a start then.
>>
>>>   So it should be better now; but when the server was in production, even with
>>>   the bad kernel, performance was already quite good before it quickly
>>>   degraded.
>>>   So I also think I have another configuration problem.
>>>
>>>   You say you're IO bound, so some output from sar / iostat / dstat and
>>>   pg_stat_activity etc before and during the issue would be of use.
>>>
>>>   -> My server is not in production right now, so it is difficult to
>>>   reproduce the production load and get useful metrics.
>>>   The best way I've found is to replay traffic from the logs with pgreplay.
>>>   I hoped the server would break down again while replaying this traffic, but it
>>>   never happens... another thing I can't understand.
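
For reference, a rough sketch of the two-pass pgreplay workflow described here; file names and the target host are placeholders, and CSV logging is assumed (log_min_duration_statement = 0, as in the config below, is required so every statement is logged):

    # pass 1: parse the PostgreSQL CSV log into a replay file
    pgreplay -f -c -o prod.replay postgresql.csv
    # pass 2: replay it against the test server
    pgreplay -r -h testhost -p 5432 prod.replay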
>>>
>>>   Below is my dstat output while I replay this traffic (i.e. while the server
>>>   runs normally).
>>>   Unfortunately I have no output from when the server's performance
>>>   decreased.
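
A sketch of a dstat invocation that could be left running to capture the next slowdown; the interval and column selection are only suggestions, the original command is not shown:

    dstat -tcdngy --disk-util 5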
>>
>> It's a shame we can't get any insight into activity on the server during
>> the issues.
>>>
>>>   Other things you asked
>>>
>>>        System memory size : 256 GB
>>>        SSD model numbers and how many : 4 SSD disks ; RAID 10 ; model
>>>   INTEL SSDSC2BB480G4
>>>        RAID controller : MegaRAID SAS 2208
>>>        Partition alignments and stripe sizes : see fdisk output below
>>>        Kernel options : the config file is here :
>>>
>> ftp://ftp.ovh.net/made-in-ovh/bzImage/3.14.43/config-3.14.43-xxxx-std-ipv6-64
>>>        Filesystem used and mount options : ext4, see mtab below
>>>        IO scheduler : noop [deadline] cfq for my SSD RAID volume
>>>        PostgreSQL version and configuration : 9.3.5
>>>
>>>   max_connections=1800
>>>   shared_buffers=8GB
>>>   temp_buffers=32MB
>>>   work_mem=100MB
>>>   maintenance_work_mem=12GB
>>>   bgwriter_lru_maxpages=200
>>>   effective_io_concurrency=4
>>>   wal_level=hot_standby
>>>   wal_sync_method=fdatasync
>>>   wal_writer_delay=2000ms
>>>   commit_delay=1000
>>>   checkpoint_segments=80
>>>   checkpoint_timeout=15min
>>>   checkpoint_completion_target=0.7
>>>   archive_command='rsync ....'
>>>   max_wal_senders=10
>>>   wal_keep_segments=38600
>>>   vacuum_defer_cleanup_age=100
>>>   hot_standby = on
>>>   max_standby_archive_delay = 5min
>>>   max_standby_streaming_delay = 5min
>>>   hot_standby_feedback = on
>>>   random_page_cost = 1.0
>>>   effective_cache_size = 240GB
>>>   log_min_error_statement = warning
>>>   log_min_duration_statement = 0
>>>   log_checkpoints = on
>>>   log_connections = on
>>>   log_disconnections = on
>>>   log_line_prefix = '%m|%u|%d|%c|'
>>>   log_lock_waits = on
>>>   log_statement = 'all'
>>>   log_timezone = 'localtime'
>>>   track_activities = on
>>>   track_functions = pl
>>>   track_activity_query_size = 8192
>>>   autovacuum_max_workers = 5
>>>   autovacuum_naptime = 30s
>>>   autovacuum_vacuum_threshold = 40
>>>   autovacuum_analyze_threshold = 20
>>>   autovacuum_vacuum_scale_factor = 0.10
>>>   autovacuum_analyze_scale_factor = 0.10
>>>   autovacuum_vacuum_cost_delay = 5ms
>>>   default_transaction_isolation = 'read committed'
>>>   max_locks_per_transaction = 128
>>>
>>>
>>>
>>>        Connection pool sizing (pgpool2)
>>>   num_init_children = 1790
>>>   max_pool = 1
>>
>> 1800 is quite a lot of connections, and with max_pool=1 in pgpool you're
>> effectively just using pgpool as a proxy (as I recall, my memory is a little
>> fuzzy on pgpool now).  Unless your app is stateful in some way or has unique
>> users for each of those 1800 connections you should lower the quantity of active
>> connections.  A general starting point is usually cpu cores * 2, so you could up
>> max_pool and divide num_init_children by the same amount.
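
As a purely hypothetical illustration of that sizing rule (the thread never states the core count): on a 32-core box, cores * 2 would suggest keeping roughly 64 active backends, e.g.

    # pgpool.conf -- hypothetical sizing for a 32-core server
    num_init_children = 64    # concurrent client sessions pgpool will accept
    max_pool = 1              # cached backend connections per child
    # postgresql.conf would then only need max_connections of ~100 for headroom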
>>
>> Hard to say what you need to do without knowing what exactly you're doing
>> though.  What's the nature of the app(s)?
>>
>>>   I have also included the MegaCli parameters:
>>>
>>>   Virtual Drive: 2 (Target Id: 2)
>>>   Name                :datassd
>>>   RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
>>>   Size                : 893.25 GB
>>>   Sector Size         : 512
>>>   Is VD emulated      : Yes
>>>   Mirror Data         : 893.25 GB
>>>   State               : Optimal
>>>   Strip Size          : 256 KB
>>>   Number Of Drives per span:2
>>>   Span Depth          : 2
>>>   Default Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
>>>   Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
>>>   Default Access Policy: Read/Write
>>>   Current Access Policy: Read/Write
>>>   Disk Cache Policy   : Enabled
>>>   Encryption Type     : None
>>>   Bad Blocks Exist: No
>>>   PI type: No PI
>>>
>>>   Is VD Cached: No
>>
>> Not using your raid controller's write cache then?  Not sure just how important
>> that is with SSDs these days, but if you've got a BBU set it to
>> "WriteBack". Also change "Cache if Bad BBU" to "No
>> Write Cache if Bad BBU" if you do that.
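
A sketch of the MegaCli commands for that change, assuming the volume shown above (Virtual Drive 2) sits on adapter 0; the binary may be MegaCli64 depending on the install, and the exact keyword syntax can vary slightly between MegaCli versions:

    # enable the controller's write-back cache on VD 2
    MegaCli -LDSetProp WB -L2 -a0
    # fall back to write-through (rather than keep caching) if the BBU fails
    MegaCli -LDSetProp NoCachedBadBBU -L2 -a0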
>>
>>
>>>   Other outputs :
>>>        fdisk -l
>>>
>>>   Disk /dev/sdc: 959.1 GB, 959119884288 bytes
>>>   255 heads, 63 sectors/track, 116606 cylinders, total 1873281024 sectors
>>>   Units = sectors of 1 * 512 = 512 bytes
>>>   Sector size (logical/physical): 512 bytes / 4096 bytes
>>>   I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>>>   Disk identifier: 0x00000000
>>>
>>>   Disk /dev/mapper/vg_datassd-lv_datassd: 751.6 GB, 751619276800 bytes
>>>   255 heads, 63 sectors/track, 91379 cylinders, total 1468006400 sectors
>>>   Units = sectors of 1 * 512 = 512 bytes
>>>   Sector size (logical/physical): 512 bytes / 4096 bytes
>>>   I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>>>   Disk identifier: 0x00000000
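
A quick way to sanity-check partition and LVM alignment against the 256 KB stripe reported by the controller, using the device names from the output above:

    parted /dev/sdc unit s print     # partition start sectors, if any partitions exist
    pvs -o +pe_start --units s       # start offset of the LVM data area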
>>>
>>>
>>>        cat /etc/mtab
>>>   /dev/mapper/vg_datassd-lv_datassd /datassd ext4
>>>   rw,relatime,discard,nobarrier,data=ordered 0 0
>>>   (I added the nobarrier option)
>>>
>>>
>>>        cat /sys/block/sdc/queue/scheduler
>>>   noop [deadline] cfq
>>>
>>
>> You could swap relatime for noatime,nodiratime.
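
A sketch of the corresponding /etc/fstab entry with that change, carrying the other options over from the mtab output above (note that on recent kernels noatime already implies nodiratime):

    /dev/mapper/vg_datassd-lv_datassd /datassd ext4 noatime,nodiratime,discard,nobarrier,data=ordered 0 2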
>>
>
> You could also see if the noop scheduler makes any improvement.
>
>


