Re: Performances issues with SSD volume ? - Mailing list pgsql-admin
From | Thomas SIMON
Subject | Re: Performances issues with SSD volume ?
Date |
Msg-id | 555DCF59.3060808@neteven.com
In response to | Re: Performances issues with SSD volume ? (Glyn Astill <glynastill@yahoo.co.uk>)
Responses | Re: Performances issues with SSD volume ?
List | pgsql-admin
The disk was already in noop mode:

cat /sys/block/sdc/queue/scheduler
noop [deadline] cfq

Thomas

On 20/05/2015 19:03, Glyn Astill wrote:
> ----- Original Message -----
>> From: Glyn Astill <glynastill@yahoo.co.uk>
>> To: Thomas SIMON <tsimon@neteven.com>
>> Cc: "pgsql-admin@postgresql.org" <pgsql-admin@postgresql.org>
>> Sent: Wednesday, 20 May 2015, 17:50
>> Subject: Re: [ADMIN] Performances issues with SSD volume ?
>>
>>> From: Thomas SIMON <tsimon@neteven.com>
>>> To: glynastill@yahoo.co.uk
>>> Cc: "pgsql-admin@postgresql.org" <pgsql-admin@postgresql.org>
>>> Sent: Wednesday, 20 May 2015, 16:41
>>> Subject: Re: [ADMIN] Performances issues with SSD volume ?
>>>
>>> Hi Glyn,
>>>
>>> I'll try to answer these points.
>>>
>>> I've run some benchmarks, and indeed 3.2 is not helping. Not helping at all.
>>> I changed to 3.14 and the gap is quite big!
>>> With the pgbench RW test, 3.2 --> 4200 TPS; 3.14 --> 6900 TPS under the
>>> same conditions.
>>> With the pgbench RO test, 3.2 --> 37000 TPS; 3.14 --> 95000 TPS, same
>>> conditions too.
>>
>> That's a start then.
>>
>>> So it should be better now, but when the server was in production, even
>>> with the bad kernel, performance was already quite good before it quickly
>>> degraded.
>>> So I too think I have another configuration problem.
>>>
>>> You say you're IO bound, so some output from sar / iostat / dstat and
>>> pg_stat_activity etc. before and during the issue would be of use.
>>>
>>> -> My server is not in production right now, so it is difficult to
>>> replay the production load and gather useful metrics.
>>> The best way I've found is to replay traffic from the logs with pgreplay.
>>> I hoped the server would fall over again while replaying this traffic,
>>> but it never happens ... another thing I can't understand ...
>>>
>>> Below is my dstat output when I replay this traffic (and so when the
>>> server runs normally).
>>> Unfortunately I have no outputs from when the server's performance
>>> decreased.
>>
>> It's a shame we can't get any insight into activity on the server during
>> the issues.
>>>
>>> Other things you asked:
>>>
>>> System memory size: 256 GB
>>> SSD model numbers and how many: 4 SSD disks; RAID 10; model
>>> INTEL SSDSC2BB480G4
>>> RAID controller: MegaRAID SAS 2208
>>> Partition alignments and stripe sizes: see fdisk below
>>> Kernel options: the config file is here:
>>> ftp://ftp.ovh.net/made-in-ovh/bzImage/3.14.43/config-3.14.43-xxxx-std-ipv6-64
>>> Filesystem used and mount options: ext4, see mtab below
>>> IO scheduler: noop [deadline] cfq for my SSD RAID volume
>>> PostgreSQL version and configuration: 9.3.5
>>>
>>> max_connections=1800
>>> shared_buffers=8GB
>>> temp_buffers=32MB
>>> work_mem=100MB
>>> maintenance_work_mem=12GB
>>> bgwriter_lru_maxpages=200
>>> effective_io_concurrency=4
>>> wal_level=hot_standby
>>> wal_sync_method=fdatasync
>>> wal_writer_delay=2000ms
>>> commit_delay=1000
>>> checkpoint_segments=80
>>> checkpoint_timeout=15min
>>> checkpoint_completion_target=0.7
>>> archive_command='rsync ....'
>>> max_wal_senders=10
>>> wal_keep_segments=38600
>>> vacuum_defer_cleanup_age=100
>>> hot_standby = on
>>> max_standby_archive_delay = 5min
>>> max_standby_streaming_delay = 5min
>>> hot_standby_feedback = on
>>> random_page_cost = 1.0
>>> effective_cache_size = 240GB
>>> log_min_error_statement = warning
>>> log_min_duration_statement = 0
>>> log_checkpoints = on
>>> log_connections = on
>>> log_disconnections = on
>>> log_line_prefix = '%m|%u|%d|%c|'
>>> log_lock_waits = on
>>> log_statement = 'all'
>>> log_timezone = 'localtime'
>>> track_activities = on
>>> track_functions = pl
>>> track_activity_query_size = 8192
>>> autovacuum_max_workers = 5
>>> autovacuum_naptime = 30s
>>> autovacuum_vacuum_threshold = 40
>>> autovacuum_analyze_threshold = 20
>>> autovacuum_vacuum_scale_factor = 0.10
>>> autovacuum_analyze_scale_factor = 0.10
>>> autovacuum_vacuum_cost_delay = 5ms
>>> default_transaction_isolation = 'read committed'
>>> max_locks_per_transaction = 128
>>>
>>> Connection pool sizing (pgpool2):
>>> num_init_children = 1790
>>> max_pool = 1
>>
>> 1800 is quite a lot of connections, and with max_pool=1 in pgpool you're
>> effectively just using pgpool as a proxy (as I recall; my memory is a
>> little fuzzy on pgpool now). Unless your app is stateful in some way, or
>> has unique users for each of those 1800 connections, you should lower the
>> number of active connections. A general starting point is usually
>> cpu cores * 2, so you could increase max_pool and divide num_init_children
>> by the same factor.
>>
>> Hard to say what you need to do without knowing what exactly you're
>> doing, though. What's the nature of the app(s)?
>>
>>> I'm also adding the megacli parameters:
>>>
>>> Virtual Drive: 2 (Target Id: 2)
>>> Name: datassd
>>> RAID Level: Primary-1, Secondary-0, RAID Level Qualifier-0
>>> Size: 893.25 GB
>>> Sector Size: 512
>>> Is VD emulated: Yes
>>> Mirror Data: 893.25 GB
>>> State: Optimal
>>> Strip Size: 256 KB
>>> Number Of Drives per span: 2
>>> Span Depth: 2
>>> Default Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write
>>> Cache if Bad BBU
>>> Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write
>>> Cache if Bad BBU
>>> Default Access Policy: Read/Write
>>> Current Access Policy: Read/Write
>>> Disk Cache Policy: Enabled
>>> Encryption Type: None
>>> Bad Blocks Exist: No
>>> PI type: No PI
>>>
>>> Is VD Cached: No
>>
>> Not using your raid controller's write cache then? Not sure just how
>> important that is with SSDs these days, but if you've got a BBU set it to
>> "WriteBack". Also change "Cache if Bad BBU" to "No Write Cache if Bad BBU"
>> if you do that.
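For reference, that cache policy can be changed at runtime with MegaCli; a
minimal sketch, assuming the MegaCli binary is on the PATH, adapter 0, and
the virtual drive numbering (Target Id: 2) from the output above:

# enable write-back on VD 2, adapter 0
MegaCli -LDSetProp WB -L2 -a0
# fall back to write-through if the BBU fails (safer with WriteBack)
MegaCli -LDSetProp NoCachedBadBBU -L2 -a0
# verify the resulting cache policy
MegaCli -LDInfo -L2 -a0

Note that the "Disk Cache Policy: Enabled" line above refers to the drives'
own on-disk caches, which is a separate setting from the controller cache.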
>>
>>> Other outputs:
>>>
>>> fdisk -l
>>>
>>> Disk /dev/sdc: 959.1 GB, 959119884288 bytes
>>> 255 heads, 63 sectors/track, 116606 cylinders, total 1873281024 sectors
>>> Units = sectors of 1 * 512 = 512 bytes
>>> Sector size (logical/physical): 512 bytes / 4096 bytes
>>> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>>> Disk identifier: 0x00000000
>>>
>>> Disk /dev/mapper/vg_datassd-lv_datassd: 751.6 GB, 751619276800 bytes
>>> 255 heads, 63 sectors/track, 91379 cylinders, total 1468006400 sectors
>>> Units = sectors of 1 * 512 = 512 bytes
>>> Sector size (logical/physical): 512 bytes / 4096 bytes
>>> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>>> Disk identifier: 0x00000000
>>>
>>> cat /etc/mtab
>>> /dev/mapper/vg_datassd-lv_datassd /datassd ext4
>>> rw,relatime,discard,nobarrier,data=ordered 0 0
>>> (I added the nobarrier option)
>>>
>>> cat /sys/block/sdc/queue/scheduler
>>> noop [deadline] cfq
>>
>> You could swap relatime for noatime,nodiratime.
>>
> You could also see if the noop scheduler makes any improvement.
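Both suggestions can be tried without a reboot; a quick sketch, assuming the
device and mount point shown above:

# remount with noatime/nodiratime (persist it in the /etc/fstab entry)
mount -o remount,noatime,nodiratime /datassd

# select the noop scheduler at runtime; the bracketed entry is the active one
echo noop > /sys/block/sdc/queue/scheduler
cat /sys/block/sdc/queue/scheduler    # expect: [noop] deadline cfq

Neither change survives a reboot on its own: fstab covers the mount options,
and the scheduler can be pinned with an elevator=noop kernel parameter or a
udev rule.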