Re: Performances issues with SSD volume ? - Mailing list pgsql-admin
From | Glyn Astill
---|---
Subject | Re: Performances issues with SSD volume ?
Date |
Msg-id | 1247264598.4844463.1432140629700.JavaMail.yahoo@mail.yahoo.com
In response to | Re: Performances issues with SSD volume ? (Thomas SIMON <tsimon@neteven.com>)
Responses | Re: Performances issues with SSD volume ?; Re: Performances issues with SSD volume ?
List | pgsql-admin
> From: Thomas SIMON <tsimon@neteven.com>
> To: glynastill@yahoo.co.uk
> Cc: "pgsql-admin@postgresql.org" <pgsql-admin@postgresql.org>
> Sent: Wednesday, 20 May 2015, 16:41
> Subject: Re: [ADMIN] Performances issues with SSD volume ?
>
> Hi Glyn,
>
> I'll try to answer these points.
>
> I've made some benchmarks, and indeed 3.2 is not helping. Not helping at all.
> I changed to 3.14 and the gap is quite big!
> With the pgbench RW test, 3.2 --> 4200 TPS ; 3.14 --> 6900 TPS in the same conditions.
> With the pgbench RO test, 3.2 --> 37000 TPS ; 3.14 --> 95000 TPS, same conditions too.

That's a start then.

> So it should be better, but when the server was in production, even with the
> bad kernel, performance was already quite good before it quickly decreased.
> So I think I have another configuration problem too.
>
> You say you're IO bound, so some output from sar / iostat / dstat and
> pg_stat_activity etc before and during the issue would be of use.
>
> -> My server is not in production right now, so it is difficult to
> replay the production load and get some useful metrics.
> The best way I've found is to replay traffic from the logs with pgreplay.
> I hoped the server would fall over again when replaying this traffic, but it
> never happens ... another thing I can't understand ...
>
> Below is my dstat output when I replay this traffic (and so when the server
> runs normally).
> Unfortunately I have no more outputs from when the server's performance decreased.

It's a shame we can't get any insight into activity on the server during the issues.

> Other things you asked:
>
> System memory size : 256 GB
> SSD model numbers and how many : 4 SSD disks ; RAID 10 ; model INTEL SSDSC2BB480G4
> Raid controller : MegaRAID SAS 2208
> Partition alignments and stripe sizes : see fdisk below
> Kernel options : the config file is here :
> ftp://ftp.ovh.net/made-in-ovh/bzImage/3.14.43/config-3.14.43-xxxx-std-ipv6-64
> Filesystem used and mount options : ext4, see mtab below
> IO Scheduler : noop [deadline] cfq for my SSD raid volume
> Postgresql version and configuration : 9.3.5
>
> max_connections=1800
> shared_buffers=8GB
> temp_buffers=32MB
> work_mem=100MB
> maintenance_work_mem=12GB
> bgwriter_lru_maxpages=200
> effective_io_concurrency=4
> wal_level=hot_standby
> wal_sync_method=fdatasync
> wal_writer_delay=2000ms
> commit_delay=1000
> checkpoint_segments=80
> checkpoint_timeout=15min
> checkpoint_completion_target=0.7
> archive_command='rsync ....'
> max_wal_senders=10
> wal_keep_segments=38600
> vacuum_defer_cleanup_age=100
> hot_standby = on
> max_standby_archive_delay = 5min
> max_standby_streaming_delay = 5min
> hot_standby_feedback = on
> random_page_cost = 1.0
> effective_cache_size = 240GB
> log_min_error_statement = warning
> log_min_duration_statement = 0
> log_checkpoints = on
> log_connections = on
> log_disconnections = on
> log_line_prefix = '%m|%u|%d|%c|'
> log_lock_waits = on
> log_statement = 'all'
> log_timezone = 'localtime'
> track_activities = on
> track_functions = pl
> track_activity_query_size = 8192
> autovacuum_max_workers = 5
> autovacuum_naptime = 30s
> autovacuum_vacuum_threshold = 40
> autovacuum_analyze_threshold = 20
> autovacuum_vacuum_scale_factor = 0.10
> autovacuum_analyze_scale_factor = 0.10
> autovacuum_vacuum_cost_delay = 5ms
> default_transaction_isolation = 'read committed'
> max_locks_per_transaction = 128
>
> Connection pool sizing (pgpool2)
> num_init_children = 1790
> max_pool = 1

1800 is quite a lot of connections, and with max_pool=1 in pgpool you're effectively just using pgpool as a proxy (as I recall, my memory is a little fuzzy on pgpool now). Unless your app is stateful in some way or has unique users for each of those 1800 connections, you should lower the quantity of active connections. A general starting point is usually cpu cores * 2, so you could up max_pool and divide num_init_children by the same amount (rough numbers at the end of this mail). Hard to say what you need to do without knowing exactly what you're doing though. What's the nature of the app(s)?

> I'm also adding the megacli parameters :
>
> Virtual Drive: 2 (Target Id: 2)
> Name : datassd
> RAID Level : Primary-1, Secondary-0, RAID Level Qualifier-0
> Size : 893.25 GB
> Sector Size : 512
> Is VD emulated : Yes
> Mirror Data : 893.25 GB
> State : Optimal
> Strip Size : 256 KB
> Number Of Drives per span: 2
> Span Depth : 2
> Default Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
> Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
> Default Access Policy: Read/Write
> Current Access Policy: Read/Write
> Disk Cache Policy : Enabled
> Encryption Type : None
> Bad Blocks Exist: No
> PI type: No PI
>
> Is VD Cached: No

Not using your raid controller's write cache then? Not sure just how important that is with SSDs these days, but if you've got a BBU set it to "WriteBack". Also change "Cache if Bad BBU" to "No Write Cache if Bad BBU" if you do that.

> Other outputs :
> fdisk -l
>
> Disk /dev/sdc: 959.1 GB, 959119884288 bytes
> 255 heads, 63 sectors/track, 116606 cylinders, total 1873281024 sectors
> Units = sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> Disk identifier: 0x00000000
>
> Disk /dev/mapper/vg_datassd-lv_datassd: 751.6 GB, 751619276800 bytes
> 255 heads, 63 sectors/track, 91379 cylinders, total 1468006400 sectors
> Units = sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> Disk identifier: 0x00000000
>
> cat /etc/mtab
> /dev/mapper/vg_datassd-lv_datassd /datassd ext4 rw,relatime,discard,nobarrier,data=ordered 0 0
> (I added the nobarrier option)
>
> cat /sys/block/sdc/queue/scheduler
> noop [deadline] cfq

You could swap relatime for noatime,nodiratime.
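For example, based on your mtab line above, the /etc/fstab entry would end up looking something like this (just a sketch; keep whatever other options you actually need):

    /dev/mapper/vg_datassd-lv_datassd  /datassd  ext4  noatime,nodiratime,discard,nobarrier,data=ordered  0  0

Then remount (mount -o remount /datassd) or reboot, and check /proc/mounts to confirm the options are active.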
> sysctl kernel | grep sched
> kernel.sched_child_runs_first = 0
> kernel.sched_rr_timeslice_ms = 25
> kernel.sched_rt_period_us = 1000000
> kernel.sched_rt_runtime_us = 950000
>
> I've read some advice about changing kernel.sched_autogroup_enabled=0
> and kernel.sched_migration_cost_ns=5000000, but these parameters are not
> recognized by my kernel. So I don't know what to do with that...

sched_migration_cost_ns would be called sched_migration_cost in your old 3.2 kernel, not sure why sched_autogroup_enabled wouldn't be recognized though.

> sysctl vm
> vm.admin_reserve_kbytes = 8192
> vm.block_dump = 0
> vm.dirty_background_bytes = 8388608
> vm.dirty_background_ratio = 0
> vm.dirty_bytes = 67108864
> vm.dirty_expire_centisecs = 3000
> vm.dirty_ratio = 0
> vm.dirty_writeback_centisecs = 500
> vm.drop_caches = 3
> vm.laptop_mode = 0
> vm.legacy_va_layout = 0
> vm.lowmem_reserve_ratio = 256 256 32
> vm.max_map_count = 65530
> vm.memory_failure_early_kill = 0
> vm.memory_failure_recovery = 1
> vm.min_free_kbytes = 65008
> vm.min_slab_ratio = 5
> vm.min_unmapped_ratio = 1
> vm.mmap_min_addr = 4096
> vm.nr_pdflush_threads = 0
> vm.numa_zonelist_order = default
> vm.oom_dump_tasks = 1
> vm.oom_kill_allocating_task = 0
> vm.overcommit_kbytes = 0
> vm.overcommit_memory = 2
> vm.overcommit_ratio = 50
> vm.page-cluster = 3
> vm.panic_on_oom = 0
> vm.percpu_pagelist_fraction = 0
> vm.scan_unevictable_pages = 0
> vm.stat_interval = 1
> vm.swappiness = 60
> vm.user_reserve_kbytes = 131072
> vm.vfs_cache_pressure = 100
> vm.zone_reclaim_mode = 0
>
> select * from pg_stat_activity
> I've got hundreds of entries for that when I'm in production, and I can't
> paste it here due to confidentiality.
> Per day it is usually around 50 million queries (35% selects ; 55%
> updates ; 5% inserts).
>
> lspci | grep -E 'RAID|SCSI|IDE|SATA'
> 00:1f.2 SATA controller: Intel Corporation C600/X79 series chipset
> 6-Port SATA AHCI Controller (rev 06)
> 02:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2208
> [Thunderbolt] (rev 05)
> 07:00.0 Serial Attached SCSI controller: Intel Corporation C602 chipset
> 4-Port SATA Storage Control Unit (rev 06)
>
> Thanks
>
> Thomas
>
> On 18/05/2015 16:29, Glyn Astill wrote:
>>> From: Koray Eyidoğan <korayey@gmail.com>
>>> To: Thomas SIMON <tsimon@neteven.com>
>>> Cc: pgsql-admin@postgresql.org
>>> Sent: Monday, 18 May 2015, 14:51
>>> Subject: Re: [ADMIN] Performances issues with SSD volume ?
>>>
>>> Hi Thomas,
>>>
>>> The 3.2 kernel may be the #1 cause of your I/O load problem:
>>>
>>> http://www.databasesoup.com/2014/09/why-you-need-to-avoid-linux-kernel-32.html
>>> https://medium.com/postgresql-talk/benchmarking-postgresql-with-different-linux-kernel-versions-on-ubuntu-lts-e61d57b70dd4
>>>
>>> Have a nice day.
>>>
>>> Koray
>>
>> Likely the 3.2 kernel isn't helping, but I think we need much more
>> information before jumping to conclusions.
>>
>> You say you're IO bound, so some output from sar / iostat / dstat and
>> pg_stat_activity etc before and during the issue would be of use.
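If you do manage to catch it degrading again, a rough capture along these lines while it's happening would give us a lot more to go on (intervals and flags are just a suggestion):

    iostat -x 5        # extended per-device stats: utilisation, await, queue sizes
    sar -d -p 5        # similar, plus history if sysstat logging is enabled
    dstat -tcdngy 5    # combined cpu / disk / net / paging / system view

plus, from psql (9.3 column names), a snapshot of what the backends are doing:

    select pid, state, waiting, now() - query_start as runtime, query
    from pg_stat_activity
    where state <> 'idle'
    order by runtime desc;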
>>
>> Also:
>> System memory size
>>
>> SSD model numbers and how many
>> Raid controller
>> Partition alignments and stripe sizes
>> Kernel options
>> Filesystem used and mount options
>> IO Scheduler
>> Postgresql version and configuration
>> Connection pool sizing
>>
>> Perhaps you could throw us the output of some of these:
>>
>> fdisk -l
>> cat /etc/mtab
>> cat /sys/block/<ssd device>/queue/scheduler
>> sysctl kernel | grep sched
>> sysctl vm
>> select * from pg_stat_activity
>> select name, setting from pg_settings
>> lspci | grep -E 'RAID|SCSI|IDE|SATA'
>
> --
> Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-admin
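PS. To put the pool sizing comment into rough numbers: this is purely illustrative since I don't know your core count, but on a hypothetical 32-core box the "cores * 2" starting point would mean aiming for around 64 active backends, e.g.

    num_init_children = 64
    max_pool = 1

or num_init_children = 32 with max_pool = 2, rather than 1790 children, and then adjusting based on what you actually observe.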