Re: Performances issues with SSD volume ? - Mailing list pgsql-admin

From:           Thomas SIMON
Subject:        Re: Performances issues with SSD volume ?
Date:
Msg-id:         555DCBA6.1050607@neteven.com
In response to: Re: Performances issues with SSD volume ? (Glyn Astill <glynastill@yahoo.co.uk>)
Responses:      Re: Performances issues with SSD volume ?
List:           pgsql-admin
On 20/05/2015 18:50, Glyn Astill wrote:
>
>> From: Thomas SIMON <tsimon@neteven.com>
>> To: glynastill@yahoo.co.uk
>> Cc: "pgsql-admin@postgresql.org" <pgsql-admin@postgresql.org>
>> Sent: Wednesday, 20 May 2015, 16:41
>> Subject: Re: [ADMIN] Performances issues with SSD volume ?
>>
>> Hi Glyn,
>>
>> I'll try to answer these points.
>>
>> I've run some benchmarks, and indeed 3.2 is not helping. Not helping at all.
>> I changed to 3.14 and the gap is quite big!
>> With the pgbench RW test: 3.2 --> 4200 TPS; 3.14 --> 6900 TPS under the same conditions.
>> With the pgbench RO test: 3.2 --> 37000 TPS; 3.14 --> 95000 TPS, same conditions too.
>
> That's a start then.
>
>> So it should be better now, but when the server was in production, even with the
>> bad kernel, performance was already quite good before it quickly degraded.
>> So I think I also have another configuration problem.
>>
>> You say you're IO bound, so some output from sar / iostat / dstat and
>> pg_stat_activity etc before and during the issue would be of use.
>>
>> -> My server is not in production right now, so it is difficult to
>> replay the production load and get useful metrics.
>> The best way I've found is to replay traffic from the logs with pgreplay.
>> I hoped the server would fall over again while replaying this traffic, but it
>> never happens... Another thing I can't understand...
>>
>> Below is my dstat output when I replay this traffic (and so when the server
>> runs normally).
>> Unfortunately I have no more outputs from when the server's performance degraded.
>
> It's a shame we can't get any insight into activity on the server during the issues.
>
>> Other things you asked:
>>
>> System memory size: 256 GB
>> SSD model numbers and how many: 4 SSD disks; RAID 10; model INTEL SSDSC2BB480G4
>> Raid controller: MegaRAID SAS 2208
>> Partition alignments and stripe sizes: see fdisk below
>> Kernel options: the config file is here:
>> ftp://ftp.ovh.net/made-in-ovh/bzImage/3.14.43/config-3.14.43-xxxx-std-ipv6-64
>> Filesystem used and mount options: ext4, see mtab below
>> IO scheduler: noop [deadline] cfq for my SSD raid volume
>> PostgreSQL version and configuration: 9.3.5
>>
>> max_connections=1800
>> shared_buffers=8GB
>> temp_buffers=32MB
>> work_mem=100MB
>> maintenance_work_mem=12GB
>> bgwriter_lru_maxpages=200
>> effective_io_concurrency=4
>> wal_level=hot_standby
>> wal_sync_method=fdatasync
>> wal_writer_delay=2000ms
>> commit_delay=1000
>> checkpoint_segments=80
>> checkpoint_timeout=15min
>> checkpoint_completion_target=0.7
>> archive_command='rsync ....'
>> max_wal_senders=10
>> wal_keep_segments=38600
>> vacuum_defer_cleanup_age=100
>> hot_standby = on
>> max_standby_archive_delay = 5min
>> max_standby_streaming_delay = 5min
>> hot_standby_feedback = on
>> random_page_cost = 1.0
>> effective_cache_size = 240GB
>> log_min_error_statement = warning
>> log_min_duration_statement = 0
>> log_checkpoints = on
>> log_connections = on
>> log_disconnections = on
>> log_line_prefix = '%m|%u|%d|%c|'
>> log_lock_waits = on
>> log_statement = 'all'
>> log_timezone = 'localtime'
>> track_activities = on
>> track_functions = pl
>> track_activity_query_size = 8192
>> autovacuum_max_workers = 5
>> autovacuum_naptime = 30s
>> autovacuum_vacuum_threshold = 40
>> autovacuum_analyze_threshold = 20
>> autovacuum_vacuum_scale_factor = 0.10
>> autovacuum_analyze_scale_factor = 0.10
>> autovacuum_vacuum_cost_delay = 5ms
>> default_transaction_isolation = 'read committed'
>> max_locks_per_transaction = 128
>>
>>
>> Connection pool sizing (pgpool2)
>> num_init_children = 1790
>> max_pool = 1
>
> 1800 is quite a lot of connections, and with max_pool=1 in pgpool you're effectively just using
> pgpool as a proxy (as I recall, my memory is a little fuzzy on pgpool now). Unless your app is
> stateful in some way or has unique users for each of those 1800 connections you should lower the
> number of active connections. A general starting point is usually cpu cores * 2, so you could
> raise max_pool and divide num_init_children by the same amount.
>
> Hard to say what you need to do without knowing what exactly you're doing though. What's the
> nature of the app(s)?

Yes, we just use it as a proxy for now. We have approximately 100 different active users, each
making a varying number of connections (twisted + zope apps). The result is ~900 idle connections
for ~60 active connections, but sometimes (when stopping/starting prod) we need almost double the
connections, because some twisted services don't close their connections immediately.
But this is the current (working) configuration, and I don't think my disk performance problem is
related to it.

>
>> I also add the megacli parameters:
>>
>> Virtual Drive: 2 (Target Id: 2)
>> Name                : datassd
>> RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
>> Size                : 893.25 GB
>> Sector Size         : 512
>> Is VD emulated      : Yes
>> Mirror Data         : 893.25 GB
>> State               : Optimal
>> Strip Size          : 256 KB
>> Number Of Drives per span : 2
>> Span Depth          : 2
>> Default Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
>> Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
>> Default Access Policy: Read/Write
>> Current Access Policy: Read/Write
>> Disk Cache Policy   : Enabled
>> Encryption Type     : None
>> Bad Blocks Exist    : No
>> PI type             : No PI
>>
>> Is VD Cached: No
>
> Not using your raid controller's write cache then? Not sure just how important that is with SSDs
> these days, but if you've got a BBU set it to "WriteBack". Also change "Cache if Bad BBU" to
> "No Write Cache if Bad BBU" if you do that.

No, I had read some megacli-related docs about SSDs, and the advice was to use writethrough on the
disks (see http://wiki.mikejung.biz/LSI#Configure_LSI_Card_for_SSD_RAID, last section).
The disks are already in "No Write Cache if Bad BBU" mode (that setting was just wrapped onto a
second line in my extract).
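Should the write-cache policy ever need checking or changing, a minimal MegaCli sketch is below.
The binary name, the adapter number (-a0) and the logical drive number (-L2) are assumptions taken
from the output above, and flag spellings vary between MegaCli builds, so verify against the
built-in help first:

    # Show cache policy and state for all logical drives on adapter 0
    MegaCli -LDInfo -LAll -a0

    # Check BBU health before relying on write-back
    MegaCli -AdpBbuCmd -GetBbuStatus -a0

    # Glyn's suggestion: write-back, falling back to write-through if the BBU goes bad
    MegaCli -LDSetProp WB -L2 -a0
    MegaCli -LDSetProp NoCachedBadBBU -L2 -a0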
>
>
>> Other outputs:
>>
>> fdisk -l
>>
>> Disk /dev/sdc: 959.1 GB, 959119884288 bytes
>> 255 heads, 63 sectors/track, 116606 cylinders, total 1873281024 sectors
>> Units = sectors of 1 * 512 = 512 bytes
>> Sector size (logical/physical): 512 bytes / 4096 bytes
>> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>> Disk identifier: 0x00000000
>>
>> Disk /dev/mapper/vg_datassd-lv_datassd: 751.6 GB, 751619276800 bytes
>> 255 heads, 63 sectors/track, 91379 cylinders, total 1468006400 sectors
>> Units = sectors of 1 * 512 = 512 bytes
>> Sector size (logical/physical): 512 bytes / 4096 bytes
>> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>> Disk identifier: 0x00000000
>>
>>
>> cat /etc/mtab
>> /dev/mapper/vg_datassd-lv_datassd /datassd ext4 rw,relatime,discard,nobarrier,data=ordered 0 0
>> (I added the nobarrier option)
>>
>>
>> cat /sys/block/sdc/queue/scheduler
>> noop [deadline] cfq
>>
>
> You could swap relatime for noatime,nodiratime.

I'll swap to noatime, thanks.

>
>> sysctl kernel | grep sched
>> kernel.sched_child_runs_first = 0
>> kernel.sched_rr_timeslice_ms = 25
>> kernel.sched_rt_period_us = 1000000
>> kernel.sched_rt_runtime_us = 950000
>>
>> I've read some advice about setting kernel.sched_autogroup_enabled=0
>> and kernel.sched_migration_cost_ns=5000000, but these parameters are not
>> recognized by my kernel. So I don't know what to do with that...
>
> sched_migration_cost_ns would be called sched_migration_cost in your old 3.2 kernel, not sure why
> sched_autogroup_enabled wouldn't be recognized though.

I've found in the kernel config file that "CONFIG_SCHED_AUTOGROUP is not set", so I guess that is
the same as having it set to enabled=0, right?
I haven't found any parameter related to migration_cost in the config file.

>
>> sysctl vm
>> vm.admin_reserve_kbytes = 8192
>> vm.block_dump = 0
>> vm.dirty_background_bytes = 8388608
>> vm.dirty_background_ratio = 0
>> vm.dirty_bytes = 67108864
>> vm.dirty_expire_centisecs = 3000
>> vm.dirty_ratio = 0
>> vm.dirty_writeback_centisecs = 500
>> vm.drop_caches = 3
>> vm.laptop_mode = 0
>> vm.legacy_va_layout = 0
>> vm.lowmem_reserve_ratio = 256 256 32
>> vm.max_map_count = 65530
>> vm.memory_failure_early_kill = 0
>> vm.memory_failure_recovery = 1
>> vm.min_free_kbytes = 65008
>> vm.min_slab_ratio = 5
>> vm.min_unmapped_ratio = 1
>> vm.mmap_min_addr = 4096
>> vm.nr_pdflush_threads = 0
>> vm.numa_zonelist_order = default
>> vm.oom_dump_tasks = 1
>> vm.oom_kill_allocating_task = 0
>> vm.overcommit_kbytes = 0
>> vm.overcommit_memory = 2
>> vm.overcommit_ratio = 50
>> vm.page-cluster = 3
>> vm.panic_on_oom = 0
>> vm.percpu_pagelist_fraction = 0
>> vm.scan_unevictable_pages = 0
>> vm.stat_interval = 1
>> vm.swappiness = 60
>> vm.user_reserve_kbytes = 131072
>> vm.vfs_cache_pressure = 100
>> vm.zone_reclaim_mode = 0
>>
>>
>> select * from pg_stat_activity
>> I've got hundreds of entries for that when I'm in production, and I can't
>> paste them here due to confidentiality.
>> It is usually around 50 million queries per day
>> (35% selects; 55% updates & 5% inserts).
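One way to share a picture of that activity without pasting confidential statement text is to
aggregate pg_stat_activity. A minimal sketch using only the 9.3 columns, with no query text
included:

    -- Connection counts per database and state (active, idle, idle in transaction, ...)
    SELECT datname, state, waiting, count(*)
    FROM pg_stat_activity
    GROUP BY datname, state, waiting
    ORDER BY count(*) DESC;

    -- Longest-running non-idle backends, durations only
    SELECT pid, usename, state, now() - query_start AS runtime
    FROM pg_stat_activity
    WHERE state <> 'idle'
    ORDER BY runtime DESC
    LIMIT 10;

Output like that, taken before and during a slowdown, is usually enough to see whether backends
are piling up as 'active' or as 'idle in transaction'.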
>>
>>
>> lspci | grep -E 'RAID|SCSI|IDE|SATA'
>> 00:1f.2 SATA controller: Intel Corporation C600/X79 series chipset
>> 6-Port SATA AHCI Controller (rev 06)
>> 02:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2208
>> [Thunderbolt] (rev 05)
>> 07:00.0 Serial Attached SCSI controller: Intel Corporation C602 chipset
>> 4-Port SATA Storage Control Unit (rev 06)
>>
>> Thanks
>>
>> Thomas
>>
>>
>> On 18/05/2015 16:29, Glyn Astill wrote:
>>>> From: Koray Eyidoğan <korayey@gmail.com>
>>>> To: Thomas SIMON <tsimon@neteven.com>
>>>> Cc: pgsql-admin@postgresql.org
>>>> Sent: Monday, 18 May 2015, 14:51
>>>> Subject: Re: [ADMIN] Performances issues with SSD volume ?
>>>>
>>>> Hi Thomas,
>>>>
>>>> The 3.2 kernel may be the #1 cause of your I/O load problem:
>>>>
>>>> http://www.databasesoup.com/2014/09/why-you-need-to-avoid-linux-kernel-32.html
>>>> https://medium.com/postgresql-talk/benchmarking-postgresql-with-different-linux-kernel-versions-on-ubuntu-lts-e61d57b70dd4
>>>>
>>>> Have a nice day.
>>>>
>>>> Koray
>>>
>>> Likely the 3.2 kernel isn't helping, but I think we need much more
>>> information before jumping to conclusions.
>>>
>>> You say you're IO bound, so some output from sar / iostat / dstat and
>>> pg_stat_activity etc before and during the issue would be of use.
>>>
>>> Also:
>>> System memory size
>>> SSD model numbers and how many
>>> Raid controller
>>> Partition alignments and stripe sizes
>>> Kernel options
>>> Filesystem used and mount options
>>> IO scheduler
>>> Postgresql version and configuration
>>> Connection pool sizing
>>>
>>> Perhaps you could throw us the output of some of these:
>>>
>>> fdisk -l
>>> cat /etc/mtab
>>> cat /sys/block/<ssd device>/queue/scheduler
>>> sysctl kernel | grep sched
>>> sysctl vm
>>> select * from pg_stat_activity
>>> select name, setting from pg_settings
>>> lspci | grep -E 'RAID|SCSI|IDE|SATA'
>>
>>
>> --
>> Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-admin
>>
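For reference, the kernel comparison near the top of this message used pgbench read-write and
read-only tests. A minimal sketch of that kind of run is below; the scale factor, client count,
thread count and duration are illustrative assumptions, not the values actually used:

    # One-off setup: create and populate a test database (scale 100 is roughly 1.5 GB of data)
    createdb pgbench_test
    pgbench -i -s 100 pgbench_test

    # Read-write (default TPC-B-like) test: 32 clients, 4 threads, 5 minutes
    pgbench -c 32 -j 4 -T 300 pgbench_test

    # Read-only (SELECT-only) test with the same settings
    pgbench -S -c 32 -j 4 -T 300 pgbench_test

Running the same pair of commands under each kernel, on an otherwise idle machine, is what makes
the TPS figures comparable.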
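Since the missing piece each time is data captured during the slowdown itself, a capture loop
along these lines could be left running in production and kept for the next incident. The
10-second interval and the /var/log/perf paths are arbitrary illustrative choices (create the
directory first with mkdir -p /var/log/perf):

    # Background system-level collectors, 10-second samples, appended to log files
    iostat -xm 10     >> /var/log/perf/iostat.log &
    sar -d -p 10      >> /var/log/perf/sar_disk.log &
    dstat -tcdngy 10  >> /var/log/perf/dstat.log &

    # Lightweight PostgreSQL activity snapshot every 10 seconds (counts only, no query text)
    while true; do
        psql -At -c "SELECT now(), state, count(*) FROM pg_stat_activity GROUP BY 1, 2" \
            >> /var/log/perf/pg_activity.log
        sleep 10
    done

That way, the next time throughput drops there is before/during output for sar, iostat, dstat and
pg_stat_activity to compare, which is exactly what was asked for earlier in the thread.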