Query plan looks OK, but slow I/O - settings advice? - Mailing list pgsql-performance
From: Roger Hand
Subject: Query plan looks OK, but slow I/O - settings advice?
Msg-id: DB28E9B548192448A4E8C8A3C1B1E475611C66@sj1-exch-01.us.corp.kailea.com
Responses:
  Re: Query plan looks OK, but slow I/O - settings advice?
  Re: Query plan looks OK, but slow I/O - settings advice?
List: pgsql-performance
Summary
=======
We are writing to the db pretty much 24 hours a day. Recently the amount of data we write has increased, and the query speed, formerly okay, has taken a dive. The query is using the indexes as expected, so I don't _think_ I have a query tuning issue, just an io problem. The first time a query is done it takes about 60 seconds. The second time it runs in about 6 seconds.

What I know I need advice on is io settings and various buffer settings. I may also need advice on other things, but just don't know it yet!

Below is ...
- an explain analyze
- details of the db setup and hardware
- some vmstat and iostat output showing the disks are very busy
- the SHOW ALL output for the db config.

Details
=======
Postgres 8.0.3

Below is a sample query. (This is actually implemented as a prepared statement. Here I fill in the '?'s with actual values.)

electric=# EXPLAIN ANALYZE
electric-# SELECT datavalue, logfielddatatype, timestamp FROM logdata_recent
electric-# WHERE (logfielddatatype = 70 OR logfielddatatype = 71 OR logfielddatatype = 69)
electric-# AND graphtargetlog = 1327
electric-# AND timestamp >= 1123052400 AND timestamp <= 1123138800
electric-# ORDER BY timestamp;

                                   QUERY PLAN
--------------------------------------------------
 Sort  (cost=82.48..82.50 rows=6 width=14) (actual time=60208.968..60211.232 rows=2625 loops=1)
   Sort Key: public.logdata_recent."timestamp"
   ->  Result  (cost=0.00..82.41 rows=6 width=14) (actual time=52.483..60200.868 rows=2625 loops=1)
         ->  Append  (cost=0.00..82.41 rows=6 width=14) (actual time=52.476..60189.929 rows=2625 loops=1)
               ->  Seq Scan on logdata_recent  (cost=0.00..46.25 rows=1 width=14) (actual time=0.003..0.003 rows=0 loops=1)
                     Filter: (((logfielddatatype = 70) OR (logfielddatatype = 71) OR (logfielddatatype = 69)) AND (graphtargetlog = 1327) AND ("timestamp" >= 1123052400) AND ("timestamp" <= 1123138800))
               ->  Index Scan using logdata_recent_1123085306_ix_t_fld_gtl, logdata_recent_1123085306_ix_t_fld_gtl, logdata_recent_1123085306_ix_t_fld_gtl on logdata_recent_stale logdata_recent  (cost=0.00..18.08 rows=3 width=14) (actual time=52.465..60181.624 rows=2625 loops=1)
                     Index Cond: ((("timestamp" >= 1123052400) AND ("timestamp" <= 1123138800) AND (logfielddatatype = 70) AND (graphtargetlog = 1327)) OR (("timestamp" >= 1123052400) AND ("timestamp" <= 1123138800) AND (logfielddatatype = 71) AND (graphtargetlog = 1327)) OR (("timestamp" >= 1123052400) AND ("timestamp" <= 1123138800) AND (logfielddatatype = 69) AND (graphtargetlog = 1327)))
                     Filter: (((logfielddatatype = 70) OR (logfielddatatype = 71) OR (logfielddatatype = 69)) AND (graphtargetlog = 1327) AND ("timestamp" >= 1123052400) AND ("timestamp" <= 1123138800))
               ->  Index Scan using logdata_recent_1123139634_ix_t_fld_gtl, logdata_recent_1123139634_ix_t_fld_gtl, logdata_recent_1123139634_ix_t_fld_gtl on logdata_recent_active logdata_recent  (cost=0.00..18.08 rows=2 width=14) (actual time=0.178..0.178 rows=0 loops=1)
                     Index Cond: ((("timestamp" >= 1123052400) AND ("timestamp" <= 1123138800) AND (logfielddatatype = 70) AND (graphtargetlog = 1327)) OR (("timestamp" >= 1123052400) AND ("timestamp" <= 1123138800) AND (logfielddatatype = 71) AND (graphtargetlog = 1327)) OR (("timestamp" >= 1123052400) AND ("timestamp" <= 1123138800) AND (logfielddatatype = 69) AND (graphtargetlog = 1327)))
                     Filter: (((logfielddatatype = 70) OR (logfielddatatype = 71) OR (logfielddatatype = 69)) AND (graphtargetlog = 1327) AND ("timestamp" >= 1123052400) AND ("timestamp" <= 1123138800))
 Total runtime: 60214.545 ms
(13 rows)
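(For reference, the SQL-level equivalent of the prepared statement would be roughly the following; the statement name and the parameter order shown here are only illustrative, not the actual production code:)

PREPARE recent_logdata (int, int, int, int, int, int) AS
    SELECT datavalue, logfielddatatype, timestamp
    FROM logdata_recent
    WHERE (logfielddatatype = $1 OR logfielddatatype = $2 OR logfielddatatype = $3)
      AND graphtargetlog = $4
      AND timestamp >= $5 AND timestamp <= $6
    ORDER BY timestamp;

EXECUTE recent_logdata (70, 71, 69, 1327, 1123052400, 1123138800);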
60 seconds is much longer than it used to be. I would guess it used to be under 10 seconds. The second time the above query is run we see the magic of caching as the time goes down to 6 seconds.

logdata_recent_active and logdata_recent_stale are inherited tables of logdata_recent, which never has any data. (This is pseudo-partitioning in action!) So the very quick seq_scan on the empty logdata_recent parent table is okay with me.

The index is built on timestamp, logfielddatatype, graphtargetlog. I am curious as to why the same index shows up 3 times in the "using" clause, but can live without knowing the details as long as it doesn't indicate that something's wrong.

The logdata_recent_stale table has 5 million rows. The size of the table itself, on disk, is 324MB. The size of the index is 210MB.
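In case the table layout matters, it is roughly as sketched below (the column types are illustrative rather than the actual definitions; the parent/child arrangement and the index columns are as described above):

-- parent table: never holds any rows itself
CREATE TABLE logdata_recent (
    logfielddatatype integer,
    graphtargetlog   integer,
    "timestamp"      integer,
    datavalue        numeric      -- type shown here is a guess
);

-- "pseudo-partitions": these children hold the actual rows
CREATE TABLE logdata_recent_active () INHERITS (logdata_recent);
CREATE TABLE logdata_recent_stale  () INHERITS (logdata_recent);

-- each child carries the three-column index used in the plan above
CREATE INDEX logdata_recent_1123139634_ix_t_fld_gtl
    ON logdata_recent_active ("timestamp", logfielddatatype, graphtargetlog);
CREATE INDEX logdata_recent_1123085306_ix_t_fld_gtl
    ON logdata_recent_stale ("timestamp", logfielddatatype, graphtargetlog);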
The disks are ext3 with journalling type of ordered, but this was later changed to writeback with no apparent change in speed.

They're on a Dell PowerEdge 6650 with an LSI RAID card, set up as follows:

4 disks raid 10 for indexes (145GB) - sdc1
6 disks raid 10 for data (220GB) - sdd1
2 mirrored disks for logs - sdb1
stripe size is 32k
cache policy: cached io (am told the controller has bbu)
write policy: write-back
read policy: readahead

The partition names do what they say ...

[root@rage-db2 /dbdata01]$ df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sdb1             70430588    729324  66123592   2% /dblog01
/dev/sdc1            140861236  19472588 114233300  15% /dbindex01
/dev/sdd1            211299960 157159988  43406548  79% /dbdata01
...

Using iostat (the version from http://linux.inet.hr/) I saw at one point that the data disk was 100% busy. I believe this was when running the above query, or similar, but in any case the system is always busy with both reads and (usually) writes.

device mgr/s mgw/s    r/s   w/s    kr/s   kw/s size queue wait svc_t  %b
sdb1       0    61    0.1  15.5     0.4  305.7 19.6   0.1  5.8   4.9   8
sdc1      21    22   20.6  17.7   164.6  158.6  8.4   1.1 28.3   6.2  24
sdd1    1742    11 1904.7   6.6 14585.6   71.5  7.7  20.6 10.8   0.5 100

Another time, when I was running the query above, the index partition went to 90+% busy for 40 seconds:

device mgr/s mgw/s    r/s   w/s    kr/s   kw/s size queue wait svc_t  %b
sdb1       0     0    0.0   0.2     0.0    0.8  4.0   0.0 20.0  15.0   0
sdc1     366    53  687.0  66.1  4213.1  483.0  6.2  11.8 15.7   1.3  96
sdd1       8    17   16.6  13.9    99.5  125.4  7.4   0.7 23.0   1.9   6

On another occasion (when the query took 24 seconds) I ran vmstat and iostat every 5 seconds from just before the query until just after. Roughly the first two outputs are from before the query. In this case the index disk is maxed.

[root@rage-db2 ~]$ vmstat 5 16
procs               memory             swap          io     system        cpu
 r  b  swpd    free   buff    cache   si   so    bi    bo   in    cs us sy wa id
 0  0    92 1233500 225692  9578564    0    0     0     0    1     1  2  2  1  1
 0  1    92 1218460 225748  9595136    0    0  3322    18  655   898  0  0 20 79
 0  1    92 1202124 225780  9616140    0    0  4204    58  920  1291  0  1 24 76
 0  1    92 1172876 225820  9645348    0    0  5847   120 1053  1482  0  1 23 76
 1  0    92 1151712 225836  9666504    0    0  4234     7  847  1239  2  1 18 78
 1  0    92 1140860 225844  9677436    0    0  2153   500  575  2027 13  2 11 73
 1  0    92 1140852 225848  9677636    0    0     0   506  213   442 10  1  0 89

[root@rage-db2 ~]$ /usr/local/bin/iostat -Px 5 16
device mgr/s mgw/s    r/s   w/s    kr/s   kw/s size queue wait svc_t  %b
sdb1       1   243    0.1 105.5     2.7   37.6  0.4   0.0  0.2   0.1   1
sdc1       6   111    3.7  75.9    38.3  769.3 10.1   0.0  0.3   0.2   1
sdd1     255   107   85.2  37.6     4.9  581.0  4.8   0.0  0.1   0.0   1

device mgr/s mgw/s    r/s   w/s    kr/s   kw/s size queue wait svc_t  %b
sdb1       0     0    0.0   0.0     0.0    0.0  0.0   0.0  0.0   0.0   0
sdc1     273     0  414.0   0.4  2747.7    2.4  6.6   1.6  3.9   1.7  69
sdd1       0     1    1.4   0.4     7.2    5.6  7.1   0.0 10.0   6.7   1

device mgr/s mgw/s    r/s   w/s    kr/s   kw/s size queue wait svc_t  %b
sdb1       0     0    0.0   0.4     0.0    1.6  4.0   0.0 10.0   5.0   0
sdc1     225     4  777.1   4.6  4011.0   35.1  5.2   2.5  3.2   1.3  99
sdd1       0     2    0.0   2.6     0.0   16.8  6.5   0.0  8.5   0.8   0

device mgr/s mgw/s    r/s   w/s    kr/s   kw/s size queue wait svc_t  %b
sdb1       0     0    0.0   0.0     0.0    0.0  0.0   0.0  0.0   0.0   0
sdc1     508     7  917.8   7.4  5703.0   58.3  6.2   2.2  2.4   1.1  98
sdd1       0     4    0.0   6.8     0.0   44.7  6.6   0.1 15.6   0.6   0

device mgr/s mgw/s    r/s   w/s    kr/s   kw/s size queue wait svc_t  %b
sdb1       0     0    0.0   0.0     0.0    0.0  0.0   0.0  0.0   0.0   0
sdc1     361     0  737.5   0.4  4391.7    2.4  6.0   1.8  2.4   1.0  76
sdd1       0     0    0.0   0.4     0.0    2.4  6.0   0.0  0.0   0.0   0

device mgr/s mgw/s    r/s   w/s    kr/s   kw/s size queue wait svc_t  %b
sdb1       0    87    0.0  17.8     0.0  418.3 23.6   0.0  1.3   1.2   2
sdc1     216     2  489.5   0.4  2821.7   11.2  5.8   1.2  2.4   1.1  56
sdd1       2     4    7.2   0.6    37.5   18.4  7.2   0.0  6.2   3.3   3

device mgr/s mgw/s    r/s   w/s    kr/s   kw/s size queue wait svc_t  %b
sdb1       0    89    0.0  22.4     0.0  446.3 20.0   0.0  1.1   0.8   2
sdc1       0     4    0.0   1.0     0.0   18.4 18.4   0.0  0.0   0.0   0
sdd1       0     6    0.0   1.0     0.0   27.1 27.2   0.0  0.0   0.0   0

device mgr/s mgw/s    r/s   w/s    kr/s   kw/s size queue wait svc_t  %b
sdb1       0    89    0.0  22.5     0.0  446.2 19.8   0.0  0.4   0.3   1
sdc1       0     2    0.0   0.4     0.0    9.6 24.0   0.0  0.0   0.0   0
sdd1       0     4    0.0   0.6     0.0   20.0 33.3   0.0  0.0   0.0   0

Finally, here's a show all:

add_missing_from | on
archive_command | unset
australian_timezones | off
authentication_timeout | 60
bgwriter_delay | 200
bgwriter_maxpages | 100
bgwriter_percent | 1
block_size | 8192
check_function_bodies | on
checkpoint_segments | 20
checkpoint_timeout | 300
checkpoint_warning | 30
client_encoding | UNICODE
client_min_messages | notice
commit_delay | 350
commit_siblings | 5
config_file | /dbdata01/pgdata/postgresql.conf
cpu_index_tuple_cost | 0.001
cpu_operator_cost | 0.0025
cpu_tuple_cost | 0.01
custom_variable_classes | unset
data_directory | /dbdata01/pgdata
DateStyle | ISO, MDY
db_user_namespace | off
deadlock_timeout | 1000
debug_pretty_print | off
debug_print_parse | off
debug_print_plan | off
debug_print_rewritten | off
debug_shared_buffers | 0
default_statistics_target | 50
default_tablespace | unset
default_transaction_isolation | read committed
default_transaction_read_only | off
default_with_oids | off
dynamic_library_path | $libdir
effective_cache_size | 48000
enable_hashagg | on
enable_hashjoin | on
enable_indexscan | on
enable_mergejoin | on
enable_nestloop | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
explain_pretty_print | on
external_pid_file | unset
extra_float_digits | 0
from_collapse_limit | 8
fsync | on
geqo | on
geqo_effort | 5
geqo_generations | 0
geqo_pool_size | 0
geqo_selection_bias | 2
geqo_threshold | 12
hba_file | /dbdata01/pgdata/pg_hba.conf
ident_file | /dbdata01/pgdata/pg_ident.conf
integer_datetimes | off
join_collapse_limit | 8
krb_server_keyfile | unset
lc_collate | en_US.UTF-8
lc_ctype | en_US.UTF-8
lc_messages | en_US.UTF-8
lc_monetary | en_US.UTF-8
lc_numeric | en_US.UTF-8
lc_time | en_US.UTF-8
listen_addresses | *
log_connections | off
log_destination | stderr
log_directory | /dblog01
log_disconnections | off
log_duration | off
log_error_verbosity | default
log_executor_stats | off
log_filename | postgresql-%Y-%m-%d_%H%M%S.log
log_hostname | off
log_line_prefix | unset
log_min_duration_statement | -1
log_min_error_statement | panic
log_min_messages | notice
log_parser_stats | off
log_planner_stats | off
log_rotation_age | 1440
log_rotation_size | 10240
log_statement | none
log_statement_stats | off
log_truncate_on_rotation | off
maintenance_work_mem | 262144
max_connections | 40
max_files_per_process | 1000
max_fsm_pages | 100000
max_fsm_relations | 1000
max_function_args | 32
max_identifier_length | 63
max_index_keys | 32
max_locks_per_transaction | 64
max_stack_depth | 2048
password_encryption | on
port | 5432
pre_auth_delay | 0
preload_libraries | unset
random_page_cost | 4
redirect_stderr | on
regex_flavor | advanced
rendezvous_name | unset
search_path | $user,public
server_encoding | UNICODE
server_version | 8.0.3
shared_buffers | 10240
silent_mode | off
sql_inheritance | on
ssl | off
statement_timeout | 0
stats_block_level | off
stats_command_string | off
stats_reset_on_server_start | on
stats_row_level | on
stats_start_collector | on
superuser_reserved_connections | 2
syslog_facility | LOCAL0
syslog_ident | postgres
TimeZone | PST8PDT
trace_notify | off
transaction_isolation | read committed
transaction_read_only | off
transform_null_equals | off
unix_socket_directory | unset
unix_socket_group | unset
unix_socket_permissions | 511
vacuum_cost_delay | 180
vacuum_cost_limit | 200
vacuum_cost_page_dirty | 20
vacuum_cost_page_hit | 1
vacuum_cost_page_miss | 10
wal_buffers | 8
wal_sync_method | fdatasync
work_mem | 98304
zero_damaged_pages | off
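To save anyone the arithmetic: with the 8192-byte block size above, the memory-related settings work out roughly to the values below. This is just unit conversion of the numbers already listed, written as a quick query:

SELECT 10240  * 8192 AS shared_buffers_bytes,        -- ~80 MB
       48000  * 8192 AS effective_cache_size_bytes,  -- ~375 MB
       98304  * 1024 AS work_mem_bytes,              -- 96 MB (work_mem is in kB)
       262144 * 1024 AS maintenance_work_mem_bytes;  -- 256 MB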