random_page_cost vs seq_page_cost - Mailing list pgsql-hackers
From:      Benedikt Grundmann
Subject:   random_page_cost vs seq_page_cost
Date:
Msg-id:    20120105100405.GJ15490@ldn-qws-004.delacy.com
Responses: Re: random_page_cost vs seq_page_cost (5 replies in thread)
List:      pgsql-hackers
Hello list,

I have a question about how to benchmark hardware to determine the appropriate ratio of seq_page_cost vs random_page_cost. Emails in this mailing list's archive seem to indicate that 1.0 vs 3.0 - 4.0 are appropriate values on modern hardware, which surprised me a bit, as I had thought that on actual hard drives (ignoring SSDs) random_page_cost is higher. I guess the number tries to reflect caching of the relevant pages in memory, and on modern hardware you have more of that? Anyhow, it would be great to have a scientific way of setting those numbers.

Background: We have recently upgraded two of our biggest Postgres databases to new hardware, together with a minor version bump (8.4.5 -> 8.4.9). We are experiencing a big performance regression in some queries. In those cases the planner seems to choose a nested loop index scan instead of hashing the index once and then joining. The new hardware was optimized for seq scans and does those very fast. We are not sure whether the database used to choose differently before the move and the new hardware is simply performing worse for random seeks, or whether the planner is now making different choices.

As a countermeasure we are experimenting with enable_nestloop = off and random_page_cost = 20 (instead of the previous 4). It is worth noting that for many small tables the nested loop is indeed marginally faster (in the "doesn't really matter because both cases are fast enough" sense). But the regression for the big tables (big in the sense that the index just fits into memory, though in practice it might not stay there because there are other frequently accessed big tables) is a show stopper.
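To make that concrete, this is roughly the kind of per-session experiment we run before changing anything server-wide (a sketch only; big_table / lookup_table below are made-up placeholders for a real problem query, and nothing here touches postgresql.conf since SET only affects the current session):

    -- Compare the plan and timing under different planner assumptions.
    SET random_page_cost = 4;       -- the value we used before the move
    EXPLAIN ANALYZE
      SELECT count(*)
      FROM   big_table b
      JOIN   lookup_table l ON l.id = b.lookup_id;

    SET random_page_cost = 20;      -- the value we are experimenting with
    EXPLAIN ANALYZE
      SELECT count(*)
      FROM   big_table b
      JOIN   lookup_table l ON l.id = b.lookup_id;

    SET enable_nestloop = off;      -- rule the nested loop out entirely
    EXPLAIN ANALYZE
      SELECT count(*)
      FROM   big_table b
      JOIN   lookup_table l ON l.id = b.lookup_id;

    RESET ALL;                      -- back to the server-wide settings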
Some of those tables we have also recently clustered (as part of the move) for the first time in ages, and it was speculated that this might have changed statistics (such as correlation) and increased the attractiveness of the index scan to the planner.

Another thing I have thought might provide some enlightenment would be an "explain log select ..." that showed all the sub-plans considered and why they were discarded, or something approximating this.

Thanks in advance for any reply, and sorry that this email turned out to be a rather long stream-of-consciousness dump.

Bene

------ relevant parts of our postgresql.conf -----------

shared_buffers = 12GB                   # min 128kB
                                        # (change requires restart)
temp_buffers = 512MB                    # min 800kB
#max_prepared_transactions = 0          # zero disables the feature
                                        # (change requires restart)
# Note: Increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
# It is not advisable to set max_prepared_transactions nonzero unless you
# actively intend to use prepared transactions.
work_mem = 192MB                        # min 64kB
maintenance_work_mem = 1GB              # min 1MB
#max_stack_depth = 2MB                  # min 100kB

# - Kernel Resource Usage -

#max_files_per_process = 1000           # min 25
                                        # (change requires restart)
#shared_preload_libraries = ''          # (change requires restart)

# - Cost-Based Vacuum Delay -

vacuum_cost_delay = 100ms               # 0-100 milliseconds
vacuum_cost_page_hit = 1                # 0-10000 credits
vacuum_cost_page_miss = 10              # 0-10000 credits
vacuum_cost_page_dirty = 20             # 0-10000 credits
vacuum_cost_limit = 7500                # 1-10000 credits

# - Background Writer -

#bgwriter_delay = 200ms                 # 10-10000ms between rounds
#bgwriter_lru_maxpages = 100            # 0-1000 max buffers written/round
#bgwriter_lru_multiplier = 2.0          # 0-10.0 multiplier on buffers scanned/round

# - Asynchronous Behavior -

effective_io_concurrency = 40           # 1-1000. 0 disables prefetching

# WRITE AHEAD LOG

fsync = on                              # turns forced synchronization on or off
synchronous_commit = off                # immediate fsync at commit
#wal_sync_method = fsync                # the default is the first option
                                        # supported by the operating system:
                                        #   open_datasync
                                        #   fdatasync
                                        #   fsync
                                        #   fsync_writethrough
                                        #   open_sync
full_page_writes = on                   # recover from partial page writes
wal_buffers = 16MB                      # min 32kB
                                        # (change requires restart)
#wal_writer_delay = 200ms               # 1-10000 milliseconds
commit_delay = 1000                     # range 0-100000, in microseconds
commit_siblings = 5                     # range 1-1000

# - Checkpoints -

checkpoint_segments = 128               # in logfile segments, min 1, 16MB each
checkpoint_timeout = 10min              # range 30s-1h
checkpoint_completion_target = 0.9      # checkpoint target duration, 0.0 - 1.0
checkpoint_warning = 30s                # 0 disables

# - Archiving -

archive_mode = off                      # allows archiving to be done

# QUERY TUNING

# - Planner Method Configuration -

#enable_bitmapscan = on
#enable_hashagg = on
#enable_hashjoin = on
#enable_indexscan = on
#enable_mergejoin = on
#enable_nestloop = on
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on

# - Planner Cost Constants -

seq_page_cost = 1.0                     # measured on an arbitrary scale
random_page_cost = 20.0                 # same scale as above
#cpu_tuple_cost = 0.01                  # same scale as above
#cpu_index_tuple_cost = 0.005           # same scale as above
#cpu_operator_cost = 0.0025             # same scale as above
effective_cache_size = 32GB             # dpowers: set to 2/3 of available ram

# - Genetic Query Optimizer -

#geqo = on
#geqo_threshold = 12
#geqo_effort = 5                        # range 1-10
#geqo_pool_size = 0                     # selects default based on effort
#geqo_generations = 0                   # selects default based on effort
#geqo_selection_bias = 2.0              # range 1.5-2.0

# - Other Planner Options -

default_statistics_target = 1000        # range 1-10000
#constraint_exclusion = partition       # on, off, or partition
cursor_tuple_fraction = 1.0             # range 0.0-1.0
#from_collapse_limit = 8
#join_collapse_limit = 8                # 1 disables collapsing of explicit
                                        # JOIN clauses

Machine:
OS: linux 2.6.32-71.29.1
CPU: 12 x Intel(R) Xeon(R) CPU X5680 @ 3.33GHz
Cache size: 12288 KB
RAM: 47.12 GB
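P.S. One rough way to put a number on the caching effect mentioned at the top (a sketch, not a proper benchmark: it only sees shared_buffers, not the OS page cache, and the counters are cumulative since the last stats reset) is to look at the buffer hit ratios the server already tracks:

    -- Fraction of heap and index block requests served from shared_buffers.
    -- Ratios close to 1.0 argue for keeping random_page_cost close to
    -- seq_page_cost; low ratios argue for a larger gap.
    SELECT sum(heap_blks_hit)::float8
             / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0) AS heap_hit_ratio,
           sum(idx_blks_hit)::float8
             / nullif(sum(idx_blks_hit) + sum(idx_blks_read), 0)   AS index_hit_ratio
    FROM   pg_statio_user_tables;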