Thread: Inconsistent performance
To whoever can assist,

I am working with a decent-sized database on an extremely powerful machine.
The specs follow:

OS: RedHat Linux 9.0
PG Version: 7.3
Memory: 1 gig
CPU: Quad processor - unsure of exact CPUs
Hard Drive: 80 gigs
Database Size: 2 gigs

As you can see, the server is built for overkill.

The problem that I see is as follows. I do a rather simple query:

select count (*) from large-table where column = some value;

About 80% of the time, the response time is sub-second. However, about 10%
of the time, the response time is 5 - 10 seconds. There is nothing readily
apparent at the system level that comes close to explaining the performance
hits. CPU and memory usage (as measured by top) appear to be fine.

Although there are certain tuning issues within the database itself, no
documentation I have seen seems to indicate that tuning issues would lead
to such inconsistent response time.

Any ideas?

Regards,

Joseph
On Mon, 15 Sep 2003, Joseph Bove wrote:

> I am working with a decent sized database on an extremely powerful machine.
> The specs follow:
>
> OS: RedHat Linux 9.0
> PG Version 7.3
> Memory 1 gig
> CPU Quad Processor - Unsure of exact CPUs
> Hard Drive 80 gigs
> Database Size 2 gigs
>
> As you can see the server is built for overkill.
>
> The problem that I see is as follows.
>
> I do a rather simple query: select count (*) from large-table where column
> = some value;
>
> About 80% of the time, the response time is sub-second. However, at 10% of
> the time, the response time is 5 - 10 seconds.

Is it consistent for various values of "some value"? If so, it's possible
that it's switching plans based on the apparent selectivity of the column
for that value.
On Mon, Sep 15, 2003 at 17:34:12 -0400, Joseph Bove <jbove@vetstar.com> wrote:

> I do a rather simple query: select count (*) from large-table where column
> = some value;
>
> About 80% of the time, the response time is sub-second. However, at 10% of
> the time, the response time is 5 - 10 seconds.
>
> There is nothing readily apparent at the system level that comes close to
> explaining the performance hits. CPU and memory usage (as measured by top)
> appear to be fine.
>
> Although there are certain tuning issues within the database itself, no
> documentation I have seen seems to indicate that tuning issues would lead
> to such inconsistent response time.

Looking at the output from explain analyze for the query would be useful.
It may be that there are a lot of rows that have the value in the problem
queries.
Stephan,

Actually, it's inconsistent with the exact same command. I've now replicated
the problem by doing the following command:

select count (*) from table;

The table in question has 88899 rows.

The response time is anywhere from 1 second to 12 seconds. Different
response times can occur in the same minute of testing!

Regards,

Joseph

At 02:34 PM 9/15/2003 -0700, you wrote:

>On Mon, 15 Sep 2003, Joseph Bove wrote:
>
> > I am working with a decent sized database on an extremely powerful machine.
> >
> > [specs and problem description trimmed]
>
>Is it consistent for various values of "some value"? If so, it's possible
>that it's switching plans based on the apparent selectivity of the column
>for that value.
Joseph,

Please see this web page before posting anything else:

http://techdocs.postgresql.org/guides/SlowQueryPostingGuidelines

Currently, you are not posting enough data for anyone to be of meaningful
help.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco
On Mon, 15 Sep 2003, Joseph Bove wrote:

> Stephan,
>
> Actually, it's inconsistent with the exact same command. I've now
> replicated the problem by doing the following command:
>
> select count (*) from table;
>
> The table in question has 88899 rows.
>
> The response time is anywhere from 1 second to 12 seconds. Different
> response times can occur in the same minute of testing!

Well, that's really only got one valid plan right now (seqscan and
aggregate). It'd be mildly interesting to see what explain analyze says in
slow and fast states, although I'd be willing to bet that it's just going
to effectively show that the seqscan is taking more or less time.

I think we're going to need to see the configuration settings for the
server and possibly some info on how big the table is (say relpages for
the pg_class row associated with the table after a vacuum full).
Stephan,

I've run explain analyze a number of times and have gotten results between
5.5 and 7.5 seconds. Attached is a typical output:

 QUERY PLAN
-------------------------------------
 Aggregate  (cost=9993.92..9993.92 rows=1 width=0)
            (actual time=7575.59..7575.59 rows=1 loops=1)
   ->  Seq Scan on vetapview  (cost=0.00..9771.34 rows=89034 width=0)
                              (actual time=0.06..7472.20 rows=88910 loops=1)
 Total runtime: 7575.67 msec
(3 rows)

The only things changing are the actual times. The costs are constant.

The relpages value from pg_class for vetapview (the table in question) is
8881.

At the end of this message is the exhaustive contents of postgresql.conf.
The only settings I have attempted tuning are as follows:

tcpip_socket = true
max_connections = 100
shared_buffers = 5000
sort_mem = 8192
fsync = false

I did have shared_buffers and sort_mem both set higher originally (15000,
32168) but decreased them in case over-utilization of memory was the
problem.

The kernel setting shmmax is set to 256,000,000 (out of 1 gig).

Regards,

Joseph

postgresql.conf

#
# Connection Parameters
#
tcpip_socket = true
#ssl = false

max_connections = 100
#superuser_reserved_connections = 2

#port = 5432
#hostname_lookup = false
#show_source_port = false

#unix_socket_directory = ''
#unix_socket_group = ''
#unix_socket_permissions = 0777 # octal

#virtual_host = ''

#krb_server_keyfile = ''

#
# Shared Memory Size
#
#shared_buffers = 15000         # min max_connections*2 or 16, 8KB each
shared_buffers = 5000
#max_fsm_relations = 1000       # min 10, fsm is free space map, ~40 bytes
#max_fsm_pages = 10000          # min 1000, fsm is free space map, ~6 bytes
#max_locks_per_transaction = 64 # min 10
#wal_buffers = 8                # min 4, typically 8KB each

#
# Non-shared Memory Sizes
#
#sort_mem = 32168               # min 64, size in KB
sort_mem = 8192
#vacuum_mem = 8192              # min 1024, size in KB

#
# Write-ahead log (WAL)
#
#checkpoint_segments = 3        # in logfile segments, min 1, 16MB each
#checkpoint_timeout = 300       # range 30-3600, in seconds
#commit_delay = 0               # range 0-100000, in microseconds
#commit_siblings = 5            # range 1-1000
fsync = false
#wal_sync_method = fsync        # the default varies across platforms:
                                # fsync, fdatasync, open_sync, or open_datasync
#wal_debug = 0                  # range 0-16

#
# Optimizer Parameters
#
#enable_seqscan = true
#enable_indexscan = true
#enable_tidscan = true
#enable_sort = true
#enable_nestloop = true
#enable_mergejoin = true
#enable_hashjoin = true

#effective_cache_size = 1000    # typically 8KB each
#random_page_cost = 4           # units are one sequential page fetch cost
#cpu_tuple_cost = 0.01          # (same)
#cpu_index_tuple_cost = 0.001   # (same)
#cpu_operator_cost = 0.0025     # (same)

#default_statistics_target = 10 # range 1-1000

#
# GEQO Optimizer Parameters
#
#geqo = true
#geqo_selection_bias = 2.0      # range 1.5-2.0
#geqo_threshold = 11
#geqo_pool_size = 0             # default based on tables in statement,
                                # range 128-1024
#geqo_effort = 1
#geqo_generations = 0
#geqo_random_seed = -1          # auto-compute seed

#
# Message display
#
#server_min_messages = notice   # Values, in order of decreasing detail:
                                # debug5, debug4, debug3, debug2, debug1,
                                # info, notice, warning, error, log, fatal,
                                # panic
#client_min_messages = notice   # Values, in order of decreasing detail:
                                # debug5, debug4, debug3, debug2, debug1,
                                # log, info, notice, warning, error
#silent_mode = false

#log_connections = false
#log_pid = false
#log_statement = false
#log_duration = false
#log_timestamp = false

#log_min_error_statement = error # Values in order of increasing severity:
                                # debug5, debug4, debug3, debug2, debug1,
                                # info, notice, warning, error, panic(off)

#debug_print_parse = false
#debug_print_rewritten = false
#debug_print_plan = false
#debug_pretty_print = false

#explain_pretty_print = true

# requires USE_ASSERT_CHECKING
#debug_assertions = true

#
# Syslog
#
#syslog = 0                     # range 0-2
#syslog_facility = 'LOCAL0'
#syslog_ident = 'postgres'

#
# Statistics
#
#show_parser_stats = false
#show_planner_stats = false
#show_executor_stats = false
#show_statement_stats = false

# requires BTREE_BUILD_STATS
#show_btree_build_stats = false

#
# Access statistics collection
#
#stats_start_collector = true
#stats_reset_on_server_start = true
#stats_command_string = false
#stats_row_level = false
#stats_block_level = false

#
# Lock Tracing
#
#trace_notify = false

# requires LOCK_DEBUG
#trace_locks = false
#trace_userlocks = false
#trace_lwlocks = false
#debug_deadlocks = false
#trace_lock_oidmin = 16384
#trace_lock_table = 0

#
# Misc
#
#autocommit = true
#dynamic_library_path = '$libdir'
#search_path = '$user,public'
#datestyle = 'iso, us'
#timezone = unknown             # actually, defaults to TZ environment setting
#australian_timezones = false
#client_encoding = sql_ascii    # actually, defaults to database encoding
#authentication_timeout = 60    # 1-600, in seconds
#deadlock_timeout = 1000        # in milliseconds
#default_transaction_isolation = 'read committed'
#max_expr_depth = 10000         # min 10
#max_files_per_process = 1000   # min 25
#password_encryption = true
#sql_inheritance = true
#transform_null_equals = false
#statement_timeout = 0          # 0 is disabled, in milliseconds
#db_user_namespace = false

#
# Locale settings
#
# (initialized by initdb -- may be changed)
LC_MESSAGES = 'en_US.UTF-8'
LC_MONETARY = 'en_US.UTF-8'
LC_NUMERIC = 'en_US.UTF-8'
LC_TIME = 'en_US.UTF-8'

At 03:49 PM 9/15/2003 -0700, Stephan Szabo wrote:

>On Mon, 15 Sep 2003, Joseph Bove wrote:
>
> > Actually, it's inconsistent with the exact same command. I've now
> > replicated the problem by doing the following command:
> >
> > select count (*) from table;
> >
> > The table in question has 88899 rows.
> >
> > The response time is anywhere from 1 second to 12 seconds. Different
> > response times can occur in the same minute of testing!
>
>Well, that's really only got one valid plan right now (seqscan and
>aggregate). It'd be mildly interesting to see what explain analyze says in
>slow and fast states, although I'd be willing to bet that it's just going
>to effectively show that the seqscan is taking more or less time.
>
>I think we're going to need to see the configuration settings for the
>server and possibly some info on how big the table is (say relpages for
>the pg_class row associated with the table after a vacuum full).
jbove@vetstar.com (Joseph Bove) writes:

> I do a rather simple query: select count (*) from large-table where
> column = some value;
>
> About 80% of the time, the response time is sub-second. However, at
> 10% of the time, the response time is 5 - 10 seconds.

Does it seem data-dependent? That is, does the time vary for different
values of "some value"?

If a particular value is particularly common, the system might well revert
to a sequential scan, making the assumption that it is quicker to look at
every page in the table rather than to walk through Enormous Numbers of
records.

I had a case very similar to this where a table had _incredible_ skewing:
a small number of column values occurred hundreds of thousands of times,
while other column values occurred only a handful of times. I was able to
get Excellent Performance back by setting up two partial indices:

- One for WHERE THIS_COLUMN > VITAL_VALUE;
- One for WHERE THIS_COLUMN < VITAL_VALUE;

The REALLY COMMON values were in the range < VITAL_VALUE.

--
output = ("cbbrowne" "@" "libertyrms.info") <http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 646 3304 x124 (land)
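[Editor's note: the partial-index approach described above can be sketched as
follows. The table and column names are hypothetical, 1000 plays the role of
VITAL_VALUE, and SQLite (via Python's sqlite3) stands in for the PostgreSQL
server since the CREATE INDEX ... WHERE syntax is the same in both.]

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE big_table (this_column INTEGER)")
# Heavily skewed data: many rows below the boundary value, few above it.
con.executemany("INSERT INTO big_table VALUES (?)",
                [(1,)] * 5000 + [(5000,), (6000,)])

# One small partial index for each side of the boundary.
con.execute("CREATE INDEX idx_hi ON big_table (this_column) "
            "WHERE this_column > 1000")
con.execute("CREATE INDEX idx_lo ON big_table (this_column) "
            "WHERE this_column < 1000")

# A query probing a rare (high) value, with the boundary condition stated
# explicitly so the planner can prove the partial index applies, is
# satisfied by the tiny idx_hi instead of a scan of the whole table.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT count(*) FROM big_table "
    "WHERE this_column = 5000 AND this_column > 1000").fetchall()
print(plan)
```

The payoff is that the index over the rare values stays small no matter how
many rows carry the common values, so lookups on the rare side never pay for
the skew.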
On Mon, 15 Sep 2003, Joseph Bove wrote:

> Stephan,
>
> I've run explain analyze a number of times and have gotten results between
> 5.5 and 7.5 seconds
>
> Attached is a typical output
>
>  QUERY PLAN
> -------------------------------------
>  Aggregate  (cost=9993.92..9993.92 rows=1 width=0)
>             (actual time=7575.59..7575.59 rows=1 loops=1)
>    ->  Seq Scan on vetapview  (cost=0.00..9771.34 rows=89034 width=0)
>                               (actual time=0.06..7472.20 rows=88910 loops=1)
>  Total runtime: 7575.67 msec
> (3 rows)
>
> The only things changing are the actual time. The costs are constant.
>
> The relpages from pg_class for vetapview (the table in question) is 8881.
>
> At the end of this message is the exhaustive contents of postgresql.conf.
> The only settings I have attempted tuning are as follows:
>
> tcpip_socket = true
> max_connections = 100
> shared_buffers = 5000
> sort_mem = 8192
> fsync = false

A couple of things.

1: Is there an index on the parts of the query used for the where clause?
2: What is your effect_cache_size set to? It needs to be set right for
your postgresql server to be able to take advantage of the kernel's cache
(i.e. use an index scan when the kernel is likely to have that data in
memory.)
It seems like the difference is probably related to caching. You say you
have 1 gb of ram, and the database is 2 gb. Obviously the entire database
isn't cached, but maybe your query runs fast when the table is in memory,
and then it gets swapped out of cache because some other piece of
information moves into memory. In that circumstance, it has to load the
information from disk and is therefore slow.

How busy is the system? What other programs are running on the machine?
How big (on disk) is the table in question? What kind of load does the
system have? Is it a single 80 gb ide drive? Even though you have 4 CPUs,
a small amount of memory and a bad IO system will kill the database.

On Monday, September 15, 2003, at 05:28 PM, Joseph Bove wrote:

> Stephan,
>
> I've run explain analyze a number of times and have gotten results
> between 5.5 and 7.5 seconds
>
> Attached is a typical output
>
>  QUERY PLAN
> -------------------------------------
>  Aggregate  (cost=9993.92..9993.92 rows=1 width=0)
>             (actual time=7575.59..7575.59 rows=1 loops=1)
>    ->  Seq Scan on vetapview  (cost=0.00..9771.34 rows=89034 width=0)
>                               (actual time=0.06..7472.20 rows=88910 loops=1)
>  Total runtime: 7575.67 msec
> (3 rows)
>
> The only things changing are the actual time. The costs are constant.
>
> The relpages from pg_class for vetapview (the table in question) is
> 8881.
>
> At the end of this message is the exhaustive contents of
> postgresql.conf. The only settings I have attempted tuning are as
> follows:
>
> tcpip_socket = true
> max_connections = 100
> shared_buffers = 5000
> sort_mem = 8192
> fsync = false
>
> I did have shared_buffers and sort_mem both set higher originally
> (15000, 32168) but decreased them in case over-utilization of memory
> was the problem.
>
> The kernel setting shmmax is set to 256,000,000 (out of 1 gig)
>
> [remainder of quoted postgresql.conf and earlier thread trimmed]
On Mon, 15 Sep 2003, scott.marlowe wrote:

> On Mon, 15 Sep 2003, Joseph Bove wrote:
>
> > I've run explain analyze a number of times and have gotten results between
> > 5.5 and 7.5 seconds
> >
> > [query plan and settings trimmed]
>
> A couple of things.
>
> 1: Is there an index on the parts of the query used for the where clause?
> 2: What is your effect_cache_size set to? It needs to be set right for
> your postgresql server to be able to take advantage of the kernel's cache
> (i.e. use an index scan when the kernel is likely to have that data in
> memory.)

Sorry, that should be effective_cache_size, not effect_cache_size. It's
set in 8k blocks and is usually about how much buffer / cache you have
left over after the machine "settles" after being up and running for a
while.

Fer instance, on my server, I show 784992K cache and 42976K buff under
top, so that's 827968k / 8k = 103496 blocks.

Note that if you've recompiled you may have somehow set the block size
larger, but installations with postgresql block sizes other than 8k are
pretty uncommon, and you'd know if you had done that, so it's probably 8k
blocks.
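[Editor's note: the arithmetic above can be reproduced directly. The cache
and buff figures are the ones quoted from top in the message; the 8 KB block
size is PostgreSQL's compiled-in default.]

```python
# Figures quoted from `top` in the message above (both in KB).
cache_kb = 784992
buff_kb = 42976
block_kb = 8  # PostgreSQL's default page size

# effective_cache_size is expressed in blocks of block_kb each.
effective_cache_size = (cache_kb + buff_kb) // block_kb
print(effective_cache_size)  # 103496, matching the message
```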
The world rejoiced as jbove@vetstar.com (Joseph Bove) wrote:

> Actually, it's inconsistent with the exact same command. I've now
> replicated the problem by doing the following command:
>
> select count (*) from table;
>
> The table in question has 88899 rows.
>
> The response time is anywhere from 1 second to 12 seconds. Different
> response times can occur in the same minute of testing!

The only possible plan for THAT query will involve a seq scan of the
whole table. If the postmaster already has the data in cache, it makes
sense for it to run in 1 second. If it has to read it from disk, 12
seconds makes a lot of sense.

You might want to increase the "shared_buffers" parameter in
postgresql.conf; that should lead to increased stability of times, as it
should be more likely that the data in "table" will remain in cache.

--
(reverse (concatenate 'string "gro.gultn" "@" "enworbbc"))
http://www3.sympatico.ca/cbbrowne/x.html
Signs of a Klingon Programmer - 8. "Debugging? Klingons do not debug. Our
software does not coddle the weak. Bugs are good for building character in
the user."
On Mon, 15 Sep 2003 22:26:45 -0400, Christopher Browne <cbbrowne@acm.org>
wrote:

>> select count (*) from table;
>
>The only possible plan for THAT query will involve a seq scan of the
>whole table. If the postmaster already has the data in cache, it
>makes sense for it to run in 1 second. If it has to read it from
>disk, 12 seconds makes a lot of sense.

Yes. And note that the main difference is between having the data in
memory and having to fetch it from disk. I don't believe that this
difference can be explained by 9000 read calls hitting the operating
system's cache.

>You might want to increase the "shared_buffers" parameter in
>postgresql.conf; that should lead to increased stability of times as
>it should be more likely that the data in "table" will remain in
>cache.

Let's not jump to this conclusion before we know what's going on.
Joseph Bove <jbove@vetstar.com> wrote in another message above:

| I did have shared_buffers and sort_mem both set higher originally (15000,
| 32168)

As I read this I think he meant "... and had the same performance
problem."

Joseph, what do you get if you run that

    EXPLAIN ANALYZE SELECT count(*) ...

several times? What do vmstat and top show while the query is running?
Are there other processes active during or between the runs? What kind of
processes? Postgres backends? Web server? ...

Servus
 Manfred
...

> #effective_cache_size = 1000   # typically 8KB each

That's horribly wrong. It's telling PG that your OS is only likely to
cache 8MB of the DB in RAM. If you've got 1GB of memory it should be
between 64000 and 96000.
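[Editor's note: the figures above are easy to check, since each
effective_cache_size unit is one 8 KB block. A quick sketch converting the
settings mentioned in this thread to megabytes:]

```python
BLOCK_KB = 8  # PostgreSQL's default page size

def blocks_to_mb(blocks: int) -> float:
    """Convert an effective_cache_size value (in 8 KB blocks) to MB."""
    return blocks * BLOCK_KB / 1024

# The default of 1000 blocks is the "8MB" the OS is assumed to cache.
print(blocks_to_mb(1000))    # 7.8125
# The suggested range for a 1 GB machine:
print(blocks_to_mb(64000))   # 500.0
print(blocks_to_mb(96000))   # 750.0
```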
Dear list,

First and foremost, thanks to the great number of people who have responded
with various tips and suggestions. I am now starting to fully appreciate
the various cache settings and what they can do for performance.

I just want to redefine the problem based on the knowledge of it that I now
have.

In my example, I am purposefully forcing a full table scan - select count
(*) from table. This table has only 90,000 rows. Each row is comprised of
about 300 bytes of data.

If the table has not been cached, I was seeing response times from 5 to 18
seconds to read the table. If it had been cached, then the response time
dropped to sub-second response.

Obviously, I can tune the caching so as to make sure that as much data as
can be reasonably cached is cached. However, I don't think that a hit of
even 5 seconds to read a table of 90,000 rows is acceptable.

One thing that has been tried with some success was to dump the table and
recreate it. After this exercise, selecting all rows from the table when it
is not in cache takes about 3 seconds. (Of course, when in cache, the same
sub-second response time is seen.)

I still think that 3 seconds is not acceptable. However, I reserve the
right to be wrong. Does it sound unrealistic to expect PostgreSQL to be
able to read 90,000 rows with 300 bytes per row in under a second?

Based on suggestions from the list, I am also thinking of making the
following tuning changes:

shared_buffers = 15000
sort_mem = 32168
effective_cache_size = 64000

This is based on one gig of memory. Does anyone have any feedback on these
values? Also, realizing that no two databases are the same, etc., etc...
does anyone have a good formula for setting these values?

Thanks in advance,

Joseph

At 09:09 AM 9/16/2003 +0200, Manfred Koizar wrote:

>On Mon, 15 Sep 2003 22:26:45 -0400, Christopher Browne
><cbbrowne@acm.org> wrote:
> >> select count (*) from table;
> >The only possible plan for THAT query will involve a seq scan of the
> >whole table. If the postmaster already has the data in cache, it
> >makes sense for it to run in 1 second. If it has to read it from
> >disk, 12 seconds makes a lot of sense.
>
> [rest of quoted message trimmed]
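[Editor's note: rough numbers can be put on Joseph's question. 90,000 rows
at about 300 bytes each is only on the order of 26 MB, so even a modest
sequential read rate should cover a cold scan in roughly a second; the disk
throughput figure below is purely an assumption for illustration, not a
measured value from this thread.]

```python
rows = 90_000
bytes_per_row = 300  # approximate figure from the message above
table_mb = rows * bytes_per_row / (1024 * 1024)

# Hypothetical sequential-read rate for a 2003-era IDE disk.
assumed_seq_read_mb_s = 25
cold_scan_s = table_mb / assumed_seq_read_mb_s

print(f"table size ~ {table_mb:.1f} MB")    # ~25.7 MB
print(f"cold scan  ~ {cold_scan_s:.1f} s")  # ~1.0 s under the assumed rate
```

Under those assumptions a sub-second or near-second cold scan is plausible,
which suggests the observed 5-18 second times reflect table bloat or an I/O
bottleneck rather than raw data volume.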
On Tue, 16 Sep 2003, Joseph Bove wrote:

> I still think that 3 seconds is not acceptable. However, I reserve the
> right to be wrong. Does it sound unrealistic to expect PostgreSQL to be
> able to read 90,000 rows with 300 bytes per row in under a second?

First, check to see what your max throughput on your disk is using a
benchmark such as Bonnie (making sure to use a size LARGER than physical
memory; 2x physical is very optimal).

Next, run your query again with a vmstat 1 running in another term. See
how closely the vmstat "bi" numbers correspond to your max according to
Bonnie. You could have an IO bottleneck. (I once went running around
trying to figure it out and then discovered the issue was IO.)

--
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/