Re: Need for speed 2

From: Ron
Subject: Re: Need for speed 2
Msg-id: 6.2.3.4.0.20050825094630.05e6b160@pop.earthlink.net
In response to: Need for speed 2  (Ulrich Wisser)
Responses: Re: Need for speed 2  (Kelly Burkhart)
List: pgsql-performance


At 03:10 AM 8/25/2005, Ulrich Wisser wrote:

>I realize I need to be much more specific. Here is a more detailed
>description of my hardware and system design.
>
>
>Pentium 4 2.4GHz
>Memory 4x DIMM DDR 1GB PC3200 400MHZ CAS3, KVR
>Motherboard chipset 'I865G', two IDE channels on board

First suggestion: get better server HW.  An AMD Opteron based dual
processor board is currently the best in terms of price/performance
ratio, _particularly_ for DB applications like the one you have
described.  Such mainboards cost ~$400-$500, RAM costs about
$75-$150/GB, and Opteron 2xx CPUs are ~$200-$700 apiece.  So a 2P AMD
system can be had for as little as ~$850 + the cost of the RAM you
need.  Even in the worst case, where you need 24GB of RAM (~$3600),
the total comes in at ~$4450.  As the numbers show, buying only the
RAM you actually need can save you a great deal of money.
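
That arithmetic can be double-checked quickly.  A sketch, picking
specific prices from the ranges quoted above (the exact figures chosen
below are assumptions within those ranges):

```sh
# Rough 2005 prices picked from the ranges above (assumptions, not quotes):
BOARD=450                 # 2P Opteron mainboard, ~$400-$500
CPUS=$((2 * 200))         # two low-end Opteron 2xx at ~$200 apiece
RAM_PER_GB=150            # high end of the ~$75-$150/GB range
RAM_GB=24                 # worst case: hold the 22GB DB + OS + etc.

BASE=$((BOARD + CPUS))            # minimum 2P system, ~$850
RAM=$((RAM_GB * RAM_PER_GB))      # worst-case RAM, ~$3600
echo "base=\$$BASE ram=\$$RAM total=\$$((BASE + RAM))"
# prints: base=$850 ram=$3600 total=$4450
```

Halve RAM_GB to 12 and the total drops by $1800, which is the point
about buying only the RAM you need.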

Given what little you said about how much of your DB is frequently
accessed, I'd suggest buying a server based around the 2P 16 DIMM
slot IWill DK88 mainboard (Tyan has announced a 16 DIMM slot
mainboard, but I do not think it is actually being sold yet.).  Then
fill it with the minimum amount of RAM that will allow the "working
set" of the DB to be cached in RAM.  In the worst case, where DB
access is essentially uniform and random, you will need 24GB of RAM
to hold the 22GB DB + OS + etc.  That worst case is _rare_.  Usually
DBs have a working set that is smaller than the entire DB.  You want
to keep that working set in RAM.  If you can't identify the working
set, buy enough RAM to hold the entire DB.

In particular, you want to make sure that any frequently accessed
read only tables or indexes are kept in RAM.  The "read only" part is
very important.  Tables (and their indexes) that are frequently
written to _have_ to access the HDs, so you get much less benefit
from having them in RAM.  Read only tables and their indexes can be
loaded into tmpfs at boot time, thereby keeping them out of the way
of the file system buffer cache.  tmpfs does not preserve data if the
host goes down, so it is very important that you ONLY use this trick
with read only tables.  The other half of the trick is to make sure
that the file system buffer cache does _not_ cache whatever you have
loaded into tmpfs.
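
One way that trick could look in practice.  This is only a sketch:
the mount point, size, tablespace name, and restore step are all made
up, and because tmpfs is wiped at every shutdown, the read-only
tables must be reloaded from a durable copy at each boot, before
anything queries them:

```sh
# At boot, before starting the postmaster (requires root; PG >= 8.0
# for tablespaces).  All names and sizes here are illustrative.
mount -t tmpfs -o size=2g tmpfs /mnt/pg_ram
chown postgres:postgres /mnt/pg_ram

# As the postgres user: recreate the RAM tablespace and reload the
# read-only tables into it from a durable copy (e.g. a nightly dump).
psql mydb <<'EOF'
CREATE TABLESPACE ramspace LOCATION '/mnt/pg_ram';
\i /backups/readonly_tables.sql
EOF
```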


>2x SEAGATE BARRACUDA 7200.7 80GB 7200RPM ATA/100
>(software raid 1, system, swap, pg_xlog)
>ADAPTEC SCSI RAID 2100S ULTRA160 32MB 1-CHANNEL
>2x SEAGATE CHEETAH 15K.3 73GB ULTRA320 68-PIN WIDE
>(raid 1, /var/lib/pgsql)

Second suggestion: you need a MUCH better IO subsystem.  In fact,
given that you have described this system as primarily OLTP-like,
this is more important than the above server HW.  Best would be to
upgrade everything, but if you are strapped for cash, upgrade the IO
subsystem first.

You need many more spindles and a decent RAID card or cards.  You
want 15Krpm (best) or 10Krpm HDs.  As long as all of the HDs are at
least 10Krpm, more spindles is more important than faster
spindles.  If it's a choice between more 10Krpm discs or fewer 15Krpm
discs, buy the 10Krpm discs.  Get the spindle count as high as your
RAID cards can handle.

Whatever RAID cards you get should have as much battery backed write
buffer as possible.  In the commodity market, presently the highest
performance RAID cards I know of, and the ones that support the
largest battery backed write buffer, are made by Areca.


>Database size on disc is 22GB. (without pg_xlog)

Find out what the working set, i.e. the most frequently accessed
portion, of this 22GB is, and you will know how much RAM is worth
having.  4GB is definitely too little!


>Please find my postgresql.conf below.

Third suggestion:  make sure you are running a 2.6 based kernel and
at least PG 8.0.3.  Helping beta test PG 8.1 might be an option for
you as well.


>Putting pg_xlog on the IDE drives gave about 10% performance
>improvement. Would faster disks give more performance?
>
>What my application does:
>
>Every five minutes a new logfile will be imported. Depending on the
>source of the request it will be imported in one of three "raw click"
>tables. (data from two months back, to be able to verify customer
>complaints)  For reporting I have a set of tables. These contain data
>from the last two years. My app deletes all entries from today and
>reinserts updated data calculated from the raw data tables.

The raw data tables seem to be read only?  If so, you should buy
enough RAM to load them into tmpfs at boot time and have them be
completely RAM resident in addition to having enough RAM for the OS
to cache an appropriate amount of the rest of the DB.


>The queries contain no joins only aggregates. I have several indexes
>to speed different kinds of queries.
>
>My problems occur when one user does a report that contains too
>much old data. In that case all cache mechanisms will fail and disc
>io is the limiting factor.
>
>If one query contains so much data, that a full table scan is
>needed, I do not care if it takes two minutes to answer. But all
>other queries with less data (at the same time) still have to be fast.

HDs can only do one thing at a time.  If they are in the middle of a
full table scan, everything else that requires HD access is going to
wait until it is done.

At some point, looking at your DB schema and queries will be worth it
for optimization purposes.  Right now, your HW is so underpowered
compared to the demands you are placing on it that there's little
point to SW tuning.

>I can not stop users doing that kind of reporting. :(
>
>I need more speed in orders of magnitude. Will more disks / more
>memory do that trick?

If you do the right things with them ;)

>Money is of course a limiting factor but it doesn't have to be real cheap.
>
>Ulrich
>
>
>
>
>
># -----------------------------
># PostgreSQL configuration file
># -----------------------------
>#---------------------------------------------------------------------------
># CONNECTIONS AND AUTHENTICATION
>#---------------------------------------------------------------------------
>
># - Connection Settings -
>
>tcpip_socket = true
>max_connections = 100
>         # note: increasing max_connections costs about 500 bytes of shared
>         # memory per connection slot, in addition to costs from shared_buffers
>         # and max_locks_per_transaction.
>#superuser_reserved_connections = 2
>#port = 5432
>#unix_socket_directory = ''
>#unix_socket_group = ''
>#unix_socket_permissions = 0777 # octal
>#virtual_host = ''              # what interface to listen on; defaults to any
>#rendezvous_name = ''           # defaults to the computer name
>
># - Security & Authentication -
>
>#authentication_timeout = 60    # 1-600, in seconds
>#ssl = false
>#password_encryption = true
>#krb_server_keyfile = ''
>#db_user_namespace = false
>
>
>#---------------------------------------------------------------------------
># RESOURCE USAGE (except WAL)
>#---------------------------------------------------------------------------
>
># - Memory -
>
>shared_buffers = 20000          # min 16, at least max_connections*2, 8KB each
>sort_mem = 4096         # min 64, size in KB

4MB seems small.  Find out how much memory you typically need for a
sort, and how many sorts typically run at once, to set this to a
sane size.
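
Remember that sort_mem is allocated per sort, per backend, so the
worst case scales with max_connections.  A quick sanity check on the
posted settings (assuming, for illustration, one active sort per
connection):

```sh
SORT_MEM_KB=4096   # sort_mem from the posted postgresql.conf
MAX_CONN=100       # max_connections from the posted postgresql.conf
# Worst case if every connection runs one sort simultaneously:
echo "$((SORT_MEM_KB * MAX_CONN / 1024)) MB"
# prints: 400 MB
```

So even the "small" 4MB setting can claim ~400MB under full load;
raising sort_mem without counting concurrent sorts can push the box
into swap.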


>vacuum_mem = 8192               # min 1024, size in KB
>
># - Free Space Map -
>
>max_fsm_pages = 200000          # min max_fsm_relations*16, 6 bytes each
>max_fsm_relations = 10000       # min 100, ~50 bytes each
>
># - Kernel Resource Usage -
>
>#max_files_per_process = 1000   # min 25
>#preload_libraries = ''
>
>
>#---------------------------------------------------------------------------
># WRITE AHEAD LOG
>#---------------------------------------------------------------------------
>
># - Settings -
>
>fsync = false                   # turns forced synchronization on or off
>#wal_sync_method = fsync        # the default varies across platforms:
>                                 # fsync, fdatasync, open_sync, or open_datasync

I hope you have a battery backed write buffer!

>wal_buffers = 128               # min 4, 8KB each

There might be a better value for you to use.

I'll hold off on looking at the rest of this...

># - Checkpoints -
>
>checkpoint_segments = 16        # in logfile segments, min 1, 16MB each
>#checkpoint_timeout = 300       # range 30-3600, in seconds
>#checkpoint_warning = 30        # 0 is off, in seconds
>#commit_delay = 0               # range 0-100000, in microseconds
>#commit_siblings = 5            # range 1-1000
>
>
>#---------------------------------------------------------------------------
># QUERY TUNING
>#---------------------------------------------------------------------------
>
># - Planner Method Enabling -
>
>#enable_hashagg = true
>#enable_hashjoin = true
>#enable_indexscan = true
>#enable_mergejoin = true
>#enable_nestloop = true
>#enable_seqscan = true
>#enable_sort = true
>#enable_tidscan = true
>
># - Planner Cost Constants -
>
>#effective_cache_size = 1000    # typically 8KB each
>#random_page_cost = 4           # units are one sequential page fetch cost
>#cpu_tuple_cost = 0.01          # (same)
>#cpu_index_tuple_cost = 0.001   # (same)
>#cpu_operator_cost = 0.0025     # (same)
>
># - Genetic Query Optimizer -
>
>#geqo = true
>#geqo_threshold = 11
>#geqo_effort = 1
>#geqo_generations = 0
>#geqo_pool_size = 0             # default based on tables in statement,
>                                 # range 128-1024
>#geqo_selection_bias = 2.0      # range 1.5-2.0
>
># - Other Planner Options -
>
>#default_statistics_target = 10 # range 1-1000
>#from_collapse_limit = 8
>#join_collapse_limit = 8        # 1 disables collapsing of explicit JOINs
>
>
>#---------------------------------------------------------------------------
># ERROR REPORTING AND LOGGING
>#---------------------------------------------------------------------------
>
># - Syslog -
>
>syslog = 2                      # range 0-2; 0=stdout; 1=both; 2=syslog
>syslog_facility = 'LOCAL0'
>syslog_ident = 'postgres'
>
># - When to Log -
>
>client_min_messages = info      # Values, in order of decreasing detail:
>                                 #   debug5, debug4, debug3, debug2, debug1,
>                                 #   log, info, notice, warning, error
>
>log_min_messages = info # Values, in order of decreasing detail:
>                                 #   debug5, debug4, debug3, debug2, debug1,
>                                 #   info, notice, warning, error, log, fatal,
>                                 #   panic
>
>log_error_verbosity = verbose   # terse, default, or verbose messages
>
>log_min_error_statement = info # Values in order of increasing severity:
>                                  #   debug5, debug4, debug3, debug2, debug1,
>                                  #   info, notice, warning, error, panic(off)
>
>log_min_duration_statement = 1000 # Log all statements whose
>                                  # execution time exceeds the value, in
>                                  # milliseconds.  Zero prints all queries.
>                                  # Minus-one disables.
>
>silent_mode = false              # DO NOT USE without Syslog!
>
># - What to Log -
>
>#debug_print_parse = false
>#debug_print_rewritten = false
>#debug_print_plan = false
>#debug_pretty_print = false
>log_connections = true
>#log_duration = false
>#log_pid = false
>#log_statement = false
>#log_timestamp = false
>#log_hostname = false
>#log_source_port = false
>
>
>#---------------------------------------------------------------------------
># RUNTIME STATISTICS
>#---------------------------------------------------------------------------
>
># - Statistics Monitoring -
>
>#log_parser_stats = false
>#log_planner_stats = false
>#log_executor_stats = false
>#log_statement_stats = false
>
># - Query/Index Statistics Collector -
>
>#stats_start_collector = true
>#stats_command_string = false
>#stats_block_level = false
>#stats_row_level = false
>#stats_reset_on_server_start = true
>
>
>#---------------------------------------------------------------------------
># CLIENT CONNECTION DEFAULTS
>#---------------------------------------------------------------------------
>
># - Statement Behavior -
>
>#search_path = '$user,public'   # schema names
>#check_function_bodies = true
>#default_transaction_isolation = 'read committed'
>#default_transaction_read_only = false
>#statement_timeout = 0          # 0 is disabled, in milliseconds
>
># - Locale and Formatting -
>
>#datestyle = 'iso, mdy'
>#timezone = unknown             # actually, defaults to TZ environment setting
>#australian_timezones = false
>#extra_float_digits = 0         # min -15, max 2
>#client_encoding = sql_ascii    # actually, defaults to database encoding
>
># These settings are initialized by initdb -- they may be changed
>lc_messages = 'en_US'           # locale for system error message strings
>lc_monetary = 'en_US'           # locale for monetary formatting
>lc_numeric = 'en_US'            # locale for number formatting
>lc_time = 'en_US'                       # locale for time formatting
>
># - Other Defaults -
>
>#explain_pretty_print = true
>#dynamic_library_path = '$libdir'
>#max_expr_depth = 10000         # min 10
>
>
>#---------------------------------------------------------------------------
># LOCK MANAGEMENT
>#---------------------------------------------------------------------------
>
>#deadlock_timeout = 1000        # in milliseconds
>#max_locks_per_transaction = 64 # min 10, ~260*max_connections bytes each
>
>
>#---------------------------------------------------------------------------
># VERSION/PLATFORM COMPATIBILITY
>#---------------------------------------------------------------------------
>
># - Previous Postgres Versions -
>
>#add_missing_from = true
>#regex_flavor = advanced        # advanced, extended, or basic
>#sql_inheritance = true
>
># - Other Platforms & Clients -
>
>#transform_null_equals = false
>
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 2: Don't 'kill -9' the postmaster




