performance problem - Mailing list pgsql-admin
From | sanjay tumula |
---|---|
Subject | performance problem |
Date | |
Msg-id | 425BFFB0.8060205@comcast.net Whole thread Raw |
List | pgsql-admin |
i am having poor performance with postgres 8.0.0 the config of my machine is: single xeon 3.4 Ghz, 1GB of memory, two IDE drives mounted using LVM and JFS file system on Suse 9.2 professional. What I am trying to do is simple...atleast in ORACLE or DB2 that I am trying to stay away from. I have bunch of records that i need to load...about 30mil every month for last 18months. each month is in four parts of about 7 mil. so i have a stage table..that i load and check to see if there are any dups and load into prod tables... i tried several ways...all using pl/Pgsql...details are below.... what is it that i am doing wrong? with method one...it finishes in around 8hrs for 7.5 mil records.... with method 2...its processing 5000 records in 5 mins...but it increases with each pass... first pass < 1min, second pass 2.5mins..and so on... any input is greatly appreciated...other than i need to upgrade the machine...which i am in the process of. 1. tried to reduce the 7mil, into 'manageable chunks' of about 500,000, 100,000 or even 5000..... for commiting between...to keep transactions small. a sample SQL that i use... /************************************************ copy existing records with updates to history ************************************************/ INSERT INTO sms.sms_src_history ( smsshist_id_n ,smsshist_phone_s ,status_cd_s ,smsshist_effective_date_s ,smsshist_effective_time_s ,resporg_cd_s ,dataset_cd_s ,smsshist_phone_n ,smsshist_npa_n ,smsshist_nxx_n ,smsshist_line_n ,smsshist_create_dt ,smsshist_create_user_s ,smsshist_effective_dt ) SELECT nextval('sms.sms_src_history_seq') ,smsscurr_phone_s ,status_cd_s ,smsscurr_effective_date_s ,smsscurr_effective_time_s ,resporg_cd_s ,dataset_cd_s ,smsscurr_phone_n ,smsscurr_npa_n ,smsscurr_nxx_n ,smsscurr_line_n ,LOCALTIMESTAMP ,'PRCURR_LOAD' ,smsscurr_effective_dt FROM sms.sms_src_current WHERE smsscurr_phone_s IN ( SELECT phone_s FROM ( SELECT trim(phone_s) as phone_s ,trim(status_cd_s) as status_cd_s ,trim(effective_date_s) as effective_date_s ,trim(effective_time_s) as effective_time_s ,trim(resporg_cd_s) as resporg_cd_s FROM sms_stage.stg_sms_src_current WHERE sssc_id_n between begin_rec and end_rec EXCEPT SELECT smsscurr_phone_s,status_cd_s,smsscurr_effective_date_s ,smsscurr_effective_time_s,resporg_cd_s FROM sms.sms_src_current ) AS in_values EXCEPT ( SELECT trim(phone_s) FROM sms_stage.stg_sms_src_current WHERE sssc_id_n between begin_rec and end_rec EXCEPT SELECT smsscurr_phone_s FROM sms.sms_src_current ) ) 2. declare a cursor and go thru records one by one...this one i tried only 5000 at a time....still VERY slow stg_curs CURSOR FOR SELECT sssc_id_n ,trim(phone_s) as phone_s ,trim(status_cd_s) as status_cd_s ,trim(effective_date_s) as effective_date_s ,trim(effective_time_s) as effective_time_s ,trim(resporg_cd_s) as resporg_cd_s ,trim(dataset_cd_s) as dataset_cd_s FROM sms_stage.stg_sms_src_current WHERE sssc_id_n between begin_rec and end_rec prod_curs CURSOR FOR SELECT * FROM sms.sms_src_current WHERE smsscurr_phone_n = to_number(substr(stg_rec.phone_s,1,3) ||substr(stg_rec.phone_s,5,3) ||substr(stg_rec.phone_s,9,4),'9999999999') ORDER BY smsscurr_id_n DESC etc etc... 3. the portion of the shell script that call the above function is echo "Getting Record Count At: "`date` recnum=`psql -t -c "SELECT COALESCE(MAX(sssc_id_n),0) from sms_stage.stg_sms_src_current " sms` echo "Done Getting Record Count At: "`date` if [ $recnum -eq 0 ] then echo "No Source Records To Process In Stage" else echo "Processing Source Data Started At: "`date` for (( begin_rec = 1,end_rec = increment ;end_rec < recnum ;begin_rec = end_rec + 1,end_rec = end_rec + increment )) do echo "----------------------------------------------------" echo "Calling process_src_data At: "`date` echo "With" echo "begin_rec: "$begin_rec echo "end_rec: "$end_rec psql -e -v precs=$begin_rec,$end_rec -f pg_process_src_data.sh sms echo "Completed process_src_data At: "`date` etc.... 4. the following is my conf file start of conf file: #------------------------------------------------------------------------------- external_pid_file = 'postmaster' # write an extra pid file #--------------------------------------------------------------------------- # CONNECTIONS AND AUTHENTICATION #--------------------------------------------------------------------------- # - Connection Settings - #listen_addresses = 'localhost' # what IP interface(s) to listen on; # defaults to localhost, '*' = any listen_addresses = '*' # what IP interface(s) to listen on; #port = 5432 max_connections = 200 # note: increasing max_connections costs about 500 bytes of shared # memory per connection slot, in addition to costs from shared_buffers # and max_locks_per_transaction. #superuser_reserved_connections = 2 superuser_reserved_connections = 15 #unix_socket_directory = '' #unix_socket_group = '' #unix_socket_permissions = 0777 # octal unix_socket_permissions = 0700 # octal #rendezvous_name = '' # defaults to the computer name # - Security & Authentication - #authentication_timeout = 60 # 1-600, in seconds #ssl = false ssl = true #password_encryption = true #krb_server_keyfile = '' #db_user_namespace = false #--------------------------------------------------------------------------- # RESOURCE USAGE (except WAL) #--------------------------------------------------------------------------- # - Memory - shared_buffers = 38400 # min 16, at least max_connections*2, 8KB each work_mem = 153600 # min 64, size in KB maintenance_work_mem = 102400 # min 1024, size in KB max_stack_depth = 10240 # min 100, size in KB # - Free Space Map - max_fsm_pages = 200000 # min max_fsm_relations*16, 6 bytes each max_fsm_relations = 10000 # min 100, ~50 bytes each # - Kernel Resource Usage - #max_files_per_process = 1000 # min 25 preload_libraries = '$libdir/plpgsql:plpgsql_init' # - Cost-Based Vacuum Delay - #vacuum_cost_delay = 0 # 0-1000 milliseconds #vacuum_cost_page_hit = 1 # 0-10000 credits #vacuum_cost_page_miss = 10 # 0-10000 credits #vacuum_cost_page_dirty = 20 # 0-10000 credits #vacuum_cost_limit = 200 # 0-10000 credits # - Background writer - bgwriter_delay = 1000 # 10-10000 milliseconds between rounds bgwriter_percent = 5 # 0-100% of dirty buffers in each round #bgwriter_maxpages = 100 # 0-1000 buffers max per round #--------------------------------------------------------------------------- # WRITE AHEAD LOG #--------------------------------------------------------------------------- # - Settings - fsync = false # turns forced synchronization on or off #wal_sync_method = fsync # the default varies across platforms: # fsync, fdatasync, open_sync, or open_datasync wal_buffers = 40 # min 4, 8KB each #commit_delay = 0 # range 0-100000, in microseconds #commit_siblings = 5 # range 1-1000 # - Checkpoints - checkpoint_segments = 10 # in logfile segments, min 1, 16MB each checkpoint_timeout = 600 # range 30-3600, in seconds #checkpoint_warning = 30 # 0 is off, in seconds # - Archiving - #archive_command = '' # command to use to archive a logfile segment archive_command = 'test ! -f /data/archivelog/"%f" && cp "%p" /data/archivelog/"%f"' #--------------------------------------------------------------------------- # QUERY TUNING #--------------------------------------------------------------------------- # - Planner Method Configuration - #enable_hashagg = true #enable_hashjoin = true #enable_indexscan = true #enable_mergejoin = true #enable_nestloop = true #enable_seqscan = true #enable_sort = true #enable_tidscan = true # - Planner Cost Constants - effective_cache_size = 3000 # typically 8KB each #random_page_cost = 4 # units are one sequential page fetch cost #cpu_tuple_cost = 0.01 # (same) #cpu_index_tuple_cost = 0.001 # (same) #cpu_operator_cost = 0.0025 # (same) # - Genetic Query Optimizer - #geqo = true #geqo_threshold = 12 #geqo_effort = 5 # range 1-10 #geqo_pool_size = 0 # selects default based on effort #geqo_generations = 0 # selects default based on effort #geqo_selection_bias = 2.0 # range 1.5-2.0 # - Other Planner Options - #default_statistics_target = 10 # range 1-1000 #from_collapse_limit = 8 #join_collapse_limit = 8 # 1 disables collapsing of explicit JOINs #--------------------------------------------------------------------------- # ERROR REPORTING AND LOGGING #--------------------------------------------------------------------------- # - Where to Log - #log_destination = 'stderr' # Valid values are combinations of stderr, # syslog and eventlog, depending on # platform. # This is relevant when logging to stderr: redirect_stderr = true # Enable capturing of stderr into log files. # These are only relevant if redirect_stderr is true: log_directory = 'pg_log' # Directory where log files are written. # May be specified absolute or relative to PGDATA log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # Log file name pattern. # May include strftime() escapes log_truncate_on_rotation = false # If true, any existing log file of the # same name as the new log file will be truncated # rather than appended to. But such truncation # only occurs on time-driven rotation, # not on restarts or size-driven rotation. # Default is false, meaning append to existing # files in all cases. log_rotation_age = 14400 # Automatic rotation of logfiles will happen after # so many minutes. 0 to disable. #log_rotation_size = 10240 # Automatic rotation of logfiles will happen after # so many kilobytes of log output. 0 to disable. # These are relevant when logging to syslog: #syslog_facility = 'LOCAL0' #syslog_ident = 'postgres' # - When to Log - #client_min_messages = notice # Values, in order of decreasing detail: # debug5, debug4, debug3, debug2, debug1, # log, notice, warning, error #log_min_messages = notice # Values, in order of decreasing detail: # debug5, debug4, debug3, debug2, debug1, # info, notice, warning, error, log, fatal, # panic #log_error_verbosity = default # terse, default, or verbose messages #log_min_error_statement = panic # Values in order of increasing severity: # debug5, debug4, debug3, debug2, debug1, # info, notice, warning, error, panic(off) log_min_duration_statement = 2147483 # -1 is disabled, in milliseconds. silent_mode = true # DO NOT USE without syslog or redirect_stderr # - What to Log - #debug_print_parse = false #debug_print_rewritten = false #debug_print_plan = false #debug_pretty_print = false #log_connections = false #log_disconnections = false #log_duration = false #log_line_prefix = '%t' # e.g. '<%u%%%d> ' # %u=user name %d=database name # %r=remote host and port # %p=PID %t=timestamp %i=command tag # %c=session id %l=session line number # %s=session start timestamp %x=transaction id # %q=stop here in non-session processes # %%='%' #log_statement = 'none' # none, mod, ddl, all #log_hostname = false #--------------------------------------------------------------------------- # RUNTIME STATISTICS #--------------------------------------------------------------------------- # - Statistics Monitoring - #log_parser_stats = false #log_planner_stats = false #log_executor_stats = false log_statement_stats = true # - Query/Index Statistics Collector - #stats_start_collector = true stats_command_string = true #stats_block_level = false #stats_row_level = false #stats_reset_on_server_start = true #--------------------------------------------------------------------------- # CLIENT CONNECTION DEFAULTS #--------------------------------------------------------------------------- # - Statement Behavior - #search_path = '$user,public' # schema names #default_tablespace = '' # a tablespace name, or '' for default #check_function_bodies = true #default_transaction_isolation = 'read committed' #default_transaction_read_only = false #statement_timeout = 0 # 0 is disabled, in milliseconds # - Locale and Formatting - #datestyle = 'iso, mdy' #timezone = unknown # actually, defaults to TZ environment setting #australian_timezones = false #extra_float_digits = 0 # min -15, max 2 #client_encoding = sql_ascii # actually, defaults to database encoding # These settings are initialized by initdb -- they might be changed lc_messages = 'en_US.UTF-8' # locale for system error message strings lc_monetary = 'en_US.UTF-8' # locale for monetary formatting lc_numeric = 'en_US.UTF-8' # locale for number formatting lc_time = 'en_US.UTF-8' # locale for time formatting # - Other Defaults - #explain_pretty_print = true #dynamic_library_path = '$libdir' #--------------------------------------------------------------------------- # LOCK MANAGEMENT #--------------------------------------------------------------------------- deadlock_timeout = 30000 # in milliseconds #max_locks_per_transaction = 64 # min 10, ~200*max_connections bytes each #--------------------------------------------------------------------------- # VERSION/PLATFORM COMPATIBILITY #--------------------------------------------------------------------------- # - Previous Postgres Versions - add_missing_from = false #regex_flavor = advanced # advanced, extended, or basic #sql_inheritance = true default_with_oids = false # - Other Platforms & Clients - #transform_null_equals = false # END OF CONF FILE
pgsql-admin by date: