Thread: performance problem

performance problem

From
sanjay tumula
Date:
i am having poor performance with postgres 8.0.0

the config of my machine is:

single xeon 3.4 Ghz, 1GB of memory, two IDE drives mounted using LVM and
JFS file system on Suse 9.2 professional.

What I am trying to do is simple...atleast in ORACLE or DB2 that I am
trying to stay away from.

I have bunch of records that i need to load...about 30mil every month
for last 18months. each month is in four parts of about 7 mil.

so i have a stage table..that i load and check to see if there are any
dups and load into prod tables...

i tried several ways...all using pl/Pgsql...details are below....

what is it that i am doing wrong?  with method one...it finishes in
around 8hrs for 7.5 mil records....

with method 2...its processing 5000 records in 5 mins...but it increases
with each pass... first pass < 1min, second pass 2.5mins..and so on...

any input is greatly appreciated...other than i need to upgrade the
machine...which i am in the process of.

1.  tried to reduce the 7mil, into 'manageable chunks' of about 500,000,
100,000 or even 5000.....
for commiting between...to keep transactions small.

a sample SQL that i use...
/************************************************
   copy existing records with updates to history
************************************************/
   INSERT INTO sms.sms_src_history
   (
       smsshist_id_n
      ,smsshist_phone_s
      ,status_cd_s
      ,smsshist_effective_date_s
      ,smsshist_effective_time_s
      ,resporg_cd_s
      ,dataset_cd_s
      ,smsshist_phone_n
      ,smsshist_npa_n
      ,smsshist_nxx_n
      ,smsshist_line_n
      ,smsshist_create_dt
      ,smsshist_create_user_s
      ,smsshist_effective_dt
   )
   SELECT
       nextval('sms.sms_src_history_seq')
      ,smsscurr_phone_s
      ,status_cd_s
      ,smsscurr_effective_date_s
      ,smsscurr_effective_time_s
      ,resporg_cd_s
      ,dataset_cd_s
      ,smsscurr_phone_n
      ,smsscurr_npa_n
      ,smsscurr_nxx_n
      ,smsscurr_line_n
      ,LOCALTIMESTAMP
      ,'PRCURR_LOAD'
      ,smsscurr_effective_dt
   FROM
      sms.sms_src_current
   WHERE
      smsscurr_phone_s
   IN
   (
      SELECT
     phone_s
      FROM
      (
     SELECT
         trim(phone_s) as phone_s
        ,trim(status_cd_s) as status_cd_s
        ,trim(effective_date_s) as effective_date_s
        ,trim(effective_time_s) as effective_time_s
        ,trim(resporg_cd_s) as resporg_cd_s
     FROM
        sms_stage.stg_sms_src_current
     WHERE
        sssc_id_n between begin_rec and end_rec
     EXCEPT
     SELECT
         smsscurr_phone_s,status_cd_s,smsscurr_effective_date_s
        ,smsscurr_effective_time_s,resporg_cd_s
     FROM
        sms.sms_src_current
      ) AS in_values
      EXCEPT
      (
     SELECT trim(phone_s)
     FROM sms_stage.stg_sms_src_current
     WHERE sssc_id_n between begin_rec and end_rec
     EXCEPT
     SELECT smsscurr_phone_s
     FROM sms.sms_src_current
      )
   )

2.  declare a cursor and go thru records one by one...this one i tried
only 5000 at a time....still VERY slow

   stg_curs CURSOR FOR
      SELECT
      sssc_id_n
     ,trim(phone_s) as phone_s
     ,trim(status_cd_s) as status_cd_s
     ,trim(effective_date_s) as effective_date_s
     ,trim(effective_time_s) as effective_time_s
     ,trim(resporg_cd_s) as resporg_cd_s
     ,trim(dataset_cd_s) as dataset_cd_s
      FROM
     sms_stage.stg_sms_src_current
      WHERE
     sssc_id_n between begin_rec and end_rec

   prod_curs CURSOR FOR
      SELECT *
      FROM sms.sms_src_current
      WHERE smsscurr_phone_n =  to_number(substr(stg_rec.phone_s,1,3)
               ||substr(stg_rec.phone_s,5,3)
               ||substr(stg_rec.phone_s,9,4),'9999999999')
      ORDER BY smsscurr_id_n DESC


etc etc...


3.  the portion of the shell script that call the above function is
   echo "Getting Record Count At: "`date`
   recnum=`psql -t -c "SELECT COALESCE(MAX(sssc_id_n),0) from
sms_stage.stg_sms_src_current " sms`
   echo "Done Getting Record Count At: "`date`

   if [ $recnum -eq 0 ]
   then
      echo "No Source Records To Process In Stage"   else
      echo "Processing Source Data Started At: "`date`
      for (( begin_rec = 1,end_rec = increment
         ;end_rec < recnum
         ;begin_rec = end_rec + 1,end_rec = end_rec + increment
         ))
      do
         echo "----------------------------------------------------"
         echo "Calling process_src_data At: "`date`
         echo "With"
         echo "begin_rec: "$begin_rec
         echo "end_rec: "$end_rec
         psql -e -v precs=$begin_rec,$end_rec -f pg_process_src_data.sh sms
         echo "Completed process_src_data At: "`date`

etc....


4.  the following is my conf file

start of conf file:
#-------------------------------------------------------------------------------
external_pid_file = 'postmaster'                # write an extra pid file


#---------------------------------------------------------------------------
# CONNECTIONS AND AUTHENTICATION
#---------------------------------------------------------------------------

# - Connection Settings -

#listen_addresses = 'localhost' # what IP interface(s) to listen on;
                                # defaults to localhost, '*' = any
listen_addresses = '*'  # what IP interface(s) to listen on;
#port = 5432
max_connections = 200
        # note: increasing max_connections costs about 500 bytes of shared
        # memory per connection slot, in addition to costs from
shared_buffers
        # and max_locks_per_transaction.
#superuser_reserved_connections = 2
superuser_reserved_connections = 15
#unix_socket_directory = ''
#unix_socket_group = ''
#unix_socket_permissions = 0777 # octal
unix_socket_permissions = 0700  # octal
#rendezvous_name = ''           # defaults to the computer name

# - Security & Authentication -

#authentication_timeout = 60    # 1-600, in seconds
#ssl = false
ssl = true
#password_encryption = true
#krb_server_keyfile = ''
#db_user_namespace = false


#---------------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#---------------------------------------------------------------------------

# - Memory -

shared_buffers = 38400          # min 16, at least max_connections*2,
8KB each
work_mem = 153600               # min 64, size in KB
maintenance_work_mem = 102400   # min 1024, size in KB
max_stack_depth = 10240         # min 100, size in KB

# - Free Space Map -

max_fsm_pages = 200000          # min max_fsm_relations*16, 6 bytes each
max_fsm_relations = 10000       # min 100, ~50 bytes each

# - Kernel Resource Usage -

#max_files_per_process = 1000   # min 25
preload_libraries = '$libdir/plpgsql:plpgsql_init'

# - Cost-Based Vacuum Delay -

#vacuum_cost_delay = 0          # 0-1000 milliseconds
#vacuum_cost_page_hit = 1       # 0-10000 credits
#vacuum_cost_page_miss = 10     # 0-10000 credits
#vacuum_cost_page_dirty = 20    # 0-10000 credits
#vacuum_cost_limit = 200        # 0-10000 credits

# - Background writer -

bgwriter_delay = 1000           # 10-10000 milliseconds between rounds
bgwriter_percent = 5            # 0-100% of dirty buffers in each round
#bgwriter_maxpages = 100        # 0-1000 buffers max per round


#---------------------------------------------------------------------------
# WRITE AHEAD LOG
#---------------------------------------------------------------------------

# - Settings -

fsync = false                   # turns forced synchronization on or off
#wal_sync_method = fsync        # the default varies across platforms:
                                # fsync, fdatasync, open_sync, or
open_datasync
wal_buffers = 40                # min 4, 8KB each
#commit_delay = 0               # range 0-100000, in microseconds
#commit_siblings = 5            # range 1-1000

# - Checkpoints -

checkpoint_segments = 10        # in logfile segments, min 1, 16MB each
checkpoint_timeout = 600        # range 30-3600, in seconds
#checkpoint_warning = 30        # 0 is off, in seconds

# - Archiving -

#archive_command = ''           # command to use to archive a logfile
segment
archive_command = 'test ! -f /data/archivelog/"%f" && cp "%p"
/data/archivelog/"%f"'


#---------------------------------------------------------------------------
# QUERY TUNING
#---------------------------------------------------------------------------

# - Planner Method Configuration -

#enable_hashagg = true
#enable_hashjoin = true
#enable_indexscan = true
#enable_mergejoin = true
#enable_nestloop = true
#enable_seqscan = true
#enable_sort = true
#enable_tidscan = true

# - Planner Cost Constants -

effective_cache_size = 3000     # typically 8KB each
#random_page_cost = 4           # units are one sequential page fetch cost
#cpu_tuple_cost = 0.01          # (same)
#cpu_index_tuple_cost = 0.001   # (same)
#cpu_operator_cost = 0.0025     # (same)

# - Genetic Query Optimizer -

#geqo = true
#geqo_threshold = 12
#geqo_effort = 5                # range 1-10
#geqo_pool_size = 0             # selects default based on effort
#geqo_generations = 0           # selects default based on effort
#geqo_selection_bias = 2.0      # range 1.5-2.0

# - Other Planner Options -

#default_statistics_target = 10 # range 1-1000
#from_collapse_limit = 8
#join_collapse_limit = 8        # 1 disables collapsing of explicit JOINs


#---------------------------------------------------------------------------
# ERROR REPORTING AND LOGGING
#---------------------------------------------------------------------------

# - Where to Log -

#log_destination = 'stderr'     # Valid values are combinations of stderr,
                                # syslog and eventlog, depending on
                                # platform.

# This is relevant when logging to stderr:
redirect_stderr = true    # Enable capturing of stderr into log files.
# These are only relevant if redirect_stderr is true:
log_directory = 'pg_log'   # Directory where log files are written.
                            # May be specified absolute or relative to
PGDATA
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # Log file name pattern.
                            # May include strftime() escapes
log_truncate_on_rotation = false  # If true, any existing log file of the
                            # same name as the new log file will be
truncated
                            # rather than appended to.  But such truncation
                            # only occurs on time-driven rotation,
                            # not on restarts or size-driven rotation.
                            # Default is false, meaning append to existing
                            # files in all cases.
log_rotation_age = 14400    # Automatic rotation of logfiles will happen
after
                            # so many minutes.  0 to disable.
#log_rotation_size = 10240  # Automatic rotation of logfiles will happen
after
                            # so many kilobytes of log output.  0 to
disable.

# These are relevant when logging to syslog:
#syslog_facility = 'LOCAL0'
#syslog_ident = 'postgres'


# - When to Log -

#client_min_messages = notice   # Values, in order of decreasing detail:
                                #   debug5, debug4, debug3, debug2, debug1,
                                #   log, notice, warning, error

#log_min_messages = notice      # Values, in order of decreasing detail:
                                #   debug5, debug4, debug3, debug2, debug1,
                                #   info, notice, warning, error, log,
fatal,
                                #   panic

#log_error_verbosity = default  # terse, default, or verbose messages

#log_min_error_statement = panic # Values in order of increasing severity:
                                 #   debug5, debug4, debug3, debug2, debug1,
                                 #   info, notice, warning, error,
panic(off)

log_min_duration_statement = 2147483 # -1 is disabled, in milliseconds.

silent_mode = true               # DO NOT USE without syslog or
redirect_stderr

# - What to Log -

#debug_print_parse = false
#debug_print_rewritten = false
#debug_print_plan = false
#debug_pretty_print = false
#log_connections = false
#log_disconnections = false
#log_duration = false
#log_line_prefix = '%t'         # e.g. '<%u%%%d> '
                                # %u=user name %d=database name
                                # %r=remote host and port
                                # %p=PID %t=timestamp %i=command tag
                                # %c=session id %l=session line number
                                # %s=session start timestamp
%x=transaction id
                                # %q=stop here in non-session processes
                                # %%='%'
#log_statement = 'none'         # none, mod, ddl, all
#log_hostname = false


#---------------------------------------------------------------------------
# RUNTIME STATISTICS
#---------------------------------------------------------------------------

# - Statistics Monitoring -

#log_parser_stats = false
#log_planner_stats = false
#log_executor_stats = false
log_statement_stats = true

# - Query/Index Statistics Collector -

#stats_start_collector = true
stats_command_string = true
#stats_block_level = false
#stats_row_level = false
#stats_reset_on_server_start = true


#---------------------------------------------------------------------------
# CLIENT CONNECTION DEFAULTS
#---------------------------------------------------------------------------

# - Statement Behavior -

#search_path = '$user,public'   # schema names
#default_tablespace = ''        # a tablespace name, or '' for default
#check_function_bodies = true
#default_transaction_isolation = 'read committed'
#default_transaction_read_only = false
#statement_timeout = 0          # 0 is disabled, in milliseconds

# - Locale and Formatting -

#datestyle = 'iso, mdy'
#timezone = unknown             # actually, defaults to TZ environment
setting
#australian_timezones = false
#extra_float_digits = 0         # min -15, max 2
#client_encoding = sql_ascii    # actually, defaults to database encoding

# These settings are initialized by initdb -- they might be changed
lc_messages = 'en_US.UTF-8'             # locale for system error
message strings
lc_monetary = 'en_US.UTF-8'             # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'              # locale for number formatting
lc_time = 'en_US.UTF-8'                 # locale for time formatting

# - Other Defaults -

#explain_pretty_print = true
#dynamic_library_path = '$libdir'


#---------------------------------------------------------------------------
# LOCK MANAGEMENT
#---------------------------------------------------------------------------

deadlock_timeout = 30000        # in milliseconds
#max_locks_per_transaction = 64 # min 10, ~200*max_connections bytes each


#---------------------------------------------------------------------------
# VERSION/PLATFORM COMPATIBILITY
#---------------------------------------------------------------------------

# - Previous Postgres Versions -

add_missing_from = false
#regex_flavor = advanced        # advanced, extended, or basic
#sql_inheritance = true
default_with_oids = false

# - Other Platforms & Clients -

#transform_null_equals = false

# END OF CONF FILE