COPY insert performance - Mailing list pgsql-performance
From | Chris Isaacson |
---|---|
Subject | COPY insert performance |
Date | |
Msg-id | 07774C6E31D94A44A2A60E2085944F09071DF0@tbmail.tradebot.com Whole thread Raw |
Responses |
Re: COPY insert performance
Re: COPY insert performance Re: COPY insert performance |
List | pgsql-performance |
I need COPY via libpqxx to insert millions of rows into two tables. One table has roughly have as many rows and requires half the storage. In production, the largest table will grow by ~30M rows/day. To test the COPY performance I split my transactions into 10,000 rows. I insert roughly 5000 rows into table A for every 10,000 rows into table B.
Table A has one unique index:
"order_main_pk" UNIQUE, btree (cl_ord_id)
Table B has 1 unique index and 2 non-unique indexes:
"order_transition_pk" UNIQUE, btree (collating_seq)
"order_transition_ak2" btree (orig_cl_ord_id)
"order_transition_ak3" btree (exec_id)
"order_transition_ak2" btree (orig_cl_ord_id)
"order_transition_ak3" btree (exec_id)
My testing environment is as follows:
-Postgresql 8.0.1
-libpqxx 2.5.0
-Linux 2.6.11.4-21.7-smp x86_64
-Dual Opteron 246
-System disk (postgres data resides on this SCSI disk) - Seagate (ST373453LC) - 15K, 73 GB (http://www.seagate.com/cda/products/discsales/marketing/detail/0,1081,549,00.html)
-2nd logical disk - 10K, 36GB IBM SCSI (IC35L036UCDY10-0) - WAL reside on this disk
-NO RAID
PostgreSQL
Here are the results of copying in 10M rows as fast as possible: (10K/transaction)
Total Time: 1129.556 s
Rows/sec: 9899.922
Transaction>1.2s 225
Transaction>1.5s 77
Transaction>2.0s 4
Max Transaction 2.325s
MySQL
I ran a similar test with MySQL 4.1.10a (InnoDB) which produced these results: (I used MySQL's INSERT INTO x VALUES (1,2,3)(4,5,6)(...,...,...) syntax) (10K/transaction)
Total Time: 860.000 s
Rows/sec: 11627.91
Transaction>1.2s 0
Transaction>1.5s 0
Transaction>2.0s 0
Max Transaction 1.175s
Considering the configurations shown below, can anyone offer advice to close the 15% gap and the much worse variability I'm experiencing. Thanks
My postgresql.conf has the following non-default values:
# -----------------------------
# PostgreSQL configuration file
# -----------------------------
listen_addresses = '*' # what IP interface(s) to listen on;
max_connections = 100
# PostgreSQL configuration file
# -----------------------------
listen_addresses = '*' # what IP interface(s) to listen on;
max_connections = 100
#---------------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#---------------------------------------------------------------------------
shared_buffers = 65536 # min 16, at least max_connections*2, 8KB each
work_mem = 2048 # min 64, size in KB
maintenance_work_mem = 204800 # min 1024, size in KB
max_fsm_pages = 2250000 # min max_fsm_relations*16, 6 bytes each
bgwriter_delay = 200 # 10-10000 milliseconds between rounds
bgwriter_percent = 10 # 0-100% of dirty buffers in each round
bgwriter_maxpages = 1000 # 0-1000 buffers max per round
# RESOURCE USAGE (except WAL)
#---------------------------------------------------------------------------
shared_buffers = 65536 # min 16, at least max_connections*2, 8KB each
work_mem = 2048 # min 64, size in KB
maintenance_work_mem = 204800 # min 1024, size in KB
max_fsm_pages = 2250000 # min max_fsm_relations*16, 6 bytes each
bgwriter_delay = 200 # 10-10000 milliseconds between rounds
bgwriter_percent = 10 # 0-100% of dirty buffers in each round
bgwriter_maxpages = 1000 # 0-1000 buffers max per round
#---------------------------------------------------------------------------
# WRITE AHEAD LOG
#---------------------------------------------------------------------------
fsync = false # turns forced synchronization on or off
wal_buffers = 64 # min 4, 8KB each
checkpoint_segments = 40 # in logfile segments, min 1, 16MB each
checkpoint_timeout = 600 # range 30-3600, in seconds
# WRITE AHEAD LOG
#---------------------------------------------------------------------------
fsync = false # turns forced synchronization on or off
wal_buffers = 64 # min 4, 8KB each
checkpoint_segments = 40 # in logfile segments, min 1, 16MB each
checkpoint_timeout = 600 # range 30-3600, in seconds
#---------------------------------------------------------------------------
# QUERY TUNING
#---------------------------------------------------------------------------
effective_cache_size = 65536 # typically 8KB each
random_page_cost = 2 # units are one sequential page fetch cost
# QUERY TUNING
#---------------------------------------------------------------------------
effective_cache_size = 65536 # typically 8KB each
random_page_cost = 2 # units are one sequential page fetch cost
#---------------------------------------------------------------------------
# ERROR REPORTING AND LOGGING
#---------------------------------------------------------------------------
log_min_duration_statement = 250 # -1 is disabled, in milliseconds.
# ERROR REPORTING AND LOGGING
#---------------------------------------------------------------------------
log_min_duration_statement = 250 # -1 is disabled, in milliseconds.
log_connections = true
log_disconnections = true
log_duration = true
log_line_prefix = '<%r%u%p%t%d%%' # e.g. '<%u%%%d> '
# %u=user name %d=database name
# %r=remote host and port
# %p=PID %t=timestamp %i=command tag
# %c=session id %l=session line number
# %s=session start timestamp %x=transaction id
# %q=stop here in non-session processes
# %%='%'
log_statement = 'none' # none, mod, ddl, all
log_disconnections = true
log_duration = true
log_line_prefix = '<%r%u%p%t%d%%' # e.g. '<%u%%%d> '
# %u=user name %d=database name
# %r=remote host and port
# %p=PID %t=timestamp %i=command tag
# %c=session id %l=session line number
# %s=session start timestamp %x=transaction id
# %q=stop here in non-session processes
# %%='%'
log_statement = 'none' # none, mod, ddl, all
#---------------------------------------------------------------------------
# RUNTIME STATISTICS
#---------------------------------------------------------------------------
# - Query/Index Statistics Collector -
stats_start_collector = true
stats_command_string = true
stats_block_level = true
stats_row_level = true
stats_reset_on_server_start = true
# RUNTIME STATISTICS
#---------------------------------------------------------------------------
# - Query/Index Statistics Collector -
stats_start_collector = true
stats_command_string = true
stats_block_level = true
stats_row_level = true
stats_reset_on_server_start = true
My MySQL my.ini has the following non default values:
innodb_data_home_dir = /var/lib/mysql/
innodb_data_file_path = ibdata1:10M:autoextend
innodb_log_group_home_dir = /var/lib/mysql/
innodb_log_arch_dir = /var/lib/mysql/
# You can set .._buffer_pool_size up to 50 - 80 %
# of RAM but beware of setting memory usage too high
innodb_buffer_pool_size = 512M
innodb_additional_mem_pool_size = 64M
# Set .._log_file_size to 25 % of buffer pool size
innodb_log_file_size = 128M
innodb_log_buffer_size = 64M
innodb_flush_log_at_trx_commit = 1
innodb_lock_wait_timeout = 50
innodb_flush_method = O_DSYNC
innodb_data_file_path = ibdata1:10M:autoextend
innodb_log_group_home_dir = /var/lib/mysql/
innodb_log_arch_dir = /var/lib/mysql/
# You can set .._buffer_pool_size up to 50 - 80 %
# of RAM but beware of setting memory usage too high
innodb_buffer_pool_size = 512M
innodb_additional_mem_pool_size = 64M
# Set .._log_file_size to 25 % of buffer pool size
innodb_log_file_size = 128M
innodb_log_buffer_size = 64M
innodb_flush_log_at_trx_commit = 1
innodb_lock_wait_timeout = 50
innodb_flush_method = O_DSYNC
max_allowed_packet = 16M
pgsql-performance by date: