Re: Issues with \copy from file - Mailing list pgsql-performance

From Sigurgeir Gunnarsson
Subject Re: Issues with \copy from file
Date
Msg-id ebd3ad520912180446m4a7eeaa3kf859e706f29205d6@mail.gmail.com
Whole thread Raw
In response to Re: Issues with \copy from file  (Matthew Wakeling <matthew@flymine.org>)
Responses Re: Issues with \copy from file
List pgsql-performance
I hope the issue is still open though I haven't replied to it before.

Euler mentioned that I did not provide any details about my system. I'm using version 8.3 and with most settings default on an old machine with 2 GB of mem. The table definition is simple, four columns; id, value, x, y where id is primary key and x, y are combined into an index.

I'm not sure if it matters but unlike Euler's suggestion I'm using \copy instead of COPY. Regarding my comparison to MySQL, it is completely valid. This is done on the same computer, using the same disk on the same platform. From that I would derive that IO is not my problem, unless postgresql is doing IO twice while MySQL only once.

I guess my tables are InnoDB since that is the default type (or so I think). BEGIN/COMMIT I did not find change much. Are there any other suggestions ?

My postgres.conf:
#------------------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#------------------------------------------------------------------------------

# - Memory -
shared_buffers = 16MB                   # min 128kB or max_connections*16kB
temp_buffers = 16MB                     # min 800kB
#max_prepared_transactions = 5          # can be 0 or more
# Note:  Increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
work_mem = 128MB                        # min 64kB
maintenance_work_mem = 128MB            # min 1MB
#max_stack_depth = 2MB                  # min 100kB

# - Free Space Map -
max_fsm_pages = 2097152                 # min max_fsm_relations*16, 6 bytes each
max_fsm_relations = 500                 # min 100, ~70 bytes each

# - Kernel Resource Usage -
#max_files_per_process = 1000           # min 25
#shared_preload_libraries = ''          # (change requires restart)
# - Cost-Based Vacuum Delay -
#vacuum_cost_delay = 0                  # 0-1000 milliseconds
#vacuum_cost_page_hit = 1               # 0-10000 credits
#vacuum_cost_page_miss = 10             # 0-10000 credits
#vacuum_cost_page_dirty = 20            # 0-10000 credits
#vacuum_cost_limit = 200                # 1-10000 credits

#------------------------------------------------------------------------------
# WRITE AHEAD LOG
#------------------------------------------------------------------------------

# - Settings -
#fsync = on                             # turns forced synchronization on or off
#synchronous_commit = on                # immediate fsync at commit
#wal_sync_method = fsync                # the default is the first option
#full_page_writes = on                  # recover from partial page writes
#wal_buffers = 64kB                     # min 32kB
#wal_writer_delay = 200ms               # 1-10000 milliseconds
#commit_delay = 0                       # range 0-100000, in microseconds
#commit_siblings = 5                    # range 1-1000

# - Checkpoints -
checkpoint_segments = 64                # in logfile segments, min 1, 16MB each
#checkpoint_timeout = 5min              # range 30s-1h
#checkpoint_completion_target = 0.9     # checkpoint target duration, 0.0 - 1.0
#checkpoint_warning = 30s               # 0 is off

# - Archiving -
#archive_mode = off             # allows archiving to be done
#archive_command = ''           # command to use to archive a logfile segment
#archive_timeout = 0            # force a logfile segment switch after this

#------------------------------------------------------------------------------
# AUTOVACUUM PARAMETERS
#------------------------------------------------------------------------------

autovacuum = on                         # Enable autovacuum subprocess?  'on'


2009/10/19 Matthew Wakeling <matthew@flymine.org>
On Sun, 18 Oct 2009, Scott Marlowe wrote:
You can only write data then commit it so fast to one drive, and that
speed is usually somewhere in the megabyte per second range.  450+150
in 5 minutes is 120 Megs per second, that's pretty fast, but is likely
the max speed of a modern super fast 15k rpm drive.  If it's taking 20
minutes then it's 30 Megs per second which is still really good if
you're in the middle of a busy afternoon and the db has other things
to do.

You're out by a factor of 60. That's minutes, not seconds.

More relevant is the fact that Postgres will normally log changes in the WAL, effectively writing the data twice. As Euler said, the trick is to tell Postgres that noone else will need to see the data, so it can skip the WAL step:


BEGIN;
TRUNCATE TABLE foo;
COPY foo FROM ...;
COMMIT;

I see upward of 100MB/s over here when I do this.

Matthew

--
Patron: "I am looking for a globe of the earth."
Librarian: "We have a table-top model over here."
Patron: "No, that's not good enough. Don't you have a life-size?"
Librarian: (pause) "Yes, but it's in use right now."


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: seq scan instead of index scan
Next
From: Robert Haas
Date:
Subject: Re: Automatic optimization of IN clauses via INNER JOIN