Hello,
We're using a statement like this to dump anywhere from 500K to more than 5 million rows:
COPY (SELECT SomeID FROM SomeTable WHERE SomeColumn > '0')
  TO '/dev/shm/SomeFile.csv';
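(As an aside: we know COPY defaults to text format despite the .csv file name; if it matters for answers, the explicit CSV form we'd use is roughly the following.)

    COPY (SELECT SomeID FROM SomeTable WHERE SomeColumn > '0')
      TO '/dev/shm/SomeFile.csv' WITH (FORMAT csv);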
Upon the first run, this operation can take several minutes. Upon the second run,
it generally completes in well under a minute.
Some observations/thoughts:
-- Watching the file grow: on the first run it "trickles", increasing in size
by only a few KB/s; on the second run it grows by megabytes per second.
-- If we dump to disk instead of /dev/shm, the same behavior is observed, so
there are no apparent RAM contention issues.
-- The SELECT by itself typically completes in a few seconds, on both the
first and second runs.
-- Upon the first run, the PostgreSQL process typically consumes <10% CPU.
-- In very rare cases, we've seen even the first run be as fast as the
second run.
-- There is no other activity on the server while performing these tests.
-- SomeID is just an int.
-- CPU usage (including I/O wait %) is low; it seems as though there is some
other bottleneck (see the EXPLAIN sketch below).
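If it would help frame answers, we can capture and share a buffer-level plan
from both runs, along these lines (table/column names are the placeholders
from above):

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT SomeID FROM SomeTable WHERE SomeColumn > '0';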
Any thoughts on:
-- tuning parameters that might speed up the initial execution?
-- why it is so slow to begin with?
-- possible areas to check for bottlenecks?
-- a better method for accomplishing the same thing?
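One idea we've been toying with, on the assumption that the slowness is a cold
cache: warming the table with the pg_prewarm contrib module before the first
dump. Would something like the following be a reasonable workaround, or are we
off base?

    CREATE EXTENSION IF NOT EXISTS pg_prewarm;
    -- pull SomeTable's heap into shared buffers ahead of the COPY
    SELECT pg_prewarm('SomeTable');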
Thank you,
H