Hi,
I just noticed that if you load a file using psql:
\copy <table> from <local file>
it sends every line as a separate FE/BE protocol CopyData packet. That's
pretty wasteful if the lines are narrow. The overhead of each CopyData
packet is 5 bytes.
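For reference, here's a simplified sketch (not psql's actual copy code;
the function name is made up) of what the per-line pattern looks like from
a libpq client's point of view: every PQputCopyData() call becomes one
CopyData message on the wire, framed as a 1-byte message type plus a
4-byte length, i.e. 5 bytes of header per call:

#include <stdio.h>
#include <string.h>
#include <libpq-fe.h>

/* Sketch: send a file line by line, one CopyData message per line. */
static int
copy_file_line_by_line(PGconn *conn, FILE *fp)
{
    char    line[8192];

    while (fgets(line, sizeof(line), fp) != NULL)
    {
        /* 5 bytes of protocol framing for every call */
        if (PQputCopyData(conn, line, (int) strlen(line)) != 1)
            return 0;
    }
    return PQputCopyEnd(conn, NULL) == 1;
}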
To demonstrate, I generated a simple test file with the string "foobar"
repeated 10 million times:
$ perl -le 'for (1..10000000) { print "foobar" }' > /tmp/testdata
and loaded that into a temp table with psql:
create temporary table copytest (t text) on commit delete rows;
\copy copytest from '/tmp/testdata';
I repeated and timed the \copy a few times; it takes about 3
seconds on my laptop:
postgres=# \copy copytest from '/tmp/testdata';
COPY 10000000
Time: 3039.625 ms (00:03.040)
According to Wireshark, that involved about 120 MB of network traffic,
even though the file on disk is only 70 MB: the extra ~50 MB is the
5-byte CopyData header times 10 million lines.
The attached patch modifies psql so that it buffers up 8 kB of data into
each CopyData message, instead of sending one per line. That makes the
operation faster:
postgres=# \copy copytest from '/tmp/testdata';
COPY 10000000
Time: 2490.268 ms (00:02.490)
And Wireshark confirms that there's now only a bit over 70 MB of network
traffic.
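To give an idea of the approach, here's a simplified sketch of the
buffering (the CopyBuffer struct and helper names are hypothetical; the
actual patch works inside psql's copy code): lines are accumulated into
a local 8 kB buffer and flushed with a single PQputCopyData() call, so
the 5-byte header is amortized over many lines:

#include <string.h>
#include <libpq-fe.h>

#define COPYBUFSIZ 8192

typedef struct CopyBuffer
{
    char    data[COPYBUFSIZ];
    int     len;
} CopyBuffer;

/* Send whatever has accumulated as a single CopyData message. */
static int
copybuf_flush(PGconn *conn, CopyBuffer *buf)
{
    if (buf->len > 0)
    {
        if (PQputCopyData(conn, buf->data, buf->len) != 1)
            return 0;
        buf->len = 0;
    }
    return 1;
}

/* Append one line, flushing first if it wouldn't fit. */
static int
copybuf_add_line(PGconn *conn, CopyBuffer *buf, const char *line, int len)
{
    if (buf->len + len > COPYBUFSIZ && !copybuf_flush(conn, buf))
        return 0;
    if (len > COPYBUFSIZ)
    {
        /* oversized line: send it as its own CopyData message */
        return PQputCopyData(conn, line, len) == 1;
    }
    memcpy(buf->data + buf->len, line, len);
    buf->len += len;
    return 1;
}

(The caller does a final copybuf_flush() before PQputCopyEnd(), of
course.)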
I'll add this to the next commitfest. There's a similar inefficiency on
the server side in COPY TO, but I'll leave that for another patch.
- Heikki