psql \copy from sends a lot of packets - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject psql \copy from sends a lot of packets
Date
Msg-id 40b2cec0-d0fb-3191-2ae1-9a3fe16a7e48@iki.fi
Whole thread Raw
Responses Re: psql \copy from sends a lot of packets
Re: psql \copy from sends a lot of packets
List pgsql-hackers
Hi,

I just noticed that if you load a file using psql:

\copy <table> from <local file>

it sends every line as a separate FE/BE protocol CopyData packet. That's 
pretty wasteful if the lines are narrow. The overhead of each CopyData 
packet is 5 bytes.

To demonstrate, I generated a simple test file with the string "foobar" 
repeated 10 million times:

$ perl -le 'for (1..10000000) { print "foobar" }' > /tmp/testdata

and loaded that into a temp table with psql:

create temporary table copytest (t text) on commit delete rows;
\copy copytest from '/tmp/testdata';

I repeated and timed the \copy a few times; it takes about about 3 
seconds on my laptop:

postgres=# \copy copytest from '/tmp/testdata';
COPY 10000000
Time: 3039.625 ms (00:03.040)

Wireshark says that that involved about 120 MB of network traffic. The 
size of the file on disk is only 70 MB.

The attached patch modifies psql so that it buffers up 8 kB of data into 
each CopyData message, instead of sending one per line. That makes the 
operation faster:

postgres=# \copy copytest from '/tmp/testdata';
COPY 10000000
Time: 2490.268 ms (00:02.490)

And wireshark confirms that there's now only a bit over 70 MB of network 
traffic.

I'll add this to the next commitfest. There's similar inefficiency in 
the server side in COPY TO, but I'll leave that for another patch.

- Heikki

Attachment

pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Preserve attstattarget on REINDEX CONCURRENTLY
Next
From: Tom Lane
Date:
Subject: Re: psql \copy from sends a lot of packets