Thread: psql \copy from sends a lot of packets

psql \copy from sends a lot of packets

From
Heikki Linnakangas
Date:
Hi,

I just noticed that if you load a file using psql:

\copy <table> from <local file>

it sends every line as a separate FE/BE protocol CopyData packet. That's 
pretty wasteful if the lines are narrow. The overhead of each CopyData 
packet is 5 bytes.

To demonstrate, I generated a simple test file with the string "foobar" 
repeated 10 million times:

$ perl -le 'for (1..10000000) { print "foobar" }' > /tmp/testdata

and loaded that into a temp table with psql:

create temporary table copytest (t text) on commit delete rows;
\copy copytest from '/tmp/testdata';

I repeated and timed the \copy a few times; it takes about about 3 
seconds on my laptop:

postgres=# \copy copytest from '/tmp/testdata';
COPY 10000000
Time: 3039.625 ms (00:03.040)

Wireshark says that that involved about 120 MB of network traffic. The 
size of the file on disk is only 70 MB.

The attached patch modifies psql so that it buffers up 8 kB of data into 
each CopyData message, instead of sending one per line. That makes the 
operation faster:

postgres=# \copy copytest from '/tmp/testdata';
COPY 10000000
Time: 2490.268 ms (00:02.490)

And wireshark confirms that there's now only a bit over 70 MB of network 
traffic.

I'll add this to the next commitfest. There's similar inefficiency in 
the server side in COPY TO, but I'll leave that for another patch.

- Heikki

Attachment

Re: psql \copy from sends a lot of packets

From
Tom Lane
Date:
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> I just noticed that if you load a file using psql:
> it sends every line as a separate FE/BE protocol CopyData packet.
> ...
> I'll add this to the next commitfest. There's similar inefficiency in 
> the server side in COPY TO, but I'll leave that for another patch.

The FE/BE protocol documentation is pretty explicit about this:

    Copy-in mode (data transfer to the server) is initiated when the
    backend executes a COPY FROM STDIN SQL statement. The backend sends a
    CopyInResponse message to the frontend. The frontend should then send
    zero or more CopyData messages, forming a stream of input data. (The
    message boundaries are not required to have anything to do with row
    boundaries, although that is often a reasonable choice.)
    ...
    Copy-out mode (data transfer from the server) is initiated when the
    backend executes a COPY TO STDOUT SQL statement. The backend sends a
    CopyOutResponse message to the frontend, followed by zero or more
    CopyData messages (always one per row), followed by CopyDone.

So while changing psql isn't so much a problem, changing the server
is a wire protocol break.  Maybe we should do it anyway, but I'm
not sure.

            regards, tom lane



Re: psql \copy from sends a lot of packets

From
Aleksander Alekseev
Date:
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           tested, passed
Documentation:            tested, passed

The patch was marked as the one that needs review and doesn't currently have
a reviewer, so I decided to take a look. The patch was tested on MacOS against
master `e0271d5f`. It works fine and doesn't seem to contradict the current
documentation.

The future COPY TO patch may require some changes in the docs, as Tom pointed
out. I also wonder if it may affect any 3rd party applications and if we care
about this, but I suggest we discuss this when and if a corresponding patch
will be proposed.

The new status of this patch is: Ready for Committer

Re: psql \copy from sends a lot of packets

From
Heikki Linnakangas
Date:
On 13/07/2021 14:52, Aleksander Alekseev wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:       tested, passed
> Spec compliant:           tested, passed
> Documentation:            tested, passed
> 
> The patch was marked as the one that needs review and doesn't currently have
> a reviewer, so I decided to take a look. The patch was tested on MacOS against
> master `e0271d5f`. It works fine and doesn't seem to contradict the current
> documentation.

Thanks for the review! I read through it myself one more time and 
spotted one bug: in interactive mode, the prompt was printed twice in 
the beginning of the operation. Fixed that, and pushed.

- Heikki