Re: Why is pq_begintypsend so slow? - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Why is pq_begintypsend so slow?
Msg-id CA+TgmoaQ4ga3cQLsKEmReHFHxBeH8_+rfgGXhA5speccKzJ-0g@mail.gmail.com
In response to Re: Why is pq_begintypsend so slow?  (Andres Freund <andres@anarazel.de>)
Responses Re: Why is pq_begintypsend so slow?  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Tue, Jun 2, 2020 at 9:56 PM Andres Freund <andres@anarazel.de> wrote:
> The biggest problem after that is that we waste a lot of time memcpying
> stuff around repeatedly. There is:
> 1) send function: datum -> per datum stringinfo
> 2) printtup: per datum stringinfo -> per row stringinfo
> 3) socket_putmessage: per row stringinfo -> PqSendBuffer
> 4) send(): PqSendBuffer -> kernel buffer
>
> It's obviously hard to avoid 1) and 4) in the common case, but the
> number of other copies seem pretty clearly excessive.

I too have seen recent benchmarking data where this was a big problem.
Basically, you need a workload where the server doesn't have much or
any actual query processing to do, but is just returning a lot of
stuff to a really fast client - e.g. a locally connected client.
That's not necessarily the most common case but, if you have it, all
this extra copying is really pretty expensive.

My first thought was to wonder about changing all of our send/output
functions to write into a buffer passed as an argument rather than
returning something which we then have to copy into a different
buffer, but that would be a somewhat painful change, so it is probably
better to first pursue the idea of getting rid of some of the other
copies that happen in more centralized places (e.g. printtup). I
wonder if we could replace the whole
pq_beginmessage...()/pq_send...()/pq_endmessage...() system with
something a bit better-designed. For instance, suppose we get rid of
the idea that the caller supplies the buffer, and we move the
responsibility for error recovery into the pqcomm layer. So you do
something like:

my_message = xyz_beginmessage('D');
xyz_sendint32(my_message, 42);
xyz_endmessage(my_message);

Maybe what happens here under the hood is we keep a pool of free
message buffers sitting around, and you just grab one and put your
data into it. When you end the message we add it to a list of used
message buffers that are waiting to be sent, and once we send the data
it goes back on the free list. If an error occurs after
xyz_beginmessage() and before xyz_endmessage(), we put the buffer back
on the free list. That would allow us to merge (2) and (3) into a
single copy. To go further, we could allow send/output functions to
opt in to receiving a message buffer rather than returning a value,
and then we could get rid of (1) for types that participate. (4) seems
unavoidable AFAIK.
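
To make the buffer-pool idea concrete, here is a minimal sketch in C of how
it might look. Everything in it is an assumption made for illustration: the
XyzMessage struct, xyz_abortmessage(), xyz_flush() and the 64-byte starting
capacity are hypothetical and are not part of pqcomm. xyz_flush() copies into
a caller-supplied array as a stand-in for send(), and real code would use
palloc and ereport rather than malloc and abort. The framing (type byte, then
a big-endian int32 length that counts itself, then the payload) follows the
v3 wire protocol.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical message buffer; all names here are illustrative only. */
typedef struct XyzMessage
{
    struct XyzMessage *next;    /* link in the free list or pending list */
    char        type;           /* protocol message type byte, e.g. 'D' */
    size_t      len;            /* payload bytes used so far */
    size_t      cap;            /* payload bytes allocated */
    char       *data;           /* payload storage, kept across reuse */
} XyzMessage;

static XyzMessage *free_list = NULL;     /* buffers available for reuse */
static XyzMessage *pending_head = NULL;  /* finished messages, oldest first */
static XyzMessage *pending_tail = NULL;

/* Store a 32-bit value in network byte order. */
static void put_be32(char *dst, uint32_t v)
{
    dst[0] = (char) (v >> 24);
    dst[1] = (char) (v >> 16);
    dst[2] = (char) (v >> 8);
    dst[3] = (char) v;
}

/* Grab a buffer from the free list, allocating one only if it is empty. */
XyzMessage *xyz_beginmessage(char type)
{
    XyzMessage *msg = free_list;

    if (msg != NULL)
        free_list = msg->next;
    else
    {
        msg = malloc(sizeof(XyzMessage));
        if (msg == NULL)
            abort();
        msg->cap = 64;          /* arbitrary starting size */
        msg->data = malloc(msg->cap);
        if (msg->data == NULL)
            abort();
    }
    msg->next = NULL;
    msg->type = type;
    msg->len = 0;
    return msg;
}

/* Append a 4-byte integer directly to the message, growing it if needed. */
void xyz_sendint32(XyzMessage *msg, uint32_t value)
{
    if (msg->len + 4 > msg->cap)
    {
        char *bigger = realloc(msg->data, msg->cap * 2);

        if (bigger == NULL)
            abort();
        msg->data = bigger;
        msg->cap *= 2;
    }
    put_be32(msg->data + msg->len, value);
    msg->len += 4;
}

/* Finish a message: queue it at the tail of the pending list. */
void xyz_endmessage(XyzMessage *msg)
{
    assert(msg != NULL);
    msg->next = NULL;
    if (pending_tail != NULL)
        pending_tail->next = msg;
    else
        pending_head = msg;
    pending_tail = msg;
}

/* Error path: an unfinished message simply returns to the free list. */
void xyz_abortmessage(XyzMessage *msg)
{
    msg->next = free_list;
    free_list = msg;
}

/* Stand-in for send(): copy framed pending messages into out, oldest first,
 * recycling each buffer. Stops early if the next message does not fit. */
size_t xyz_flush(char *out, size_t outcap)
{
    size_t written = 0;

    while (pending_head != NULL)
    {
        XyzMessage *msg = pending_head;
        size_t framed = 1 + 4 + msg->len;   /* type + length word + payload */

        if (written + framed > outcap)
            break;
        out[written] = msg->type;
        put_be32(out + written + 1, (uint32_t) (msg->len + 4));
        memcpy(out + written + 5, msg->data, msg->len);
        written += framed;

        pending_head = msg->next;
        if (pending_head == NULL)
            pending_tail = NULL;
        msg->next = free_list;
        free_list = msg;
    }
    return written;
}
```

In this sketch the caller never owns the storage, so the only copy after the
send function's output is the one in xyz_flush(), which corresponds to (4).
A send/output function that opted in would call xyz_sendint32() and friends
on the message it was handed, which is what would remove (1).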

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


