Re: Why is pq_begintypsend so slow? - Mailing list pgsql-hackers

From	Robert Haas
Subject	Re: Why is pq_begintypsend so slow?
Date	June 3, 2020 15:30:42
Msg-id	CA+TgmoaQ4ga3cQLsKEmReHFHxBeH8_+rfgGXhA5speccKzJ-0g@mail.gmail.com
In response to	Re: Why is pq_begintypsend so slow? (Andres Freund <andres@anarazel.de>)
Responses	Re: Why is pq_begintypsend so slow?
List	pgsql-hackers

On Tue, Jun 2, 2020 at 9:56 PM Andres Freund <andres@anarazel.de> wrote:
> The biggest problem after that is that we waste a lot of time memcpying
> stuff around repeatedly. There is:
> 1) send function: datum -> per datum stringinfo
> 2) printtup: per datum stringinfo -> per row stringinfo
> 3) socket_putmessage: per row stringinfo -> PqSendBuffer
> 4) send(): PqSendBuffer -> kernel buffer
>
> It's obviously hard to avoid 1) and 4) in the common case, but the
> number of other copies seem pretty clearly excessive.

I too have seen recent benchmarking data where this was a big problem.
Basically, you need a workload where the server doesn't have much or
any actual query processing to do, but is just returning a lot of
stuff to a really fast client - e.g. a locally connected client.
That's not necessarily the most common case but, if you have it, all
this extra copying is really pretty expensive.

My first thought was to consider changing all of our send/output
functions to write into a buffer passed as an argument, rather than
returning something that we then have to copy into a different
buffer. But that would be a somewhat painful change, so it is probably
better to first pursue getting rid of some of the other copies that
happen in more centralized places (e.g. printtup). I wonder if we
could replace the whole
pq_beginmessage...()/pq_send...()/pq_endmessage...() system with
something a bit better-designed. For instance, suppose we get rid of
the idea that the caller supplies the buffer, and we move the
responsibility for error recovery into the pqcomm layer. Then you
would do something like:

my_message = xyz_beginmessage('D');   /* start a DataRow ('D') message */
xyz_sendint32(my_message, 42);        /* append a 32-bit integer */
xyz_endmessage(my_message);           /* finish the message */

Maybe what happens here under the hood is that we keep a pool of free
message buffers sitting around, and you just grab one and put your
data into it. When you end the message we add it to a list of used
message buffers that are waiting to be sent, and once we send the data
it goes back on the free list. If an error occurs after
xyz_beginmessage() and before xyz_endmessage(), we put the buffer back
on the free list. That would allow us to merge (2) and (3) into a
single copy. To go further, we could allow send/output functions to
opt in to receiving a message buffer rather than returning a value,
and then we could get rid of (1) for types that participate. (4) seems
unavoidable AFAIK.
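A minimal C sketch of the pooled-buffer idea, assuming a fixed-size pool, fixed-capacity buffers, and a single backend (no locking). The xyz_* names follow the placeholders above; `xyz_init`, `xyz_abortmessage`, `xyz_flush`, `POOL_SIZE`, and `MSG_CAP` are additional hypothetical names, and the output framing is simplified (type byte plus payload, no length word), so this is not the real pqcomm code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POOL_SIZE 4             /* hypothetical pool size */
#define MSG_CAP   64            /* hypothetical per-message capacity */

/* Hypothetical pooled message buffer; not an existing PostgreSQL type. */
typedef struct XyzMessage
{
    struct XyzMessage *next;    /* link in the free list or pending list */
    char        type;           /* protocol message type byte, e.g. 'D' */
    size_t      len;            /* payload bytes used so far */
    unsigned char data[MSG_CAP];
} XyzMessage;

static XyzMessage pool[POOL_SIZE];
static XyzMessage *free_list;                   /* buffers ready for reuse */
static XyzMessage *pending_head, *pending_tail; /* finished, not yet sent */

static void
xyz_init(void)
{
    free_list = pending_head = pending_tail = NULL;
    for (int i = 0; i < POOL_SIZE; i++)
    {
        pool[i].next = free_list;
        free_list = &pool[i];
    }
}

/* Take a buffer from the free list; the caller never supplies one. */
static XyzMessage *
xyz_beginmessage(char type)
{
    XyzMessage *msg = free_list;

    assert(msg != NULL);        /* a real design would flush or allocate */
    free_list = msg->next;
    msg->next = NULL;
    msg->type = type;
    msg->len = 0;
    return msg;
}

/* Append a 32-bit integer in network byte order directly into the buffer. */
static void
xyz_sendint32(XyzMessage *msg, int32_t value)
{
    uint32_t    n = (uint32_t) value;

    assert(msg->len + 4 <= MSG_CAP);
    for (int shift = 24; shift >= 0; shift -= 8)
        msg->data[msg->len++] = (unsigned char) (n >> shift);
}

/* Finished message: queue it for sending instead of copying it anywhere. */
static void
xyz_endmessage(XyzMessage *msg)
{
    if (pending_tail)
        pending_tail->next = msg;
    else
        pending_head = msg;
    pending_tail = msg;
}

/* Error between begin and end: just return the buffer to the pool. */
static void
xyz_abortmessage(XyzMessage *msg)
{
    msg->next = free_list;
    free_list = msg;
}

/* Copy queued messages into out (standing in for send()), then recycle
 * each buffer onto the free list. */
static size_t
xyz_flush(unsigned char *out, size_t cap)
{
    size_t      written = 0;

    while (pending_head)
    {
        XyzMessage *msg = pending_head;

        assert(written + 1 + msg->len <= cap);
        out[written++] = (unsigned char) msg->type;
        memcpy(out + written, msg->data, msg->len);
        written += msg->len;
        pending_head = msg->next;
        msg->next = free_list;  /* back on the free list once "sent" */
        free_list = msg;
    }
    pending_tail = NULL;
    return written;
}
```

Because xyz_endmessage only links the buffer onto the pending list, the payload is written once, into the pooled buffer, which corresponds to merging (2) and (3); xyz_flush plays the role of (4).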

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
