Thread: libpq: PQgetCopyData() and allocation overhead

libpq: PQgetCopyData() and allocation overhead

From
Jeroen Vermeulen
Date:
Would there be interest in a variant of PQgetCopyData() that re-uses the same buffer for a new row, rather than allocating a new buffer on each iteration?

I tried it on a toy benchmark, and it reduced client-side CPU time by about 12%.  (Less of a difference in wall-clock time of course; the client was only using the CPU for a bit over half the time.)


Jeroen

Re: libpq: PQgetCopyData() and allocation overhead

From
Bharath Rupireddy
Date:
On Fri, Feb 10, 2023 at 3:43 PM Jeroen Vermeulen <jtvjtv@gmail.com> wrote:
>
> Would there be interest in a variant of PQgetCopyData() that re-uses the same buffer for a new row, rather than
allocatinga new buffer on each iteration?
 
>
> I tried it on a toy benchmark, and it reduced client-side CPU time by about 12%.  (Less of a difference in wall-clock
timeof course; the client was only using the CPU for a bit over half the time.)
 

Interesting. It might improve logical replication performance too as
it uses COPY protocol.

Do you mind sharing a patch, test case that you used and steps to
verify the benefit?

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: libpq: PQgetCopyData() and allocation overhead

From
Jeroen Vermeulen
Date:
Here's the patch (as a PR just to make it easy to read): https://github.com/jtv/postgres/pull/1

I don't have an easily readable benchmark yet, since I've been timing the potential impact on libpqxx.  But can do that next.


Jeroen

On Fri, Feb 10, 2023, 11:26 Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
On Fri, Feb 10, 2023 at 3:43 PM Jeroen Vermeulen <jtvjtv@gmail.com> wrote:
>
> Would there be interest in a variant of PQgetCopyData() that re-uses the same buffer for a new row, rather than allocating a new buffer on each iteration?
>
> I tried it on a toy benchmark, and it reduced client-side CPU time by about 12%.  (Less of a difference in wall-clock time of course; the client was only using the CPU for a bit over half the time.)

Interesting. It might improve logical replication performance too as
it uses COPY protocol.

Do you mind sharing a patch, test case that you used and steps to
verify the benefit?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: libpq: PQgetCopyData() and allocation overhead

From
Jeroen Vermeulen
Date:
OK, I've updated the PR with a benchmark (in the main directory).

On this benchmark I'm seeing about a 24% reduction in "user" CPU time, and a 8% reduction in "system" CPU time.  (Almost no reduction in wall-clock time.)


Jeroen

On Fri, 10 Feb 2023 at 11:32, Jeroen Vermeulen <jtvjtv@gmail.com> wrote:
Here's the patch (as a PR just to make it easy to read): https://github.com/jtv/postgres/pull/1

I don't have an easily readable benchmark yet, since I've been timing the potential impact on libpqxx.  But can do that next.


Jeroen

On Fri, Feb 10, 2023, 11:26 Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
On Fri, Feb 10, 2023 at 3:43 PM Jeroen Vermeulen <jtvjtv@gmail.com> wrote:
>
> Would there be interest in a variant of PQgetCopyData() that re-uses the same buffer for a new row, rather than allocating a new buffer on each iteration?
>
> I tried it on a toy benchmark, and it reduced client-side CPU time by about 12%.  (Less of a difference in wall-clock time of course; the client was only using the CPU for a bit over half the time.)

Interesting. It might improve logical replication performance too as
it uses COPY protocol.

Do you mind sharing a patch, test case that you used and steps to
verify the benefit?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: libpq: PQgetCopyData() and allocation overhead

From
Bharath Rupireddy
Date:
On Fri, Feb 10, 2023 at 5:49 PM Jeroen Vermeulen <jtvjtv@gmail.com> wrote:
>
> OK, I've updated the PR with a benchmark (in the main directory).
>
> On this benchmark I'm seeing about a 24% reduction in "user" CPU time, and a 8% reduction in "system" CPU time.
(Almostno reduction in wall-clock time.) 

I can help run some logical replication performance benchmarks
tomorrow. Would you mind cleaning the PR and providing the changes
(there are multiple commits in the PR) as a single patch here for the
sake of ease of review and test?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: libpq: PQgetCopyData() and allocation overhead

From
Jelte Fennema
Date:
Instead of implementing new growable buffer logic in this patch. It
seems much nicer to reuse the already existing PQExpBuffer type for
this patch.


On Mon, 27 Feb 2023 at 14:48, Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Fri, Feb 10, 2023 at 5:49 PM Jeroen Vermeulen <jtvjtv@gmail.com> wrote:
> >
> > OK, I've updated the PR with a benchmark (in the main directory).
> >
> > On this benchmark I'm seeing about a 24% reduction in "user" CPU time, and a 8% reduction in "system" CPU time.
(Almostno reduction in wall-clock time.) 
>
> I can help run some logical replication performance benchmarks
> tomorrow. Would you mind cleaning the PR and providing the changes
> (there are multiple commits in the PR) as a single patch here for the
> sake of ease of review and test?
>
> --
> Bharath Rupireddy
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com
>
>



Re: libpq: PQgetCopyData() and allocation overhead

From
Jeroen Vermeulen
Date:
Done.  Thanks for looking!

Jelte Fennema pointed out that I should probably be using PQExpBuffer for this.  I'll look into that later; this is a proof of concept, not a production-ready API proposal.


Jeroen

On Mon, 27 Feb 2023 at 14:48, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
On Fri, Feb 10, 2023 at 5:49 PM Jeroen Vermeulen <jtvjtv@gmail.com> wrote:
>
> OK, I've updated the PR with a benchmark (in the main directory).
>
> On this benchmark I'm seeing about a 24% reduction in "user" CPU time, and a 8% reduction in "system" CPU time.  (Almost no reduction in wall-clock time.)

I can help run some logical replication performance benchmarks
tomorrow. Would you mind cleaning the PR and providing the changes
(there are multiple commits in the PR) as a single patch here for the
sake of ease of review and test?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: libpq: PQgetCopyData() and allocation overhead

From
Jeroen Vermeulen
Date:
Update: in the latest iteration, I have an alternative that reduces CPU time by more than half, compared to PQgetCopyData().

And the code is simpler, too, both in the client and in libpq itself.  The one downside is that it breaks with libpq's existing API style.



Jeroen

On Mon, 27 Feb 2023 at 17:08, Jeroen Vermeulen <jtvjtv@gmail.com> wrote:
Done.  Thanks for looking!

Jelte Fennema pointed out that I should probably be using PQExpBuffer for this.  I'll look into that later; this is a proof of concept, not a production-ready API proposal.


Jeroen

On Mon, 27 Feb 2023 at 14:48, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
On Fri, Feb 10, 2023 at 5:49 PM Jeroen Vermeulen <jtvjtv@gmail.com> wrote:
>
> OK, I've updated the PR with a benchmark (in the main directory).
>
> On this benchmark I'm seeing about a 24% reduction in "user" CPU time, and a 8% reduction in "system" CPU time.  (Almost no reduction in wall-clock time.)

I can help run some logical replication performance benchmarks
tomorrow. Would you mind cleaning the PR and providing the changes
(there are multiple commits in the PR) as a single patch here for the
sake of ease of review and test?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: libpq: PQgetCopyData() and allocation overhead

From
Daniel Gustafsson
Date:
> On 1 Mar 2023, at 15:23, Jeroen Vermeulen <jtvjtv@gmail.com> wrote:

> PR for easy discussion: https://github.com/jtv/postgres/pull/1

The process for discussing work on pgsql-hackers is to attach the patch to the
email and discuss it inline in the thread.  That way all versions of the patch
as well as the discussion is archived and searchable.

--
Daniel Gustafsson




Re: libpq: PQgetCopyData() and allocation overhead

From
Jeroen Vermeulen
Date:
My apologies.  The wiki said to discuss early, even before writing the code if possible, but I added a link to the PR for those who really wanted to see the details.

I'm attaching a diff now.  This is not a patch, it's just a discussion piece.

The problem was that PQgetCopyData loops use a lot of CPU time, and this alternative reduces that by a lot.


Jeroen

On Thu, 2 Mar 2023 at 13:38, Daniel Gustafsson <daniel@yesql.se> wrote:
> On 1 Mar 2023, at 15:23, Jeroen Vermeulen <jtvjtv@gmail.com> wrote:

> PR for easy discussion: https://github.com/jtv/postgres/pull/1

The process for discussing work on pgsql-hackers is to attach the patch to the
email and discuss it inline in the thread.  That way all versions of the patch
as well as the discussion is archived and searchable.

--
Daniel Gustafsson

Attachment

Re: libpq: PQgetCopyData() and allocation overhead

From
Jelte Fennema
Date:
On Thu, 2 Mar 2023 at 20:45, Jeroen Vermeulen <jtvjtv@gmail.com> wrote:
> I'm attaching a diff now.  This is not a patch, it's just a discussion piece.

Did you try with PQExpBuffer? I still think that probably fits better
in the API design of libpq. Obviously if it's significantly slower
than the callback approach in this patch then it's worth considering
using the callback approach. Overall it definitely seems reasonable to
me to have an API that doesn't do an allocation per row.



Re: libpq: PQgetCopyData() and allocation overhead

From
Jeroen Vermeulen
Date:
Thank you.

I meant to try PQExpBuffer — as you say, it fits much better with existing libpq style.  But then I hit on the callback idea which was even more efficient, by a fair margin.  It was also considerably simpler both inside libpq and in the client code, eliminating all sorts of awkward questions about who deallocates the buffer in which situations.  So I ditched the "dynamic buffer" concept and went with the callback.

One other possible alternative suggests itself: instead of taking a callback and a context pointer, the function could probably just return a struct: status/size, and buffer.  Then the caller would have to figure out whether there's a line in the buffer, and if so, process it.  It seems like more work for the client code, but it may make the compiler's optimisation work easier.


Jeroen

On Fri, 3 Mar 2023 at 16:52, Jelte Fennema <postgres@jeltef.nl> wrote:
On Thu, 2 Mar 2023 at 20:45, Jeroen Vermeulen <jtvjtv@gmail.com> wrote:
> I'm attaching a diff now.  This is not a patch, it's just a discussion piece.

Did you try with PQExpBuffer? I still think that probably fits better
in the API design of libpq. Obviously if it's significantly slower
than the callback approach in this patch then it's worth considering
using the callback approach. Overall it definitely seems reasonable to
me to have an API that doesn't do an allocation per row.

Re: libpq: PQgetCopyData() and allocation overhead

From
Tom Lane
Date:
Jelte Fennema <postgres@jeltef.nl> writes:
> Did you try with PQExpBuffer? I still think that probably fits better
> in the API design of libpq.

If you mean exposing PQExpBuffer to users of libpq-fe.h, I'd be very
seriously against that.  I realize that libpq exposes it at an ABI
level, but that doesn't mean we want non-Postgres code to use it.
I also don't see what it'd add for this particular use-case.

One thing I don't care for at all in the proposed API spec is the bit
about how the handler function can scribble on the passed buffer.
Let's not do that.  Declare it const char *, or maybe better const void *.

Rather than duplicating most of pqGetCopyData3, I'd suggest revising
it to take a callback, where the callback is either user-supplied
or is supplied by PQgetCopyData to emulate the existing behavior.
This would both avoid duplicate coding and provide a simple check that
you've made a usable callback API (in particular, one that can do
something sane for error cases).

            regards, tom lane



Re: libpq: PQgetCopyData() and allocation overhead

From
Jeroen Vermeulen
Date:
On Fri, 3 Mar 2023 at 17:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:

If you mean exposing PQExpBuffer to users of libpq-fe.h, I'd be very
seriously against that.  I realize that libpq exposes it at an ABI
level, but that doesn't mean we want non-Postgres code to use it.
I also don't see what it'd add for this particular use-case.

Fair enough.  Never even got around to checking whether it was in the API already.

 
One thing I don't care for at all in the proposed API spec is the bit
about how the handler function can scribble on the passed buffer.
Let's not do that.  Declare it const char *, or maybe better const void *.
 
Personally I would much prefer "char" over "void" here:
* It really is a char buffer, containing text.
* If there is to be any type punning, best have it explicit.
* Reduces risk of getting the two pointer arguments the wrong way around.

As for const, I would definitely have preferred that.  But if the caller needs a zero-terminated string, forcing them into a memcpy() would kind of defeat the purpose.

I even tried poking a terminating zero in there from inside the function, but that made the code significantly less efficient.  Optimiser assumptions, I suppose.


Rather than duplicating most of pqGetCopyData3, I'd suggest revising
it to take a callback, where the callback is either user-supplied
or is supplied by PQgetCopyData to emulate the existing behavior.
This would both avoid duplicate coding and provide a simple check that
you've made a usable callback API (in particular, one that can do
something sane for error cases).

Can do that, sure.  I'll also try benchmarking a variant that doesn't take a callback at all, but gives you the buffer pointer in addition to the size/status return.  I don't generally like callbacks.


Jeroen

Re: libpq: PQgetCopyData() and allocation overhead

From
Tom Lane
Date:
Jeroen Vermeulen <jtvjtv@gmail.com> writes:
> On Fri, 3 Mar 2023 at 17:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Let's not do that.  Declare it const char *, or maybe better const void *.

> Personally I would much prefer "char" over "void" here:
> * It really is a char buffer, containing text.

Not in binary-mode COPY.

> As for const, I would definitely have preferred that.  But if the caller
> needs a zero-terminated string, forcing them into a memcpy() would kind of
> defeat the purpose.

I'm willing to grant that avoiding malloc-and-free is worth the trouble.
I'm not willing to allow applications to scribble on libpq buffers to
avoid memcpy.  Even your not-a-patch patch fails to make the case that
this is essential, because you could have used fwrite() instead of
printf() (which would be significantly faster yet btw, printf formatting
ain't cheap).

> Can do that, sure.  I'll also try benchmarking a variant that doesn't take
> a callback at all, but gives you the buffer pointer in addition to the
> size/status return.  I don't generally like callbacks.

Um ... that would require an assumption that libpq neither changes nor
moves that buffer before returning to the caller.  I don't much like
that either.

            regards, tom lane



Re: libpq: PQgetCopyData() and allocation overhead

From
Jelte Fennema
Date:
On Fri, 3 Mar 2023 at 17:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> If you mean exposing PQExpBuffer to users of libpq-fe.h, I'd be very
> seriously against that.  I realize that libpq exposes it at an ABI
> level, but that doesn't mean we want non-Postgres code to use it.

The code comment in the pqexpbuffer.h header suggests that client
applications are fine too use the API to:

> * This module is essentially the same as the backend's StringInfo data type,
> * but it is intended for use in frontend libpq and client applications.

I know both pg_auto_failover and pgcopydb use it quite a lot.



Re: libpq: PQgetCopyData() and allocation overhead

From
Tom Lane
Date:
Jelte Fennema <postgres@jeltef.nl> writes:
> On Fri, 3 Mar 2023 at 17:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> If you mean exposing PQExpBuffer to users of libpq-fe.h, I'd be very
>> seriously against that.  I realize that libpq exposes it at an ABI
>> level, but that doesn't mean we want non-Postgres code to use it.

> The code comment in the pqexpbuffer.h header suggests that client
> applications are fine too use the API to:

Our own client apps, sure.  But you have to buy into the whole Postgres
compilation environment to use PQExpBuffer.  (If you don't believe me,
just try including pqexpbuffer.h by itself.)  That's a non-starter
for most clients.

            regards, tom lane



Re: libpq: PQgetCopyData() and allocation overhead

From
Jeroen Vermeulen
Date:
On Fri, 3 Mar 2023 at 18:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jeroen Vermeulen <jtvjtv@gmail.com> writes:
> On Fri, 3 Mar 2023 at 17:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Let's not do that.  Declare it const char *, or maybe better const void *.

> Personally I would much prefer "char" over "void" here:
> * It really is a char buffer, containing text.

Not in binary-mode COPY.

True.  And in that case zero-termination doesn't matter much either.  But overall libpq's existing choice seems reasonable.


> As for const, I would definitely have preferred that.  But if the caller
> needs a zero-terminated string, forcing them into a memcpy() would kind of
> defeat the purpose.

I'm willing to grant that avoiding malloc-and-free is worth the trouble.
I'm not willing to allow applications to scribble on libpq buffers to
avoid memcpy.  Even your not-a-patch patch fails to make the case that
this is essential, because you could have used fwrite() instead of
printf() (which would be significantly faster yet btw, printf formatting
ain't cheap).
 
Your house, your rules.  For my own use-case "const" is just peachy.

The printf() is just the simplest example that sprang to mind though.  There may be other use-cases out there involving  libraries that require zero-terminated strings, and I figured an ability to set a sentinel could help those.


> Can do that, sure.  I'll also try benchmarking a variant that doesn't take
> a callback at all, but gives you the buffer pointer in addition to the
> size/status return.  I don't generally like callbacks.

Um ... that would require an assumption that libpq neither changes nor
moves that buffer before returning to the caller.  I don't much like
that either.
 
Not an assumption about _before returning to the caller_ I guess, because the function would be on top of that anyway.  The concern would be libpq changing or moving the buffer _before the caller is done with the line._  Which would require some kind of clear rule about what invalidates the buffer.  Yes, that is easier with the callback.


Jeroen

Re: libpq: PQgetCopyData() and allocation overhead

From
Tom Lane
Date:
Jeroen Vermeulen <jtvjtv@gmail.com> writes:
> The printf() is just the simplest example that sprang to mind though.
> There may be other use-cases out there involving  libraries that require
> zero-terminated strings, and I figured an ability to set a sentinel could
> help those.

Well, since it won't help for binary COPY, I'm skeptical that this is
something we should cater to.  Anybody who's sufficiently worried about
performance to be trying to remove malloc/free cycles ought to be
interested in binary COPY as well.

            regards, tom lane



Re: libpq: PQgetCopyData() and allocation overhead

From
Jeroen Vermeulen
Date:
Interested, yes.  But there may be reasons not to do that.  Last time I looked the binary format wasn't documented.

Anyway, I tried re-implementing pqGetCopyData3() using the callback.  Wasn't hard, but I did have to add a way for the callback to return an error.  Kept it pretty dumb for now, hoping a sensible rule will become obvious later.

Saw no obvious performance impact, so that's good.


Jeroen

On Fri, 3 Mar 2023 at 19:53, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jeroen Vermeulen <jtvjtv@gmail.com> writes:
> The printf() is just the simplest example that sprang to mind though.
> There may be other use-cases out there involving  libraries that require
> zero-terminated strings, and I figured an ability to set a sentinel could
> help those.

Well, since it won't help for binary COPY, I'm skeptical that this is
something we should cater to.  Anybody who's sufficiently worried about
performance to be trying to remove malloc/free cycles ought to be
interested in binary COPY as well.

                        regards, tom lane
Attachment